UNet
Last Version: 12/09/2025
UNet Overview
This guide demonstrates efficient image segmentation using the lightweight UNet semantic segmentation model with the SpacemiT AI processor.
The model accepts static images or video streams as input and outputs pixel-level semantic labels (grayscale) and pseudo-color renderings, which are published via ROS 2 for downstream perception or control modules.
The model is trained on the Cityscapes dataset and deployed in ONNX format, supporting accurate segmentation of typical urban scene elements such as people, vehicles, roads, and traffic signs.
Typical applications for UNet include:
- Autonomous driving perception (road/target segmentation)
- Urban street scene analysis and digital map construction
- Remote sensing land-use and object classification
- Medical image segmentation (e.g., organs or lesions)
- Industrial defect detection and preprocessing
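All of these scenarios reduce to the same core step: one ONNX forward pass over a preprocessed frame. The launch files below handle this internally; the sketch here only illustrates the idea with the stock onnxruntime Python API (which python3-spacemit-ort is assumed to provide). The model path, 512x1024 input size, normalization, and tensor layout are illustrative placeholders; the real values come from config/segmentation/unet.yaml.

```python
# Illustrative UNet ONNX forward pass; paths, sizes, and normalization
# are placeholders -- the actual settings live in unet.yaml.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('unet.onnx')
input_name = session.get_inputs()[0].name

img = cv2.imread('test_unet.jpg')                        # BGR, HxWx3
blob = cv2.resize(img, (1024, 512)).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[np.newaxis]               # NCHW

logits = session.run(None, {input_name: blob})[0]        # (1, C, H, W)
class_map = logits.argmax(axis=1)[0].astype(np.uint8)    # per-pixel class index
cv2.imwrite('class_map_sketch.png', class_map)
```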
Environment Setup
Install Dependencies
sudo apt install python3-opencv ros-humble-cv-bridge ros-humble-camera-info-manager \
ros-humble-image-transport python3-spacemit-ort
Load ROS 2 Environment
source /opt/bros/humble/setup.bash
Check Supported Segmentation Models
Run the following command to view available segmentation model configurations:
ros2 launch br_perception infer_info.launch.py | grep 'segmentation'
Example output:
- config/segmentation/unet.yaml
Image Inference
Prepare Input Image
cp /opt/bros/humble/share/jobot_infer_py/data/segmentation/test_unet.jpg .
Run Inference and Save Results
ros2 launch br_perception infer_img.launch.py \
config_path:='config/segmentation/unet.yaml' \
img_path:='./test_unet.jpg'
Example terminal output:
[INFO] [launch]: All log files can be found below /home/zq-pi/.ros/log/2025-05-26-09-45-18-005008-spacemit-k1-x-MUSE-Pi-board-5995
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [infer_img_node-1]: process started with pid [5996]
[infer_img_node-1] Inference time: 4665.87 ms
[infer_img_node-1] The semantic segmentation results are saved in: seg_result.jpg
[infer_img_node-1] The semantic segmentation pseudo-color image is saved to seg_pseudo_color.png
[INFO] [infer_img_node-1]: process has finished cleanly [pid 5996]
Output files:
- seg_result.jpg: overlay of the input image with the predicted segmentation
- seg_pseudo_color.png: pseudo-color map with each semantic class rendered in a distinct color
Example images: the original image, the segmentation overlay, and the pseudo-color map (figures omitted here).
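The pseudo-color map is simply the grayscale class-index image pushed through a color palette. A minimal sketch of that rendering step, assuming Cityscapes-style trainId colors (the node's actual palette may differ):

```python
# Render a class-index map as a pseudo-color BGR image.
import cv2
import numpy as np

# BGR palette indexed by class id; the four entries below follow common
# Cityscapes trainId colors and are assumptions, not the node's exact palette.
PALETTE = np.zeros((256, 3), dtype=np.uint8)
PALETTE[0] = (128, 64, 128)    # road
PALETTE[1] = (232, 35, 244)    # sidewalk
PALETTE[11] = (60, 20, 220)    # person
PALETTE[13] = (142, 0, 0)      # car

def to_pseudo_color(class_map: np.ndarray) -> np.ndarray:
    """Map an HxW uint8 class-index image to an HxWx3 BGR image."""
    return PALETTE[class_map]

# Dummy usage: a frame where every pixel belongs to class 2.
class_map = np.full((480, 640), 2, dtype=np.uint8)
cv2.imwrite('pseudo_color_sketch.png', to_pseudo_color(class_map))
```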
Web Visualization of Inference
Terminal 1 — Launch inference with image publishing:
ros2 launch br_perception infer_img.launch.py \
config_path:='config/segmentation/unet.yaml' \
img_path:='./test_unet.jpg' \
publish_result_img:=true \
result_img_topic:='result_img' \
result_topic:='/inference_result'
Example output:
[INFO] [launch]: All log files can be found below /home/zq-pi/.ros/log/2025-05-26-10-05-41-203368-spacemit-k1-x-MUSE-Pi-board-7077
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [infer_img_node-1]: process started with pid [7084]
[infer_img_node-1] Inference time: 4712.77 ms
[infer_img_node-1] The image inference results are published cyclically
[infer_img_node-1] The image inference results are published cyclically
[infer_img_node-1] The image inference results are published cyclically
Terminal 2 — Start WebSocket visualization:
ros2 launch br_visualization websocket_cpp.launch.py image_topic:='/result_img'
Visit the provided URL in your browser (http://<IP>:8080) to view the segmentation results.
Message Subscription and Monitoring
Inference results are published on the topic set by result_topic (default /inference_result). You can inspect the data with:
ros2 topic echo /inference_result
Example output:
header:
  stamp:
    sec: 0
    nanosec: 0
  frame_id: ''
height: 480
width: 640
encoding: mono8
is_bigendian: 0
step: 640
data:
- 2
- 2
- 2
- 2
...
The message uses the standard sensor_msgs/msg/Image format with mono8 encoding: a single-channel grayscale image in which each pixel value is the class index of the segmented object.
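A downstream node can consume this topic directly and recover the class-index array without any rendering. A minimal rclpy sketch (the node name is illustrative; it assumes step equals width, as in the output above):

```python
# Subscribe to the segmentation result and decode the mono8 class map.
import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class SegListener(Node):
    def __init__(self):
        super().__init__('seg_listener')
        self.create_subscription(Image, '/inference_result', self.on_result, 10)

    def on_result(self, msg: Image):
        # mono8: one byte per pixel, each value is a class index.
        # Assumes step == width (true here: step 640, width 640).
        class_map = np.frombuffer(msg.data, dtype=np.uint8).reshape(
            msg.height, msg.width)
        self.get_logger().info(
            f'classes present: {np.unique(class_map).tolist()}')

def main():
    rclpy.init()
    rclpy.spin(SegListener())

if __name__ == '__main__':
    main()
```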
When finished, press Ctrl + C to stop the running command:
ros2 launch br_perception infer_img.launch.py config_path:='config/segmentation/unet.yaml' img_path:='./test_unet.jpg' publish_result_img:=true result_img_topic:='result_img' result_topic:='/inference_result'
You can replace img_path with a different image to run inference on other inputs; the results shown in the web interface update automatically.
infer_img.launch.py Parameters
Parameter | Description | Default |
---|---|---|
config_path | Path to the model configuration file used for inference | config/detection/yolov6.yaml |
img_path | Path to the image file to run inference on | data/detection/test.jpg |
publish_result_img | Whether to publish the detection result as an image message | false |
result_img_topic | Topic name for rendered image output (only valid if publish_result_img = true ) | /result_img |
result_topic | Topic name for the inference result message | /inference_result |
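To embed the node in a larger pipeline instead of passing arguments on the command line, the standard ROS 2 include pattern works. A sketch, assuming br_perception installs its launch files under the usual share/br_perception/launch directory:

```python
# my_seg_pipeline.launch.py - include infer_img.launch.py with fixed arguments.
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource

def generate_launch_description():
    # Assumes the launch file lives in share/br_perception/launch/.
    infer_launch = os.path.join(
        get_package_share_directory('br_perception'),
        'launch', 'infer_img.launch.py')
    return LaunchDescription([
        IncludeLaunchDescription(
            PythonLaunchDescriptionSource(infer_launch),
            launch_arguments={
                'config_path': 'config/segmentation/unet.yaml',
                'img_path': './test_unet.jpg',
                'publish_result_img': 'true',
            }.items(),
        ),
    ])
```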
Video Stream Inference
Start Camera Device
ros2 launch br_sensors usb_cam.launch.py video_device:="/dev/video20"
Start Video Segmentation
Terminal 1 — Launch video stream segmentation:
ros2 launch br_perception infer_video.launch.py \
config_path:='config/segmentation/unet.yaml' \
sub_image_topic:='/image_raw' \
publish_result_img:=true \
result_topic:='/inference_result'
Web Visualization
Terminal 2 — Start WebSocket visualization:
ros2 launch br_visualization websocket_cpp.launch.py image_topic:='/result_img'
Visit http://<IP>:8080 in your browser to view real-time segmentation.
Raw Inference Results (No Visualization)
To publish only the raw grayscale semantic labels, without web visualization:
ros2 launch br_perception infer_video.launch.py \
config_path:='config/segmentation/unet.yaml' \
sub_image_topic:='/image_raw' \
publish_result_img:=false \
result_topic:='/inference_result'
You can view the data using:
ros2 topic echo /inference_result
Example output:
header:
  stamp:
    sec: 0
    nanosec: 0
  frame_id: ''
height: 480
width: 640
encoding: mono8
is_bigendian: 0
step: 640
data:
- 2
- 2
- 2
- 2
...
As in the image-inference case, the message is a standard sensor_msgs/msg/Image with mono8 encoding: a single-channel grayscale image whose pixel values are the class indices of the segmented objects.
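Because each pixel value is a class index, downstream logic can act on the raw message directly, with no visualization step. For example, a control module might estimate the drivable-area fraction of each frame. A sketch that plugs into a subscriber callback like the one shown earlier (ROAD_CLASS = 0 follows the Cityscapes trainId convention and is an assumption; verify it against the class list in unet.yaml):

```python
# Fraction of pixels labeled as road in a mono8 segmentation message.
import numpy as np
from sensor_msgs.msg import Image

ROAD_CLASS = 0  # assumed Cityscapes trainId for 'road'; check unet.yaml

def road_fraction(msg: Image) -> float:
    class_map = np.frombuffer(msg.data, dtype=np.uint8)
    return float(np.mean(class_map == ROAD_CLASS))
```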
infer_video.launch.py Parameters
Parameter | Description | Default |
---|---|---|
config_path | Path to the model configuration file for inference | config/detection/yolov6.yaml |
sub_image_topic | Image topic to subscribe to | /image_raw |
publish_result_img | Whether to publish the rendered inference image | false |
result_img_topic | Image topic to publish (only when publish_result_img=true ) | /result_img |
result_topic | Inference-result topic to publish | /inference_result |