6.5.2 Smart Retail with Robotic Arm

Last updated: 19/09/2025

Watch the Demo Video

Overview

This demo showcases the K1 platform controlling an Elephant Robotics arm in a smart retail scenario. It integrates:

  • Function Call LLM
  • YOLOv8 for product detection
  • OCR for barcode and payment recognition

The robotic arm accurately identifies products through object detection and uses OCR for barcode scanning and checkout, providing an efficient and intelligent shopping experience.

The system supports voice interaction, significantly enhancing the automation and convenience of shopping and demonstrating the future of intelligent retail.

Hardware

  • 1 × Development board with K1 SoC (with power supply)
  • 1 × Elephant Robotics myCobot 280 robotic arm (with flange camera and vacuum pump)
  • 1 × Elephant Aikits AI kit
  • 1 × USB microphone
  • 1 × USB sound card or speaker

Software Modules

This demo consists of the following core modules:

  • Smart Product Recognition Module

    • Combines voice interaction and computer vision.
    • Uses the Function Call large language model to interpret voice commands like "Pick up an orange" (see the sketch after this list).
    • Uses YOLOv8 to detect the target product in the camera feed.
    • Supports detection of multiple product categories and returns their 2D image coordinates.
  • Robotic Arm Motion Control Module

    • Controls the robotic arm’s joints, angles, and suction pump.
    • Converts YOLOv8’s 2D image coordinates into 3D joint angles for precise movement.
  • Product Scanning Module

    • Uses the flange camera to scan product barcodes and retrieve pricing information.
  • Payment Processing Module

    • Uses the PaddleOCR engine to recognize payment vouchers.
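
As an illustration of how a voice command can be routed through the function-call model, here is a minimal sketch using the ollama Python package. The JSON reply format and the interpret_command function are assumptions for illustration, not the project's actual schema; the model name matches the one deployed in Step 4 below.

# A minimal sketch (not the project's actual code) of mapping a
# transcribed voice command to a structured arm action.
import json
import ollama

def interpret_command(text):
    resp = ollama.chat(
        model="qwen2.5-0.5b-elephant-fc",
        messages=[{"role": "user", "content": text}],
    )
    # Assume the fine-tuned model replies with JSON such as
    # {"action": "pick", "object": "orange"}
    return json.loads(resp["message"]["content"])

print(interpret_command("Pick up an orange"))

The motion-control module would then translate the detected object's 2D image coordinates into joint angles and execute the pick.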

Environment Setup

Follow these steps to set up the environment for the smart retail system.

Step 1: Download the Project Code

Clone the project repository:

git clone https://github.com/elephantrobotics/jobot-ai-elephant.git ~/jobot-ai-elephant

Step 2: Install System Dependencies

Update the system and install required libraries:

sudo apt update
sudo apt install -y \
spacemit-ollama-toolkit \
portaudio19-dev \
python3-dev \
libopenblas-dev \
ffmpeg \
python3-venv \
python3-spacemit-ort \
libceres-dev \
libopencv-dev

Step 3: Install Python Dependencies

  1. Create a Python virtual environment:

    python3 -m venv ~/asr-env
  2. Configure pip to use the Spacemit mirror source:

    pip config set global.extra-index-url https://git.spacemit.com/api/v4/projects/33/packages/pypi/simple
  3. Install project dependencies:

    cd ~/jobot-ai-elephant
    source ~/asr-env/bin/activate
    pip install -r requirements.txt

Step 4: Download and Deploy LLMs to Ollama

  1. Create a directory for model storage:

    mkdir -p ~/models && cd ~/models
  2. Download the models and their configuration files:

    # Download the Qwen2.5 0.5B model
    wget -c https://archive.spacemit.com/spacemit-ai/gguf/Qwen2.5-0.5B-Instruct-Q4_0.gguf --no-check-certificate
    wget -c https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile --no-check-certificate

    # Download the Qwen2.5 0.5B-FC model
    wget -c https://archive.spacemit.com/spacemit-ai/gguf/qwen2.5-0.5b-f16-elephant-fc-Q4_0.gguf --no-check-certificate
    wget -c https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5-0.5b-elephant-fc.modelfile --no-check-certificate
  3. Deploy the models to Ollama:

    ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile
    ollama create qwen2.5-0.5b-elephant-fc -f qwen2.5-0.5b-elephant-fc.modelfile
  4. Verify model deployment:

    ollama list

    The expected output lists both models, qwen2.5:0.5b and qwen2.5-0.5b-elephant-fc; a quick generation test follows this list.

  5. Remove the model storage directory to save space (Ollama keeps its own copy of the imported models):

    rm -rf ~/models
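
Optionally, confirm that both models actually generate text before moving on. A minimal sketch, assuming the ollama Python package is installed (e.g., pip install ollama in the virtual environment):

# Smoke test: ask each deployed model for a short reply.
import ollama

for model in ("qwen2.5:0.5b", "qwen2.5-0.5b-elephant-fc"):
    reply = ollama.chat(model=model,
                        messages=[{"role": "user", "content": "hello"}])
    print(model, "->", reply["message"]["content"][:80])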

Step 5: Set Audio Permissions

Run the following command to add the current user to the audio group, granting access to audio devices:

sudo usermod -aG audio $USER

Note: the group change takes effect only in new login sessions, so log out and back in (or reboot) before recording.
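
To double-check that the user was added, a quick sketch using only the Python standard library:

# Verify that the current user appears in the audio group's member list.
import grp
import getpass

print(getpass.getuser() in grp.getgrnam("audio").gr_mem)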

Audio Module Configuration Guide

Step 1: Detect and Configure the Microphone

List available microphone devices:

arecord -l

Sample Output:

**** List of CAPTURE Hardware Devices ****
card 1: Camera [USB Camera], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Camera_1 [USB 2.0 Camera], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Note: Avoid selecting devices labeled "Camera" as they are typically camera microphones.

In this example, choose card 3 as the main microphone.

Update the microphone configuration in ~/jobot-ai-elephant/smart_main_asr.py:

record_device = 3  # Recording device index; set this to the card number found above
rec_audio = RecAudioThreadPipeLine(
    vad_mode=1,
    sld=2,
    max_time=2,
    channels=1,
    rate=48000,
    device_index=record_device
)
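
If the ALSA card number and the device index the pipeline expects do not line up, you can cross-check them with PyAudio (a sketch; PortAudio was installed in Step 2 of the environment setup, and the pyaudio package is assumed to be available from the project's requirements):

# List every capture-capable device together with its PyAudio index.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print(i, info["name"])
pa.terminate()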

Step 2: Detect and Configure the Speaker

List available speaker devices:

aplay -l

Example output:

**** List of PLAYBACK Hardware Devices ****
card 0: sndes8326 [snd-es8326], device 0: i2s-dai0-ES8326 HiFi ES8326 HiFi-0 []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Update the speaker configuration in the following files (a playback test follows this list):

  • File 1: ~/jobot-ai-elephant/smart_main_asr.py

    play_device='plughw:2,0'  # Speaker settings
  • File 2: ~/jobot-ai-elephant/spacemit_audio/play.py

    play_device='plughw:2,0'  # Speaker settings
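
To confirm the device string works, you can drive it with ALSA's speaker-test utility; the sketch below wraps it from Python (adjust plughw:2,0 to your own card):

# Play one loop of the ALSA test sound through the configured device.
import subprocess

subprocess.run(
    ["speaker-test", "-D", "plughw:2,0", "-c", "2", "-t", "wav", "-l", "1"],
    check=True,
)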

Step 3: Configure Recording Parameters

Set the maximum recording duration in ~/jobot-ai-elephant/smart_main_asr.py:

rec_audio.max_time_record = 3  # Maximum recording time in seconds

Recording modes:

  • By default, recording runs in non-blocking mode.
  • To block until recording completes, call join() on the recording thread:

rec_audio.max_time_record = 3
rec_audio.frame_is_append = True
rec_audio.start_recording()  # Start recording (returns immediately)
rec_audio.thread.join()      # Wait for recording to complete

Run the Program

cd ~/jobot-ai-elephant
source ~/asr-env/bin/activate
python smart_main_asr.py

Usage Instructions:

  • Press Enter to start recording
  • The default recording duration is 3 seconds
  • Fuzzy command matching is supported (e.g., "Grab an orange", "Grab an apple", "Checkout"); see the sketch below
  • The large language model also understands natural language commands, such as:
    • "Give me an apple", "One more orange", etc.
    • A wide range of object names can be recognized
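
The project's actual matching logic is not shown here; as a rough illustration of what fuzzy command matching can look like, here is a standard-library sketch using difflib:

# Match a possibly misheard transcription against a list of known commands.
import difflib

KNOWN_COMMANDS = ["grab an orange", "grab an apple", "checkout"]

def match_command(heard):
    hits = difflib.get_close_matches(heard.lower(), KNOWN_COMMANDS,
                                     n=1, cutoff=0.6)
    return hits[0] if hits else None

print(match_command("grab a orange"))  # -> grab an orange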