5.1.1 Voice Activity Detection (VAD)
Last Updated: 11/09/2025
Overview
This section describes how to use a Voice Activity Detection (VAD) model to detect human speech automatically and control recording accordingly. The system will:
- Start recording when speech is detected.
- Stop recording when silence is detected.
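The start/stop behavior above can be sketched as a small state machine. This is only an illustration: it substitutes a plain RMS energy threshold for the VAD model used in the demo, and the names `segment_speech`, `threshold`, and `max_silence_frames` are hypothetical, not taken from the repository.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.1,        # hypothetical energy threshold, stands in for a VAD model
    max_silence_frames: int = 3,   # hypothetical number of silent frames before stopping
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of recorded segments; end is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None   # index where the current recording began, or None if idle
    silence = 0    # consecutive silent frames seen while recording
    count = 0      # total number of frames consumed
    for i, frame in enumerate(frames):
        count = i + 1
        is_speech = rms(frame) >= threshold
        if start is None:
            if is_speech:          # speech detected: start recording
                start = i
                silence = 0
        elif is_speech:
            silence = 0            # speech continues: reset the silence counter
        else:
            silence += 1
            if silence >= max_silence_frames:
                # sustained silence: stop recording and drop the trailing silence
                segments.append((start, i - silence + 1))
                start = None
    if start is not None:          # stream ended while still recording
        segments.append((start, count - silence))
    return segments
```

Requiring several consecutive silent frames before stopping keeps short pauses inside a sentence from splitting one utterance into many recordings. A model-based VAD would replace the `rms(frame) >= threshold` test with its own per-frame speech decision.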
Project repository: ⭐ Bianbu AI Demo Zoo | NLP
Preparation
Download Model Files
mkdir -p ~/.cache
wget -O ~/.cache/sensevoice.tar.gz https://archive.spacemit.com/spacemit-ai/openwebui/sensevoice.tar.gz
tar -xzf ~/.cache/sensevoice.tar.gz -C ~/.cache
rm ~/.cache/sensevoice.tar.gz
Clone Repository
git clone https://gitee.com/bianbu/spacemit-demo.git
Install Dependencies
sudo apt update
sudo apt install onnxruntime python3-spacemit-ort
sudo apt install python3-numpy
sudo apt install python3-pyaudio
Detect System Recording Devices
You need the device index of your microphone or other recording device. There are two ways to find it.
Method 1: Using arecord
Run the following command to view the system's recording device list:
arecord -l
Sample output:
Note the index of the device you want to use.
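If you would rather read the list programmatically, the output of arecord -l can be parsed with a few lines of Python. The sketch below assumes the usual `card N: ... [Name], device M: ...` line format; the names `CaptureDevice`, `parse_arecord`, and `list_capture_devices` are hypothetical helpers and not part of the demo repository.

```python
import os
import re
import subprocess
from typing import List, NamedTuple


class CaptureDevice(NamedTuple):
    card: int     # ALSA card number
    device: int   # ALSA device number on that card
    name: str     # card name shown in square brackets


# Matches lines such as "card N: <id> [<name>], device M: ..."
_LINE = re.compile(r"^card (\d+): .*?\[(.*?)\], device (\d+):")


def parse_arecord(output: str) -> List[CaptureDevice]:
    """Extract card/device numbers from the text printed by `arecord -l`."""
    devices = []
    for line in output.splitlines():
        match = _LINE.match(line)
        if match:  # header and "Subdevices" lines do not match and are skipped
            devices.append(
                CaptureDevice(int(match.group(1)), int(match.group(3)), match.group(2))
            )
    return devices


def list_capture_devices() -> List[CaptureDevice]:
    """Run `arecord -l` with the C locale so the output is not translated."""
    result = subprocess.run(
        ["arecord", "-l"],
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},
    )
    return parse_arecord(result.stdout)
```

Calling `list_capture_devices()` returns one entry per capture device. Note that these are ALSA card and device numbers; an audio library may number devices differently, so treat the result as a starting point when choosing the index.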
Method 2: Automatic Device Search Script
Alternatively, run the following script to list all recording devices:
python3 01_search_device.py
Sample output: