5.1.1 Voice Activity Detection (VAD)

Last updated: 11/09/2025

Overview

This section shows how to use Voice Activity Detection (VAD) models to detect human speech automatically and control recording. The system:

Starts recording when speech is detected.
Stops recording when silence is detected.
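The start/stop behavior can be sketched as a small state machine over per-frame speech probabilities produced by a VAD model. This is a minimal illustration under stated assumptions, not the code in the demo repository: the function name `segment_speech`, the two thresholds, and the `min_silence_frames` hangover are hypothetical choices.

```python
from typing import Iterable, List, Tuple


def segment_speech(
    speech_probs: Iterable[float],
    start_threshold: float = 0.5,   # hypothetical: probability needed to start recording
    end_threshold: float = 0.35,    # hypothetical: below this a frame counts as silence
    min_silence_frames: int = 10,   # hypothetical: silent frames needed to stop recording
) -> List[Tuple[int, int]]:
    """Turn per-frame speech probabilities into (start, end) frame ranges.

    Recording starts at the first frame whose probability reaches start_threshold
    and stops once min_silence_frames consecutive frames fall below end_threshold.
    The end index is exclusive and points just past the last speech frame.
    """
    segments: List[Tuple[int, int]] = []
    recording = False
    start = last_speech = silence_run = 0
    for i, p in enumerate(speech_probs):
        if not recording:
            if p >= start_threshold:
                # Speech detected: start a new recording.
                recording = True
                start = last_speech = i
                silence_run = 0
        elif p < end_threshold:
            # Silence while recording: count it, stop after enough in a row.
            silence_run += 1
            if silence_run >= min_silence_frames:
                segments.append((start, last_speech + 1))
                recording = False
        else:
            # Still speaking (or ambiguous): reset the silence counter.
            last_speech = i
            silence_run = 0
    if recording:
        # Input ended mid-speech: close the open segment.
        segments.append((start, last_speech + 1))
    return segments
```

Stopping on a lower threshold than the one used to start (hysteresis), and only after a run of silent frames, keeps short pauses between words from cutting a recording in two.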

Project repository: ⭐ Bianbu AI Demo Zoo | NLP

Preparation

Download Model Files

wget -O ~/.cache/sensevoice.tar.gz https://archive.spacemit.com/spacemit-ai/openwebui/sensevoice.tar.gz 
tar -xzf ~/.cache/sensevoice.tar.gz -C ~/.cache
rm ~/.cache/sensevoice.tar.gz
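If you prefer to fetch the model from Python, the three commands above can be mirrored with the standard library. This is a hedged sketch: `fetch_and_extract` is a hypothetical helper name, and it only reproduces the download, unpack, and delete steps.

```python
import os
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str) -> None:
    """Download a .tar.gz archive, unpack it into dest_dir, then delete the archive.

    Mirrors the wget -O / tar -xzf -C / rm sequence shown above.
    """
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    # Download into a temporary file inside dest_dir (like wget -O).
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.gz", dir=dest)
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_path)
        with tarfile.open(tmp_path, "r:gz") as tar:
            # Use the safe "data" extraction filter when this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    finally:
        # Remove the archive whether or not extraction succeeded (like rm).
        os.remove(tmp_path)
```

Usage: `fetch_and_extract("https://archive.spacemit.com/spacemit-ai/openwebui/sensevoice.tar.gz", "~/.cache")`.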

Clone Repository

git clone https://gitee.com/bianbu/spacemit-demo.git

Install Dependencies

sudo apt update
sudo apt install onnxruntime python3-spacemit-ort
sudo apt install python3-numpy
sudo apt install python3-pyaudio
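To confirm the packages are visible to Python before running the demos, you can check whether their modules can be found. A minimal sketch: `missing_modules` is a hypothetical helper, and the module names are assumptions (`numpy` and `pyaudio` come from `python3-numpy` and `python3-pyaudio`; the `onnxruntime` module name on this image is assumed, not confirmed by this page).

```python
import importlib.util
from typing import Iterable, List

# Assumed module names for the packages installed above.
REQUIRED_MODULES = ["numpy", "pyaudio", "onnxruntime"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Usage: `print(missing_modules(REQUIRED_MODULES) or "all dependencies found")`. The check only locates modules; it does not import them, so it also works when a module's native libraries are misconfigured.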

Detect System Recording Devices

You need to know the correct device index for your microphone or recording device. There are two ways to find this.

Method 1: Using `arecord`

Run the following command to view the system's recording device list:

arecord -l

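In the output, each capture device appears on a line of the form `card N: ..., device M: ...`. Record the card and device numbers of the recording device you want to use.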
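If you want to read these numbers from a script instead of by eye, the `card N: ..., device M: ...` lines can be parsed with a regular expression. This is an illustrative sketch; `parse_arecord_list` and `alsa_name` are hypothetical helpers. Note that these are ALSA numbers (addressed as `hw:CARD,DEVICE`) and are not necessarily the same as the device index PyAudio uses.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]"
_CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): (.*)$")


def parse_arecord_list(output: str) -> List[Tuple[int, int, str]]:
    """Extract (card, device, description) tuples from `arecord -l` output."""
    devices = []
    for line in output.splitlines():
        match = _CARD_LINE.match(line.strip())
        if match:
            devices.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return devices


def alsa_name(card: int, device: int) -> str:
    """Build the ALSA hardware name for a card/device pair, e.g. "hw:1,0"."""
    return f"hw:{card},{device}"
```

Usage: `parse_arecord_list(subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout)`.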

Method 2: Automatic Device Search Script

Alternatively, run the following script to list all recording devices:

python3 01_search_device.py
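The contents of `01_search_device.py` are not reproduced here. As a hedged sketch of how such a search can be done with PyAudio, the code below queries every device and keeps those with at least one input channel. The helper names `input_devices` and `list_input_devices` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def input_devices(infos: Iterable[Dict]) -> List[Tuple[int, str, int]]:
    """Keep only devices that can record (maxInputChannels > 0)."""
    return [
        (int(info["index"]), str(info["name"]), int(info["maxInputChannels"]))
        for info in infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices() -> List[Tuple[int, str, int]]:
    """Query PyAudio for every device and return (index, name, channels) for inputs."""
    import pyaudio  # imported here so input_devices() works without PyAudio installed

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
    return input_devices(infos)
```

Usage: `for index, name, channels in list_input_devices(): print(index, name, channels)`. The index printed here is the value PyAudio expects as `input_device_index` when opening an input stream.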

