
5.1.1 Voice Activity Detection (VAD)

Last Version: 11/09/2025

Overview

This section shows how to use a Voice Activity Detection (VAD) model to detect human speech automatically and control recording. The system will:

  • Start recording when speech is detected.
  • Stop recording when silence is detected.
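
The start/stop behavior above can be sketched as a small state machine over per-frame speech probabilities. This is a minimal illustration, not the demo's actual code: the function name `gate_recording`, the `threshold` value, and the `max_silence_frames` hangover are assumptions chosen for the example, and the probabilities would in practice come from the VAD model.

```python
from typing import Iterable, List, Tuple


def gate_recording(
    speech_probs: Iterable[float],
    threshold: float = 0.5,        # assumed cutoff; tune for the real model
    max_silence_frames: int = 3,   # assumed hangover before stopping
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that would be recorded."""
    segments: List[Tuple[int, int]] = []
    recording = False
    start = 0
    silence_run = 0
    last = -1

    for i, prob in enumerate(speech_probs):
        last = i
        is_speech = prob >= threshold
        if not recording:
            if is_speech:
                # Speech detected: start recording at this frame.
                recording = True
                start = i
                silence_run = 0
        elif is_speech:
            # Still speaking: reset the silence counter.
            silence_run = 0
        else:
            silence_run += 1
            if silence_run >= max_silence_frames:
                # Enough consecutive silence: stop, trimming the silent tail.
                segments.append((start, i - silence_run + 1))
                recording = False

    if recording:
        # Stream ended mid-recording: close the segment, dropping trailing silence.
        segments.append((start, last + 1 - silence_run))
    return segments


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7]
    print(gate_recording(probs, threshold=0.5, max_silence_frames=2))
```

Requiring several consecutive silent frames before stopping keeps short pauses between words from splitting one utterance into many recordings.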

Project repository: ⭐ Bianbu AI Demo Zoo | NLP

Preparation

Download Model Files

wget -O ~/.cache/sensevoice.tar.gz https://archive.spacemit.com/spacemit-ai/openwebui/sensevoice.tar.gz 
tar -xzf ~/.cache/sensevoice.tar.gz -C ~/.cache
rm ~/.cache/sensevoice.tar.gz

Clone Repository

git clone https://gitee.com/bianbu/spacemit-demo.git

Install Dependencies

sudo apt update
sudo apt install onnxruntime python3-spacemit-ort
sudo apt install python3-numpy
sudo apt install python3-pyaudio

Detect System Recording Devices

You need the correct device index for your microphone or other recording device. You can find it in either of two ways.

Method 1: Using arecord

Run the following command to view the system's recording device list:

arecord -l

Sample output:

Note the index of the recording device you want to use.

Method 2: Automatic Device Search Script

Alternatively, run the following script to list all recording devices:

python3 01_search_device.py

Sample output:
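
For readers curious about what a device search of this kind involves, the following is a minimal sketch using PyAudio. It is not the contents of `01_search_device.py`, which may work differently; the helper names `input_devices`, `query_devices`, and `main` are hypothetical. It treats any device reporting at least one input channel as a recording device.

```python
from typing import Dict, List


def input_devices(device_infos: List[Dict]) -> List[Dict]:
    """Keep only devices that report at least one input channel."""
    return [d for d in device_infos if int(d.get("maxInputChannels", 0)) > 0]


def query_devices() -> List[Dict]:
    """Ask PyAudio for the info dictionary of every audio device."""
    import pyaudio  # imported here so the filter above works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


def main() -> None:
    try:
        infos = query_devices()
    except ImportError:
        print("PyAudio not found; install it with: sudo apt install python3-pyaudio")
        return
    for dev in input_devices(infos):
        print(f"index={dev['index']}  name={dev['name']}  "
              f"input_channels={dev['maxInputChannels']}")


if __name__ == "__main__":
    main()
```

The `index` value printed here is PyAudio's own device index, which is not necessarily the same as the card or device numbers shown by `arecord -l`.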