
5.2.1 Voice Activity Detection

Feature Introduction

This section shows how to use a Voice Activity Detection (VAD) model to detect human speech and control recording automatically: recording starts when speech is detected and stops when silence is detected.
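
The start/stop behavior described above can be pictured as a small state machine driven by a per-frame speech score. The following is a minimal sketch of that idea only, not the demo's actual implementation: the class name `VadRecorder`, the `threshold` of 0.5, and the `max_silence_frames` value are all assumptions chosen for illustration, and the speech probabilities would in practice come from the VAD model.

```python
from dataclasses import dataclass, field


@dataclass
class VadRecorder:
    """Turn per-frame speech probabilities into start/stop events (illustrative)."""
    threshold: float = 0.5       # assumed: scores at or above this count as speech
    max_silence_frames: int = 3  # assumed: consecutive silent frames before stopping
    recording: bool = False
    silence_run: int = 0
    events: list = field(default_factory=list)

    def feed(self, frame_index: int, speech_prob: float) -> None:
        is_speech = speech_prob >= self.threshold
        if not self.recording:
            # Idle: wait for the first speech frame, then start recording.
            if is_speech:
                self.recording = True
                self.silence_run = 0
                self.events.append(("start", frame_index))
            return
        if is_speech:
            # Speech resets the silence counter, so short pauses do not stop us.
            self.silence_run = 0
        else:
            self.silence_run += 1
            if self.silence_run >= self.max_silence_frames:
                self.recording = False
                self.events.append(("stop", frame_index))


def segment(probs, **kwargs):
    """Feed a sequence of probabilities and return the start/stop events."""
    recorder = VadRecorder(**kwargs)
    for index, prob in enumerate(probs):
        recorder.feed(index, prob)
    return recorder.events


if __name__ == "__main__":
    print(segment([0.1, 0.2, 0.9, 0.8, 0.1, 0.9, 0.1, 0.1, 0.1, 0.05]))
```

Requiring several silent frames in a row before stopping keeps a brief pause between words from ending the recording early; the right count depends on the frame length and is left as a tunable here.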

Project repository: ⭐ Bianbu AI Demo Zoo | NLP

Preparation

Download Model Files

mkdir -p ~/.cache
wget -O ~/.cache/sensevoice.tar.gz https://archive.spacemit.com/spacemit-ai/openwebui/sensevoice.tar.gz
tar -xzf ~/.cache/sensevoice.tar.gz -C ~/.cache
rm ~/.cache/sensevoice.tar.gz

Clone Repository Code

git clone https://gitee.com/bianbu/spacemit-demo.git

Install Dependencies

sudo apt update
sudo apt install onnxruntime python3-spacemit-ort
sudo apt install python3-numpy
sudo apt install python3-pyaudio
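
After installation, a quick way to confirm that the Python side can see the packages is to check whether the modules are importable. This is a minimal sketch: `numpy` and `pyaudio` are the import names of the packages above, while `onnxruntime` as the import name provided by the runtime packages is an assumption and may differ on your system.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "onnxruntime" is an assumed import name; adjust it if your install differs.
    for name, found in check_modules(["numpy", "pyaudio", "onnxruntime"]).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```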

Detect System Recording Devices

Method 1: Using arecord

Run the following command to view the system's recording device list:

arecord -l

Note the card and device numbers of the recording device you want to use.
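
If you prefer to extract these numbers programmatically, the listing can be parsed with a short script. The sketch below assumes the usual English-locale `arecord -l` line format; the `SAMPLE` text is illustrative rather than output captured from a real board, and the helper name `parse_arecord` is hypothetical.

```python
import re
import subprocess

# Matches lines such as:
# card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?\[(.+?)\], device (\d+):")

# Illustrative sample only, used when arecord is not available.
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""


def parse_arecord(text):
    """Extract card/device numbers and the ALSA hw: name from `arecord -l` text."""
    devices = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, name, device = int(match.group(1)), match.group(2), int(match.group(3))
            devices.append(
                {"card": card, "device": device, "name": name, "alsa": f"hw:{card},{device}"}
            )
    return devices


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["arecord", "-l"], capture_output=True, text=True, check=True
        )
        listing = result.stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        listing = SAMPLE  # fall back to the illustrative sample
    for entry in parse_arecord(listing):
        print(entry)
```

The `hw:<card>,<device>` form is how ALSA addresses a capture device. Note that the index PyAudio reports for the same hardware comes from its own enumeration and need not equal the ALSA card number.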

Method 2: Using the Python script

Run the following script to enumerate the recording devices in the system:

python3 01_search_device.py
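
The contents of `01_search_device.py` are not reproduced here. As a rough idea of how such enumeration can be done with PyAudio, the sketch below lists every device that reports at least one input channel. The helper names `input_devices` and `query_pyaudio` are hypothetical and are not taken from the repository.

```python
def input_devices(infos):
    """Keep only devices that can capture audio: (index, name, input channels)."""
    return [
        (info["index"], info["name"], int(info["maxInputChannels"]))
        for info in infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio():
    """Ask PyAudio for the info dict of every audio device it can see."""
    import pyaudio  # imported here so the filter above works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


if __name__ == "__main__":
    try:
        infos = query_pyaudio()
    except (ImportError, OSError) as exc:
        infos = []
        print(f"Could not query audio devices: {exc}")
    for index, name, channels in input_devices(infos):
        print(f"index={index}  channels={channels}  name={name}")
```

The printed `index` is the value PyAudio expects as `input_device_index` when opening a recording stream.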