5.1.2 Speech to Text (ASR)

Last updated: 11/09/2025

Overview

This guide explains how to use Automatic Speech Recognition (ASR) to convert spoken words into text. The process involves capturing audio from a microphone and using a model to transcribe it automatically.

Project repository: ⭐ Bianbu AI Demo Zoo | NLP

Preparation

Clone Code

Clone the repository and navigate to the correct directory:

git clone https://gitee.com/bianbu/spacemit-demo.git
cd spacemit-demo/examples/NLP

Install Environment Dependencies

It is recommended to use a virtual environment for dependency isolation:

# Install the virtual environment package
sudo apt install python3-venv

# Create and activate the virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install project dependencies
pip install -r requirements.txt

Detect System Recording Devices

Follow the instructions in the Detect System Recording Devices section to check the available recording devices in the system.
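
As a rough illustration of what device detection involves, the sketch below parses the text printed by `arecord -l` into a list of capture devices. The function name `parse_arecord_list` and the sample output are hypothetical and are not part of the demo repository. The ALSA card and device numbers shown here are not necessarily the same as the `device_index` value the demo expects, so treat this only as a way to see which capture hardware is present.

```python
import re

# Matches lines such as:
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
_CARD_LINE = re.compile(
    r"^card (?P<card>\d+): (?P<card_id>\S+) \[(?P<card_name>[^\]]*)\], "
    r"device (?P<device>\d+): (?P<dev_name>.*)$"
)


def parse_arecord_list(text):
    """Return the capture devices found in `arecord -l` output (hypothetical helper)."""
    devices = []
    for line in text.splitlines():
        match = _CARD_LINE.match(line.strip())
        if match:
            card = int(match.group("card"))
            device = int(match.group("device"))
            devices.append({
                "card": card,
                "device": device,
                "name": match.group("card_name"),
                # ALSA hardware identifier, e.g. "hw:1,0"
                "alsa_id": f"hw:{card},{device}",
            })
    return devices


# Illustrative sample only; real output depends on the attached hardware.
sample = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

for dev in parse_arecord_list(sample):
    print(dev["alsa_id"], dev["name"])
```

On a real system, the text could be obtained with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout` and passed to the same function.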

Execute Example Code

Run the ASR example:

python 03_asr_demo.py

After the program starts, it waits for your input. Press Enter to begin recording. The built-in Voice Activity Detection (VAD) detects human speech automatically and stops recording once silence is detected.
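
The stop behavior can be pictured with a minimal sketch. It uses a simple energy threshold as a stand-in for the demo's VAD, which this guide does not describe, so the threshold value, the helper names, and the choice to ignore silence before the first speech frame are all assumptions. The `sld` and `max_time` arguments mirror the parameters listed under Parameter Description, and treating `sld = 0` as "never stop on silence" is one interpretation of that table.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def frames_until_stop(frames, frame_seconds, sld, max_time, threshold=500.0):
    """Count how many frames are consumed before recording would stop.

    Assumed stop rule (not taken from the demo source):
      * stop once speech has been heard and silence has lasted >= sld seconds
        (skipped entirely when sld == 0);
      * stop once the total recorded time reaches max_time seconds.
    """
    silent_frames = 0
    heard_speech = False
    count = 0
    for frame in frames:
        count += 1
        if frame_rms(frame) >= threshold:
            heard_speech = True
            silent_frames = 0          # speech resets the silence counter
        elif heard_speech:
            silent_frames += 1         # only count silence after speech
        if sld > 0 and heard_speech and silent_frames * frame_seconds >= sld:
            break
        if count * frame_seconds >= max_time:
            break
    return count


# Synthetic frames: 160 samples each, either loud or silent.
loud = [1000] * 160
quiet = [0] * 160
frames = [quiet, loud, loud, quiet, quiet, quiet, quiet]

print(frames_until_stop(frames, frame_seconds=0.5, sld=1.0, max_time=10.0))
```

In this example the recording stops after the second silent frame that follows speech, because two frames of 0.5 seconds reach the 1.0 second `sld` threshold.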

Parameter Description

Parameter Name	Description	Usage
`sld`	Silence Duration Threshold (seconds)	Speech is considered finished once silence lasts ≥ `sld` seconds. Set to `0` to disable.
`max_time`	Maximum Recording Time (seconds)	Recording stops automatically after this duration, which prevents overly long recordings.
`channels`	Audio Channel Count	Usually set to `1` (mono). Mono input is recommended for speech recognition.
`rate`	Sample Rate (Hz)	Number of samples per second, e.g., `16000` or `48000`. Must match the model input.
`device_index`	Input Device Index	Specifies the recording device. Find the index using `arecord` or `search_device.py`.
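
To show how these parameters relate to one another, here is a small sketch that groups them in a dataclass and derives buffer sizes from them. The class `RecordParams`, its default values, and the 16-bit sample width are assumptions made for illustration; the demo script may organize and default its parameters differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordParams:
    """Hypothetical container for the recording parameters in the table above."""
    sld: float = 1.0                     # silence duration threshold, seconds (0 disables)
    max_time: float = 5.0                # maximum recording time, seconds
    channels: int = 1                    # mono is recommended for speech recognition
    rate: int = 16000                    # sample rate in Hz, must match the model input
    device_index: Optional[int] = None   # None here means "use the default input device"

    def validate(self):
        """Raise ValueError if any parameter is out of range."""
        if self.sld < 0:
            raise ValueError("sld must be >= 0")
        if self.max_time <= 0:
            raise ValueError("max_time must be > 0")
        if self.channels < 1:
            raise ValueError("channels must be >= 1")
        if self.rate <= 0:
            raise ValueError("rate must be > 0")

    def samples_per_frame(self, frame_ms):
        """Samples per channel in one frame of frame_ms milliseconds."""
        return self.rate * frame_ms // 1000

    def max_bytes(self, sample_width=2):
        """Upper bound on recorded bytes, assuming 16-bit (2-byte) samples by default."""
        return int(self.max_time * self.rate) * self.channels * sample_width


params = RecordParams()
params.validate()
print(params.samples_per_frame(30), params.max_bytes())
```

With the assumed defaults, a 30 ms frame at 16000 Hz holds 480 samples, and a 5 second mono recording of 16-bit samples occupies at most 160000 bytes. This is why `max_time` and `rate` together bound memory use.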
