5.2.2 Speech to Text

Feature Introduction

This section introduces the basic functionality of Automatic Speech Recognition (ASR) and shows an example of its use. When a user speaks into a microphone, the system automatically recognizes the speech and converts it to text.

Project repository: ⭐ Bianbu AI Demo Zoo | NLP

Preparation

Clone Code

git clone https://gitee.com/bianbu/spacemit-demo.git
cd spacemit-demo/examples/NLP

Install Environment Dependencies

It is recommended to use a virtual environment for dependency isolation:

sudo apt install python3-venv

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Detect System Recording Devices

Refer to the Recording Device Detection section to check the available recording devices in the system.
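As a quick check that a microphone is visible to the system, the output of `arecord -l` can be parsed from Python. This is a minimal sketch, not the repository's search_device.py: the function names and the sample output are illustrative, and the ALSA card number it reports is not necessarily the same value the demo expects as device_index, since audio libraries may number devices differently.

```python
import re
import subprocess

# Matches lines such as:
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?\[(.*?)\], device (\d+):")


def parse_arecord_list(text):
    """Return (card, device, card_name) tuples found in `arecord -l` output."""
    devices = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, name, device = match.group(1), match.group(2), match.group(3)
            devices.append((int(card), int(device), name))
    return devices


def list_capture_devices():
    """Run `arecord -l` and parse its output (requires alsa-utils)."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return parse_arecord_list(result.stdout)


# Illustrative sample only; real output depends on the attached hardware.
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

if __name__ == "__main__":
    for card, device, name in parse_arecord_list(SAMPLE):
        print(f"card {card}, device {device}: {name}")
```

An empty list from `list_capture_devices()` suggests that no capture hardware was detected.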

Execute Example Code

Run the ASR example:

python 03_asr_demo.py

After the program starts, press Enter to begin recording. The built-in voice activity detection (VAD) determines whether human speech is present and stops the recording automatically once the speaker falls silent.
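The VAD used by the demo is not described here. To illustrate the general idea of deciding whether a short audio frame contains speech, the following sketch uses a simple energy threshold as a stand-in. The 16-bit little-endian mono PCM format, the 10 ms frame length, and the threshold of 500 are all assumptions chosen for illustration.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its energy reaches the threshold.

    The threshold is a hypothetical value; a real VAD would be tuned
    to the microphone and is usually more robust than an energy test.
    """
    return frame_rms(frame) >= threshold


# Synthetic 10 ms frames at 16 kHz: pure silence and a 440 Hz tone.
SILENCE = struct.pack("<160h", *([0] * 160))
TONE = struct.pack(
    "<160h",
    *[int(8000 * math.sin(2 * math.pi * 440 * i / 16000)) for i in range(160)],
)

if __name__ == "__main__":
    print("silence is speech:", is_speech(SILENCE))
    print("tone is speech:", is_speech(TONE))
```

An energy threshold cannot tell speech from other loud sounds, so it only approximates what a dedicated VAD model does.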

Parameter Description

| Parameter Name | Description | Usage |
| --- | --- | --- |
| sld | Silence duration threshold (seconds) | Continuous silence of at least sld seconds is treated as the end of speech; set to 0 to disable |
| max_time | Maximum recording time (seconds) | Recording terminates automatically when this duration is reached, to avoid excessively long speech |
| channels | Audio channel count | Usually set to 1 (mono); mono input is recommended for speech recognition |
| rate | Sample rate (Hz) | Number of samples per second, such as 16000 or 48000; must match the model input |
| device_index | Input device index | Specifies the recording device; can be obtained through arecord or search_device.py |
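How sld and max_time interact can be sketched as a stopping rule over a sequence of per-frame speech decisions. This is an illustration under stated assumptions, not the demo's code: the RecordConfig class, its default values, the frames_until_stop helper, and the 20 ms frame length are hypothetical. The sketch also counts silence from the very first frame, which may differ from how the demo handles silence before speech begins.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RecordConfig:
    """Hypothetical container for the parameters in the table above."""
    sld: float = 1.0                     # silence threshold in seconds; 0 disables
    max_time: float = 5.0                # hard limit on recording length in seconds
    channels: int = 1                    # mono
    rate: int = 16000                    # sample rate in Hz
    device_index: Optional[int] = None   # None: let the audio library choose


def frames_until_stop(
    speech_flags: Iterable[bool], cfg: RecordConfig, frame_ms: int = 20
) -> int:
    """Return how many frames are recorded before the stop rule triggers."""
    # Number of consecutive silent frames that amounts to sld seconds.
    silence_limit = math.ceil(cfg.sld * 1000 / frame_ms) if cfg.sld > 0 else None
    # Number of frames that amounts to max_time seconds.
    max_frames = math.ceil(cfg.max_time * 1000 / frame_ms)

    silent_run = 0
    count = 0
    for flag in speech_flags:
        count += 1
        silent_run = 0 if flag else silent_run + 1
        if silence_limit is not None and silent_run >= silence_limit:
            break  # continuous silence reached sld seconds
        if count >= max_frames:
            break  # max_time reached
    return count


if __name__ == "__main__":
    # 0.2 s of speech followed by 2 s of silence, in 20 ms frames.
    flags = [True] * 10 + [False] * 100
    cfg = RecordConfig(sld=0.5, max_time=2.0)
    print(frames_until_stop(flags, cfg))
```

With sld set to 0, the silence check is skipped and only max_time ends the recording.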