Foreword¶
This article introduces a program for real-time voice command wake-up that supports adding arbitrary commands. It records audio in real time and triggers the program once a spoken command is detected. It also supports fine-tuning the command model to improve recognition accuracy.
Install Project Environment¶
The project was developed with:
- Anaconda 3
- Windows 11
- Python 3.11
- PyTorch 2.1.0
- CUDA 12.1
- Install PyTorch by running the following command. If you already have a working version installed, you can skip this step.
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
- Install the other dependencies by running the following command. If any libraries are still missing afterward, install them individually.
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
Command Wake-up¶
infer_pytorch.py performs inference on the GPU. For CPU inference, use infer_onnx.py, which uses ONNX to accelerate inference on the CPU.
- Adjustable parameters include sec_time (recording duration in seconds) and last_len (length of the previous segment in seconds).
- To add a new command, simply add it to the instruct.txt file.
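The interplay of sec_time and last_len can be sketched as a sliding window: each inference window is the tail of the previous segment plus the newly recorded chunk, so a command spoken across a chunk boundary is still heard whole. This is a minimal illustration of that buffering idea, not the project's actual code; the function name and structure are assumptions.

```python
def make_window_builder(last_len=0.5, sample_rate=16000):
    """Build sliding inference windows from fixed-size recording chunks.

    Each window = the last `last_len` seconds of the previous window
    + the newly recorded chunk (`sec_time` seconds of audio), so a
    command that straddles a chunk boundary is still seen whole.
    Hypothetical sketch; the real parameters live in infer_pytorch.py.
    """
    tail_samples = int(last_len * sample_rate)
    tail = []  # samples carried over from the previous chunk

    def push(chunk):
        nonlocal tail
        window = tail + list(chunk)
        tail = window[-tail_samples:]  # keep the last last_len seconds
        return window

    return push
```

For example, with last_len covering two samples, the second window would start with the last two samples of the first, so nothing at the boundary is lost.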
Sample output log:
Supported commands: ['Up', 'Down', 'Left', 'Right', 'Stop', 'Fire']
Please issue a command...
Triggered command: [Fire]
Triggered command: [Stop]
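The trigger step in the log above amounts to checking each recognized transcript against the command list loaded from instruct.txt. A minimal sketch of that matching logic (the function name and exact matching rule are assumptions, not the project's code):

```python
def match_command(transcript, commands):
    """Return the first supported command found in the ASR transcript,
    or None if no command is detected. Case-insensitive substring match
    is an assumption; the real script may compare differently."""
    text = transcript.strip().lower()
    for cmd in commands:
        if cmd.lower() in text:
            return cmd
    return None
```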
Fine-tuning the Command Model¶
The code for fine-tuning the command model is located in the finetune directory. To start fine-tuning, switch to the finetune directory and follow the training steps below.
Data Collection¶
Run the record_data.py script to start the recording program. By default, it records for 2 seconds. It is recommended to also record additional 1-second clips afterward. Note that 1 second is very short: press Enter and start speaking immediately after the prompt. If you prepare custom data, refer to the generated dataset directory for the expected layout.
Sample output log:
Please enter the command content: Up
Please enter the number of recordings: 10
Recording 1: Press Enter to start speaking:
Recording started...
Recording ended!
Recording 2: Press Enter to start speaking:
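Under the hood, each recording like the ones above ends up as a short mono WAV clip. The following is a sketch of how such a clip might be written with the standard wave module; the function name and format details (16 kHz, 16-bit mono) are assumptions about what record_data.py produces.

```python
import wave

def save_recording(path, samples, sample_rate=16000):
    """Write 16-bit mono PCM samples to a WAV file.

    Hypothetical helper: a 2-second default recording at 16 kHz
    corresponds to 32000 samples. The real record_data.py may use
    a different rate or width."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            int(s).to_bytes(2, "little", signed=True) for s in samples
        )
        wf.writeframes(frames)
```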
Generate Training Data List¶
Run the generate_data_list.py script to generate the training data list.
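Conceptually, this step walks the dataset directory (one subdirectory per command, each holding that command's recordings) and emits one JSON line per clip. Here is a hedged sketch of that idea; the field names (key/source/target) and directory layout are assumptions, so check what generate_data_list.py actually emits.

```python
import json
import os

def generate_data_list(dataset_dir, out_path):
    """Scan dataset_dir, where each subdirectory is named after a command
    and contains that command's WAV recordings, and write one JSON line
    per recording. Field names are assumptions, not the project's spec."""
    with open(out_path, "w", encoding="utf-8") as f:
        for command in sorted(os.listdir(dataset_dir)):
            cmd_dir = os.path.join(dataset_dir, command)
            if not os.path.isdir(cmd_dir):
                continue
            for wav in sorted(os.listdir(cmd_dir)):
                if not wav.endswith(".wav"):
                    continue
                entry = {
                    "key": os.path.splitext(wav)[0],        # clip id
                    "source": os.path.join(cmd_dir, wav),   # audio path
                    "target": command,                      # label text
                }
                f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```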
Model Training¶
Execute the following command to train the model. On Windows, join the parameters into a single line and remove the trailing \ characters.
funasr-train \
++model=../models/paraformer-zh \
++train_data_set_list=dataset/train.jsonl \
++valid_data_set_list=dataset/validation.jsonl \
++dataset_conf.batch_type="token" \
++dataset_conf.batch_size=10000 \
++train_conf.max_epoch=5 \
++train_conf.log_interval=1 \
++train_conf.keep_nbest_models=5 \
++train_conf.avg_nbest_model=3 \
++output_dir="./outputs"
Model Merging¶
Run the merge_model.py script to merge the trained models into a single model at ../models/paraformer-zh-finetune.
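The source of merge_model.py is not shown here, but one common merging technique, consistent with the ++train_conf.avg_nbest_model=3 option above, is to average the parameter values of the n best checkpoints. The sketch below illustrates that idea with plain dicts of floats standing in for PyTorch state_dicts; it is an assumption about the approach, not the script's actual implementation.

```python
def average_checkpoints(state_dicts):
    """Average parameters across n checkpoints (the idea behind
    keeping the n best models and averaging them). Plain dicts of
    floats stand in for torch state_dicts in this sketch."""
    n = len(state_dicts)
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```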
Join the Knowledge Planet to Get the Source Code¶
Scan the QR code to join the Knowledge Planet community and search for “Real-time Command Wake-up” to obtain the source code.