Foreword¶
This article introduces a program for real-time voice command wake-up that supports adding arbitrary commands. It records audio in real time and triggers the program once a spoken command is detected. It also supports fine-tuning the command model to improve recognition accuracy.
Install Project Environment¶
The project was developed with:
- Anaconda 3
- Windows 11
- Python 3.11
- PyTorch 2.1.0
- CUDA 12.1
- Install PyTorch by running the following command. If you already have a working version installed, you can skip this step.
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
- Install the other dependencies by running the following command. If any libraries are still missing afterward, install them individually.
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
Command Wake-up¶
infer_pytorch.py performs inference on the GPU. For CPU inference, use infer_onnx.py, which uses ONNX to accelerate inference on the CPU.
- Adjustable parameters include sec_time (recording duration in seconds) and last_len (length of the previous segment in seconds).
- To add a new command, simply add it to the instruct.txt file.
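The interplay of sec_time and last_len can be sketched as a sliding window: each inference window is the tail of the previous segment plus the newly recorded chunk, so a command spoken across a chunk boundary is still heard whole. This is a minimal illustration of that buffering idea, not the project's actual code; the function name and structure are assumptions.

```python
def make_window_builder(last_len=0.5, sample_rate=16000):
    """Build sliding inference windows from fixed-size recording chunks.

    Each window = the last `last_len` seconds of the previous window
    + the newly recorded chunk (`sec_time` seconds of audio), so a
    command that straddles a chunk boundary is still seen whole.
    Hypothetical sketch; the real parameters live in infer_pytorch.py.
    """
    tail_samples = int(last_len * sample_rate)
    tail = []  # samples carried over from the previous chunk

    def push(chunk):
        nonlocal tail
        window = tail + list(chunk)
        tail = window[-tail_samples:]  # keep the last last_len seconds
        return window

    return push
```

For example, with last_len covering two samples, the second window would start with the last two samples of the first, so nothing at the boundary is lost.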
Sample output log:
Supported commands: ['Up', 'Down', 'Left', 'Right', 'Stop', 'Fire']
Please issue a command...
Triggered command: [Fire]
Triggered command: [Stop]
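The trigger step in the log above amounts to checking each recognized transcript against the command list loaded from instruct.txt. A minimal sketch of that matching logic (the function name and exact matching rule are assumptions, not the project's code):

```python
def match_command(transcript, commands):
    """Return the first supported command found in the ASR transcript,
    or None if no command is detected. Case-insensitive substring match
    is an assumption; the real script may compare differently."""
    text = transcript.strip().lower()
    for cmd in commands:
        if cmd.lower() in text:
            return cmd
    return None
```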
Fine-tuning the Command Model¶
The code for fine-tuning the command model is located in the finetune directory. To start fine-tuning, switch to the finetune directory and follow the training steps below.
Data Collection¶
Run the record_data.py script to start the recording program. By default, it records for 2 seconds. It is recommended to also record additional 1-second clips afterward. Note that 1 second is very short: press Enter and start speaking immediately after the prompt. If you prepare custom data, refer to the generated dataset directory for the expected layout.
Sample output log:
Please enter the command content: Up
Please enter the number of recordings: 10
Recording 1: Press Enter to start speaking:
Recording started...
Recording ended!
Recording 2: Press Enter to start speaking:
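Under the hood, each recording like the ones above ends up as a short mono WAV clip. The following is a sketch of how such a clip might be written with the standard wave module; the function name and format details (16 kHz, 16-bit mono) are assumptions about what record_data.py produces.

```python
import wave

def save_recording(path, samples, sample_rate=16000):
    """Write 16-bit mono PCM samples to a WAV file.

    Hypothetical helper: a 2-second default recording at 16 kHz
    corresponds to 32000 samples. The real record_data.py may use
    a different rate or width."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            int(s).to_bytes(2, "little", signed=True) for s in samples
        )
        wf.writeframes(frames)
```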
Generate Training Data List¶
Run the generate_data_list.py script to generate the training data list.
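Conceptually, this step walks the dataset directory (one subdirectory per command, each holding that command's recordings) and emits one JSON line per clip. Here is a hedged sketch of that idea; the field names (key/source/target) and directory layout are assumptions, so check what generate_data_list.py actually emits.

```python
import json
import os

def generate_data_list(dataset_dir, out_path):
    """Scan dataset_dir, where each subdirectory is named after a command
    and contains that command's WAV recordings, and write one JSON line
    per recording. Field names are assumptions, not the project's spec."""
    with open(out_path, "w", encoding="utf-8") as f:
        for command in sorted(os.listdir(dataset_dir)):
            cmd_dir = os.path.join(dataset_dir, command)
            if not os.path.isdir(cmd_dir):
                continue
            for wav in sorted(os.listdir(cmd_dir)):
                if not wav.endswith(".wav"):
                    continue
                entry = {
                    "key": os.path.splitext(wav)[0],        # clip id
                    "source": os.path.join(cmd_dir, wav),   # audio path
                    "target": command,                      # label text
                }
                f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```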
Model Training¶
Execute the following command to train the model. On Windows, join the parameters into a single line and remove the trailing \ characters.
funasr-train \
++model=../models/paraformer-zh \
++train_data_set_list=dataset/train.jsonl \
++valid_data_set_list=dataset/validation.jsonl \
++dataset_conf.batch_type="token" \
++dataset_conf.batch_size=10000 \
++train_conf.max_epoch=5 \
++train_conf.log_interval=1 \
++train_conf.keep_nbest_models=5 \
++train_conf.avg_nbest_model=3 \
++output_dir="./outputs"
Model Merging¶
Run the merge_model.py script to merge the trained models into a single model at ../models/paraformer-zh-finetune.
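The source of merge_model.py is not shown here, but one common merging technique, consistent with the ++train_conf.avg_nbest_model=3 option above, is to average the parameter values of the n best checkpoints. The sketch below illustrates that idea with plain dicts of floats standing in for PyTorch state_dicts; it is an assumption about the approach, not the script's actual implementation.

```python
def average_checkpoints(state_dicts):
    """Average parameters across n checkpoints (the idea behind
    keeping the n best models and averaging them). Plain dicts of
    floats stand in for torch state_dicts in this sketch."""
    n = len(state_dicts)
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```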
Join the Knowledge Planet to Get the Source Code¶
Scan the QR code to join the Knowledge Planet community and search for “Real-time Command Wake-up” to obtain the source code.