PPASR Streaming and Non-Streaming Speech Recognition

This document introduces how to deploy and test a speech recognition model implemented using PaddlePaddle, and provides various methods to execute and demonstrate the model's functionality. The following is a summary and interpretation of the document content: ### 1. Introduction - Provides an overview of PaddlePaddle-based speech recognition models, including recognition for short voice segments and long audio clips. ### 2. Deployment Methods #### 2.1 Command-line Deployment Two commands are provided to implement different deployment methods: - `python infer_server.

Read More
Processing and Usage of the WenetSpeech Dataset

The WenetSpeech dataset provides over 10,000 hours of Mandarin Chinese speech, categorized into strong-labeled (10,005 hours), weak-labeled (2,478 hours), and unlabeled (9,952 hours) subsets, suitable for supervised, semi-supervised, or unsupervised training. The data is grouped by domain and style, and datasets of different scales (S, M, L) as well as evaluation/test data are provided. The tutorial details how to download, prepare, and use this dataset for training speech recognition models, making it a valuable reference for ASR system developers.

Read More
Fast Face Recognition Model Implemented with PaddlePaddle

This project develops a small and efficient face recognition system based on the ArcFace and PP-OCRv2 models. The training dataset is emore (containing 85,742 individuals and 5,822,653 images), and the lfw-align-128 dataset is used for testing. The project provides complete code and preprocessing scripts. The `create_dataset.py` script is executed to organize raw data into binary file format, improving training efficiency. Model training and evaluation are controlled by `train.py` and `eval.py` respectively. The prediction function supports

Read More
A Fast Face Recognition Model Implemented Based on PyTorch

This project aims to develop a face recognition system with small models, high recognition accuracy, and fast inference speed. The training data is sourced from the emore dataset (5.82 million images), and the lfw-align-128 dataset is used for testing. The project combines the ArcFace loss function and MobileNet, implemented through Python scripts. The process of training the model includes data preparation, training, and evaluation, with all code available on GitHub. To start the training process, the `train.py` command is executed; for performance verification, run `ev`

Read More