Foreword

This article introduces FunASR, a speech recognition framework. Its models are trained on more than tens of thousands of hours of data, and in the author's testing the recognition accuracy was high. The article explains how to start a WebSocket service and how an Android application can call that service for real-time recognition, with results appearing as you speak.

Installation Environment

  1. Install PyTorch.
# Install CPU version of PyTorch
conda install pytorch torchvision torchaudio cpuonly -c pytorch
# Install GPU version of PyTorch (adjust pytorch-cuda version as needed)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  2. Install FFmpeg and other libraries using Conda.
conda install ffmpeg
conda install -c conda-forge pynini
  3. Install other dependency libraries.
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
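
After the installation steps it can be useful to confirm that the packages are actually visible from the active Conda environment. The following is a minimal sketch, not part of the project's source code: the function name check_environment is hypothetical, and the module names checked are assumed to match the packages installed above.

```python
import importlib.util
import shutil


def check_environment(modules=("torch", "torchvision", "torchaudio", "pynini"),
                      tools=("ffmpeg",)):
    """Report which Python modules are installed and which CLI tools are on PATH."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed;
        # it does not import the module, so the check is fast.
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'found' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to the corresponding install command above.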

Starting the Service

  1. Execute the server.py program to start the audio file upload recognition service.
python server.py
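
Before pointing a client at the service, you may want to confirm that it is reachable over the network. The sketch below only tests whether a TCP connection can be opened; it does not speak WebSocket. The helper name is_port_open and the port 8000 are assumptions for illustration, so use the address and port that server.py reports when it starts.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and connects; the with-block closes the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the real server address and port.
    print(is_port_open("127.0.0.1", 8000))
```

Running the same check from another machine on the network, with the server's IP address, helps rule out firewall problems before testing from a phone.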

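Because of the global interpreter lock (GIL), threads in a single Python process do not run CPU-bound work in parallel, so one server.py process handles concurrent requests poorly. To serve multiple clients concurrently, run the Docker application in the /websocket directory on a Linux system.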

Android Application

Open the AndroidClient directory of the source code in Android Studio. It contains the source of the Android application. Before building, change the WebSocket address ASR_HOST to the IP address of the server you started above. Then click “Run” to install the application on an Android phone.
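
The article does not show the client's wire format, but a real-time client generally sends microphone audio to the server in small frames rather than as one file. As a rough illustration in Python, the sketch below splits raw PCM audio into fixed-duration chunks. The function name iter_pcm_chunks, the 16 kHz 16-bit mono format, and the 100 ms chunk length are all assumptions, not values taken from the Android client.

```python
def iter_pcm_chunks(pcm, sample_rate=16000, sample_width=2, chunk_ms=100):
    """Yield consecutive slices of raw mono PCM bytes, each covering chunk_ms of audio."""
    # Work in whole samples first so a chunk never splits a sample in half.
    samples_per_chunk = sample_rate * chunk_ms // 1000
    bytes_per_chunk = samples_per_chunk * sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for the given sample rate")
    for start in range(0, len(pcm), bytes_per_chunk):
        # The final chunk may be shorter than the others.
        yield pcm[start:start + bytes_per_chunk]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence at 16 kHz, 16-bit mono
    chunks = list(iter_pcm_chunks(one_second))
    print(len(chunks), len(chunks[0]))
```

In a streaming client, each chunk would be sent as one binary WebSocket message as soon as it is recorded, which is what lets the server return partial results while the user is still speaking.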

The application in action:

![](/static/files/2023-10-21/5189f2799ca54d73b829324740043783.gif)

To obtain the source code, scan the QR code to join the Knowledge Planet community and search for “FunASR Speech Recognition WebSocket Service”.

![](/static/files/2023-10-21/39018ff6d9dd4e9aada105ec5685fbf5.png)
Xiaoye