Preface

This project is a sound classification project based on PaddlePaddle, aiming to achieve recognition of various environmental sounds, animal calls, and languages. The project provides multiple sound classification models such as EcapaTdnn, PANNS, ResNetSE, CAMPPlus, and ERes2Net to support different application scenarios. Additionally, it offers test reports for the commonly used Urbansound8K dataset and examples of downloading and using some dialect datasets. Users can select suitable models and datasets according to their needs to achieve more accurate sound classification. The project has a wide range of application scenarios, including outdoor environmental monitoring, wildlife protection, and speech recognition. Meanwhile, the project encourages users to explore more application scenarios to promote the development and application of sound classification technology.

Source code address: AudioClassification-PaddlePaddle

Preparation

  • Anaconda 3
  • Python 3.8
  • PaddlePaddle 2.4.0
  • Windows 10 or Ubuntu 18.04

Project Features

  1. Supported models: EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE
  2. Supported pooling layers: AttentiveStatisticsPooling(ASP), SelfAttentivePooling(SAP), TemporalStatisticsPooling(TSP), TemporalAveragePooling(TAP)
  3. Supported preprocessing methods: MelSpectrogram, LogMelSpectrogram, Spectrogram, MFCC, Fbank

Model papers:

Model Test Table

| Model        | Params(M) | Preprocessing Method | Dataset      | Number of Classes | Accuracy |
|--------------|-----------|----------------------|--------------|-------------------|----------|
| CAMPPlus     | 7.2       | Fbank                | UrbanSound8K | 10                | 0.96590  |
| PANNS(CNN10) | 4.9       | Fbank                | UrbanSound8K | 10                | 0.95454  |
| ResNetSE     | 9.1       | Fbank                | UrbanSound8K | 10                | 0.92219  |
| TDNN         | 2.7       | Fbank                | UrbanSound8K | 10                | 0.92045  |
| ERes2Net     | 6.6       | Fbank                | UrbanSound8K | 10                | 0.90909  |
| EcapaTdnn    | 6.2       | Fbank                | UrbanSound8K | 10                | 0.90503  |
| Res2Net      | 5.6       | Fbank                | UrbanSound8K | 10                | 0.85812  |

Environment Setup

  • First, install the GPU version of PaddlePaddle. If you have already installed it, skip this step.
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
  • Install the ppacls library.

Install using pip with the following command:

python -m pip install ppacls -U -i https://pypi.tuna.tsinghua.edu.cn/simple

Source code installation is recommended as it ensures the latest code is used.

git clone https://github.com/yeyupiaoling/AudioClassification_PaddlePaddle.git
cd AudioClassification_PaddlePaddle
python setup.py install

Data Preparation

Generate a data list for subsequent reading. Here audio_path is the path to an audio file. Place your audio dataset under the dataset/audio directory in advance, with each sub-folder containing the audio of one category, e.g. dataset/audio/鸟鸣/…… (bird calls). Each audio clip should be longer than 3 seconds. The generated data list has the format audio_path\tcategory_label, with the audio path and label separated by a tab. Modify the list-generation function to match how your data is stored.

Taking Urbansound8K as an example: it is a widely used public dataset for urban environmental sound classification research, containing 10 categories: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music. The dataset download address: UrbanSound8K.tar.gz. To use this dataset, download and extract it into the dataset directory, then modify the data-list generation code accordingly.
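As the sample list further below shows, UrbanSound8K file names encode the class ID as the second dash-separated field (e.g. 104817-4-0-2.wav belongs to class 4), which is what the Urbansound8K branch of the list generator relies on. A minimal sketch of that parsing rule (the helper name is ours, not the project's):

```python
import os

def urbansound8k_label(path):
    """Return the class ID encoded in an UrbanSound8K file name.

    UrbanSound8K names follow the pattern fsID-classID-occurrenceID-sliceID.wav,
    so the class label is the second dash-separated field.
    """
    return int(os.path.basename(path).split('-')[1])
```

For example, `urbansound8k_label('dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav')` returns 4, matching the label column of the generated list.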

Execute create_data.py to generate the data lists. The script provides two code paths: one for custom data and one for generating the Urbansound8K lists; use whichever matches your dataset.

python create_data.py

The generated list will look like this, with the audio path followed by its corresponding label (starting from 0), separated by a tab:

dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav   4
dataset/UrbanSound8K/audio/fold9/105029-7-2-5.wav   7
dataset/UrbanSound8K/audio/fold3/107228-5-0-0.wav   5
dataset/UrbanSound8K/audio/fold4/109711-3-2-4.wav   3

Modify Preprocessing Method

The configuration file by default uses the MelSpectrogram preprocessing method. To use other preprocessing methods, modify the configuration file as follows. Specific values can be adjusted according to your situation. If you are unsure how to set parameters, you can directly delete this part and use the default values.

preprocess_conf:
  # Audio preprocessing method, supported: MelSpectrogram, Spectrogram, MFCC, Fbank
  feature_method: 'MelSpectrogram'
  # Set API parameters. For more parameters, check the corresponding API. If unsure, you can delete this part and use default values.
  method_args:
    sample_rate: 16000
    n_fft: 1024
    hop_length: 320
    win_length: 1024
    f_min: 50.0
    f_max: 14000.0
    n_mels: 64
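For example, to switch to Fbank features (one of the supported methods listed above), the same section could look roughly like this. The exact parameter names accepted by Fbank are an assumption inferred from the MelSpectrogram example; check the corresponding API before relying on them:

```yaml
preprocess_conf:
  # Switch the feature extractor from MelSpectrogram to Fbank
  feature_method: 'Fbank'
  method_args:
    sr: 16000
    n_mels: 64
```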

Training

Then you can start training the model by running train.py. Generally, no changes are needed to the parameters in the configuration file, but the following must be adjusted to match your actual dataset:
1. The number of categories dataset_conf.num_class, which varies by dataset. Set it according to your actual situation.
2. dataset_conf.batch_size. If there is insufficient GPU memory, reduce this parameter.
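For orientation, here is roughly where these two values sit in the configuration file. The layout is inferred from the training log below, where batch_size appears under dataset_conf.dataLoader and num_class under model_conf; adjust to your configuration file's actual structure:

```yaml
dataset_conf:
  dataLoader:
    batch_size: 64    # lower this if GPU memory is insufficient

model_conf:
  num_class: 10       # set to the number of categories in your dataset
```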

# Single-card training
CUDA_VISIBLE_DEVICES=0 python train.py
# Multi-card training
python -m paddle.distributed.launch --gpus '0,1' train.py

Training output log:
```
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:14 - ----------- Additional Configuration Parameters -----------
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - configs: configs/ecapa_tdnn.yml
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - pretrained_model: None
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - resume_model: None
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - save_model_path: models/
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:16 - use_gpu: True
[2023-08-07 23:02:08.807036 INFO ] utils:print_arguments:17 - ------------------------------------------------
[2023-08-07 23:02:08.811036 INFO ] utils:print_arguments:19 - ----------- Configuration File Parameters -----------
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:22 - dataset_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 - aug_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - noise_aug_prob: 0.2
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - noise_dir: dataset/noise
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - speed_perturb: True
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - volume_aug_prob: 0.2
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - volume_perturb: False
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 - dataLoader:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - batch_size: 64
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - num_workers: 4
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - do_vad: False
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 - eval_conf:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - batch_size: 1
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - max_duration: 20
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - label_list_path: dataset/label_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - max_duration: 3
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - min_duration: 0.5
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - sample_rate: 16000
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:25 - spec_aug_args:
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - freq_mask_width: [0, 8]
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:27 - time_mask_width: [0, 10]
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - target_dB: -20
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - test_list: dataset/test_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - train_list: dataset/train_list.txt
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - use_dB_normalization: True
[2023-08-07 23:02:08.812035 INFO ] utils:print_arguments:29 - use_spec_aug: True
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:22 - model_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 - num_class: 10
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 - pooling_type: ASP
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:22 - optimizer_conf:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 - optimizer: Adam
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 - scheduler: WarmupCosineSchedulerLR
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:25 - scheduler_args:
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 - learning_rate: 0.001
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 - min_lr: 1e-05
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:27 - warmup_epoch: 5
[2023-08-07 23:02:08.816062 INFO ] utils:print_arguments:29 - weight_decay: 1e-06
```

Xiaoye