Preface

This article introduces how to quickly train and run inference with the sound classification framework, focusing on the simplest way to use it. For more advanced features, refer to the documentation in the source code repository. Training and inference each take only about three lines of code.

Source Code Address: https://github.com/yeyupiaoling/AudioClassification-Pytorch

Installation Environment

Anaconda was used, and a virtual environment with Python 3.11 was created.
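
If the environment has not been created yet, it can be set up with the following commands (the environment name macls is only an example):

conda create -n macls python=3.11
conda activate macls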

  • First, install the GPU version of PyTorch 2.5.1. If you have already installed it, you can skip this step.
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=11.8 -c pytorch -c nvidia
  • Install the sound classification macls library using pip with the following command:
python -m pip install macls -U -i https://pypi.tuna.tsinghua.edu.cn/simple
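
After installation, you can optionally run a quick check to confirm that the GPU build of PyTorch is active:

import torch

print(torch.__version__)          # expect 2.5.1
print(torch.cuda.is_available())  # should print True on a machine with a working CUDA setup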

Data Preparation

The author provides a small dataset (Download Address). You can download it and use it directly, or follow the same format to generate your own data list.

The code to generate the data lists is shown below. Prepare the data folder in the format dataset/audio/[one sub-folder per sound class], then run the code to produce the label list and the train/test lists.

import os
from sklearn.model_selection import train_test_split


def create_list(audio_dir, list_dir):
    # The label list holds one class name per line; the train/test lists hold "audio_path\tlabel_id" lines.
    f_label = open(os.path.join(list_dir, 'label_list.txt'), 'w', encoding='utf-8')
    f_train = open(os.path.join(list_dir, 'train_list.txt'), 'w', encoding='utf-8')
    f_test = open(os.path.join(list_dir, 'test_list.txt'), 'w', encoding='utf-8')

    audio_list = []
    # Each sub-folder of audio_dir is one class; the folder name is used as the label.
    for i, name in enumerate(os.listdir(audio_dir)):
        f_label.write(name + '\n')
        class_dir = os.path.join(audio_dir, name)
        for file in os.listdir(class_dir):
            if not file.endswith('.wav'):
                continue
            audio_path = os.path.join(class_dir, file).replace('\\', '/')
            audio_list.append(f"{audio_path}\t{i}\n")

    # Split the samples 90% / 10% into training and test lists.
    train_list, test_list = train_test_split(audio_list, test_size=0.1, random_state=42)
    f_train.writelines(train_list)
    f_test.writelines(test_list)

    f_label.close()
    f_train.close()
    f_test.close()


if __name__ == '__main__':
    create_list(audio_dir='dataset/audio', list_dir='dataset/')
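
For reference, each line of the generated train_list.txt and test_list.txt pairs an audio path with its label index, separated by a tab; the paths below are purely illustrative:

dataset/audio/cat/xxx.wav	0
dataset/audio/dog/yyy.wav	1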

Training

Training with this framework is very simple, with only three core lines of code:

from macls.trainer import MAClsTrainer

# Get the trainer; "cam++" selects the built-in CAM++ model configuration
trainer = MAClsTrainer(configs="cam++", use_gpu=True)

# Start training and save checkpoints to models/
trainer.train(save_model_path="models/")

Sample output:

2025-03-08 11:59:19.801 | INFO     | macls.optimizer:build_optimizer:16 - Successfully created optimizer: Adam, parameters: {'lr': 0.001, 'weight_decay': 1e-05}
2025-03-08 11:59:19.801 | INFO     | macls.optimizer:build_lr_scheduler:31 - Successfully created learning rate scheduler: WarmupCosineSchedulerLR, parameters: {'min_lr': 1e-05, 'max_lr': 0.001, 'warmup_epoch': 5, 'fix_epoch': 60, 'step_per_epoch': 1}
2025-03-08 11:59:20.414 | INFO     | macls.utils.checkpoint:load_model:85 - Successfully restored model parameters and optimizer parameters: models/CAMPPlus_Fbank\last_model
2025-03-08 11:59:20.417 | INFO     | macls.trainer:train:334 - Training data: 70
Performing evaluation:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-08 11:59:31.553 | INFO     | macls.trainer:__train_epoch:277 - Train epoch: [31/60], batch: [1/1], loss: 0.05312, accuracy: 0.96875, learning rate: 0.00057545, speed: 0.72 data/sec, eta: 0:05:22
2025-03-08 11:59:31.556 | INFO     | macls.trainer:train:356 - ======================================================================
Performing evaluation: 100%|██████████| 1/1 [00:01<00:00,  1.22s/it]
2025-03-08 11:59:32.786 | INFO     | macls.trainer:train:358 - Test epoch: 31, time/epoch: 0:00:12.367833, loss: 0.00298, accuracy: 1.00000
2025-03-08 11:59:32.786 | INFO     | macls.trainer:train:360 - ======================================================================
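
After training, the model can be evaluated on the test list without retraining. The sketch below assumes the trainer exposes an evaluate method that takes a resume_model path and returns the loss and accuracy, as in the eval.py script shipped with the repository; verify the exact signature against the source code.

from macls.trainer import MAClsTrainer

trainer = MAClsTrainer(configs="cam++", use_gpu=True)

# Assumed API: evaluate(resume_model=...) returning (loss, accuracy); see eval.py in the repository.
loss, accuracy = trainer.evaluate(resume_model='models/CAMPPlus_Fbank/best_model/')
print(f'Evaluation loss: {loss:.5f}, accuracy: {accuracy:.5f}')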

Inference

Inference is also very simple, as follows:

from macls.predict import MAClsPredictor

# Get the predictor
predictor = MAClsPredictor(configs="cam++",
                           model_path='models/CAMPPlus_Fbank/best_model/',
                           use_gpu=True,
                           log_level="ERROR")

# Audio file to classify
audio_path = "dataset/cat.wav"
label, score = predictor.predict(audio_data=audio_path)

print(f'Prediction result label: {label}, score: {score}')

Sample output:

Prediction result label: cat, score: 0.99957
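
Because predict takes a single file path, classifying a whole folder of clips is just a loop over the same call; the folder path below is only an example:

import os
from macls.predict import MAClsPredictor

# Get the predictor
predictor = MAClsPredictor(configs="cam++",
                           model_path='models/CAMPPlus_Fbank/best_model/',
                           use_gpu=True,
                           log_level="ERROR")

# Classify every WAV file in one folder with the same predict() call used above.
audio_dir = "dataset/audio/cat"
for file in sorted(os.listdir(audio_dir)):
    if not file.endswith('.wav'):
        continue
    audio_path = os.path.join(audio_dir, file)
    label, score = predictor.predict(audio_data=audio_path)
    print(f'{audio_path} -> label: {label}, score: {score}')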

Conclusion

This framework provides various sound classification models, such as EcapaTdnn, PANNS, ResNetSE, CAMPPlus, and ERes2Net, to support different application scenarios.
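
Switching models is done through the configs argument shown above. The name below is an assumed example; the available options correspond to the configuration files in the repository's configs directory, so check there for the exact names.

from macls.trainer import MAClsTrainer

# "ecapa_tdnn" is an assumed config name; see the configs/ directory in the repository for the available files.
trainer = MAClsTrainer(configs="ecapa_tdnn", use_gpu=True)
trainer.train(save_model_path="models/")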

Xiaoye