Preface
This article shows how to quickly train a model and run inference with the macls sound classification framework. It covers only the simplest usage; for more advanced features, refer to the documentation in the source repository. Training takes just three core lines of code, and inference is similarly short.
Source Code Address: https://github.com/yeyupiaoling/AudioClassification-Pytorch
Installation Environment
This article uses Anaconda with a virtual environment running Python 3.11.
- First, install the GPU version of PyTorch 2.5.1. If you have already installed it, you can skip this step.
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=11.8 -c pytorch -c nvidia
- Install the sound classification library macls using pip with the following command:
python -m pip install macls -U -i https://pypi.tuna.tsinghua.edu.cn/simple
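Before moving on, it can help to confirm that the packages from the two steps above are visible to the active environment. The following is a minimal sketch, not part of the framework: the function name check_environment is hypothetical, and it only checks that the packages can be found and whether PyTorch reports a usable CUDA device.

```python
import importlib.util


def check_environment(packages=("torch", "torchaudio", "macls")):
    """Report which packages can be found and whether CUDA is usable."""
    # find_spec locates a top-level package without importing it.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}

    cuda_available = False
    if status.get("torch"):
        try:
            import torch  # imported lazily so the check also runs without PyTorch

            cuda_available = torch.cuda.is_available()
        except Exception:
            # A broken PyTorch install is treated the same as "no CUDA".
            cuda_available = False
    status["cuda"] = cuda_available
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

If "cuda" is reported as False, training should still be possible by passing use_gpu=False, at the cost of speed.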
Data Preparation
The author provides a small dataset (Download Address). You can download it and use it directly, or follow its format to generate your own data list.
To generate the data list yourself, arrange the data as dataset/audio/[one folder per sound class] and run the code below. It writes label_list.txt, train_list.txt, and test_list.txt, where each line of the train and test lists is an audio path and a label id separated by a tab.
import os

from sklearn.model_selection import train_test_split


def create_list(audio_dir, list_dir):
    """Write label_list.txt, train_list.txt and test_list.txt into list_dir."""
    audio_list = []
    with open(os.path.join(list_dir, 'label_list.txt'), 'w', encoding='utf-8') as f_list:
        # Sort the class folders so label ids are stable across runs and platforms
        for i, name in enumerate(sorted(os.listdir(audio_dir))):
            f_list.write(name + '\n')
            class_dir = os.path.join(audio_dir, name)
            for file in sorted(os.listdir(class_dir)):
                if not file.endswith('.wav'):
                    continue
                audio_path = os.path.join(class_dir, file).replace('\\', '/')
                # One sample per line: <audio path>\t<label id>
                audio_list.append(f"{audio_path}\t{i}\n")
    # Hold out 10% of the samples for testing
    train_list, test_list = train_test_split(audio_list, test_size=0.1, random_state=42)
    with open(os.path.join(list_dir, 'train_list.txt'), 'w', encoding='utf-8') as f_train:
        f_train.writelines(train_list)
    with open(os.path.join(list_dir, 'test_list.txt'), 'w', encoding='utf-8') as f_test:
        f_test.writelines(test_list)


if __name__ == '__main__':
    create_list(audio_dir='dataset/audio', list_dir='dataset/')
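After generating the lists, a quick sanity check can catch empty classes or bad label ids before training starts. The sketch below is an illustration rather than part of the framework: summarize_list is a hypothetical helper, and it assumes the tab-separated "path, label id" line format written by create_list.

```python
from collections import Counter


def summarize_list(list_path, label_list_path):
    """Count samples per class in a data list and check that label ids are valid."""
    with open(label_list_path, encoding="utf-8") as f:
        labels = [line.strip() for line in f if line.strip()]

    counts = Counter()
    with open(list_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            # Each line is "<audio path>\t<label id>", as written by create_list.
            _path, label_id = line.rsplit("\t", 1)
            idx = int(label_id)
            if not 0 <= idx < len(labels):
                raise ValueError(f"{list_path}:{line_no}: label id {idx} out of range")
            counts[labels[idx]] += 1
    return dict(counts)
```

For example, summarize_list("dataset/train_list.txt", "dataset/label_list.txt") returns a dictionary mapping each class name to its number of training samples.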
Training
Training with this framework is very simple, with only three core lines of code:
from macls.trainer import MAClsTrainer
# Get the trainer
trainer = MAClsTrainer(configs="cam++", use_gpu=True)
trainer.train(save_model_path="models/")
Sample output:
2025-03-08 11:59:19.801 | INFO | macls.optimizer:build_optimizer:16 - Successfully created optimizer: Adam, parameters: {'lr': 0.001, 'weight_decay': 1e-05}
2025-03-08 11:59:19.801 | INFO | macls.optimizer:build_lr_scheduler:31 - Successfully created learning rate scheduler: WarmupCosineSchedulerLR, parameters: {'min_lr': 1e-05, 'max_lr': 0.001, 'warmup_epoch': 5, 'fix_epoch': 60, 'step_per_epoch': 1}
2025-03-08 11:59:20.414 | INFO | macls.utils.checkpoint:load_model:85 - Successfully restored model parameters and optimizer parameters: models/CAMPPlus_Fbank\last_model
2025-03-08 11:59:20.417 | INFO | macls.trainer:train:334 - Training data: 70
Performing evaluation: 0%| | 0/1 [00:00<?, ?it/s]2025-03-08 11:59:31.553 | INFO | macls.trainer:__train_epoch:277 - Train epoch: [31/60], batch: [1/1], loss: 0.05312, accuracy: 0.96875, learning rate: 0.00057545, speed: 0.72 data/sec, eta: 0:05:22
2025-03-08 11:59:31.556 | INFO | macls.trainer:train:356 - ======================================================================
Performing evaluation: 100%|██████████| 1/1 [00:01<00:00, 1.22s/it]
2025-03-08 11:59:32.786 | INFO | macls.trainer:train:358 - Test epoch: 31, time/epoch: 0:00:12.367833, loss: 0.00298, accuracy: 1.00000
2025-03-08 11:59:32.786 | INFO | macls.trainer:train:360 - ======================================================================
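If you save this output to a file, the per-epoch evaluation results can be pulled out with a regular expression. This is a rough sketch that assumes the "Test epoch" line format shown in the sample output above; other versions of the framework may log differently, and parse_test_metrics is a hypothetical helper, not part of macls.

```python
import re

# Matches lines such as:
# "... - Test epoch: 31, time/epoch: 0:00:12.367833, loss: 0.00298, accuracy: 1.00000"
TEST_LINE = re.compile(
    r"Test epoch: (\d+), time/epoch: [^,]+, loss: ([\d.]+), accuracy: ([\d.]+)"
)


def parse_test_metrics(log_text):
    """Return one dict per evaluation line found in the log text."""
    results = []
    for match in TEST_LINE.finditer(log_text):
        results.append({
            "epoch": int(match.group(1)),
            "loss": float(match.group(2)),
            "accuracy": float(match.group(3)),
        })
    return results
```

Training lines ("Train epoch: ...") are ignored by this pattern, so only the evaluation results are collected.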
Inference
Inference is also very simple, as follows:
from macls.predict import MAClsPredictor
# Get the predictor
predictor = MAClsPredictor(configs="cam++",
                           model_path='models/CAMPPlus_Fbank/best_model/',
                           use_gpu=True,
                           log_level="ERROR")
audio_path = "dataset/cat.wav"
label, score = predictor.predict(audio_data=audio_path)
print(f'Prediction result label: {label}, score: {score}')
Sample output:
Prediction result label: cat, score: 0.99957
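To classify many files at once, the same predict call can be wrapped in a loop. The sketch below is illustrative only: classify_folder is a hypothetical helper, and it assumes nothing about the predictor beyond what the example above shows, namely that predict accepts audio_data=<path> and returns a (label, score) pair.

```python
import os


def classify_folder(predict_fn, folder):
    """Run predict_fn on every .wav file in folder and collect (label, score) per file."""
    results = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue
        path = os.path.join(folder, name).replace("\\", "/")
        # predict_fn is expected to behave like predictor.predict above:
        # it takes audio_data=<path> and returns (label, score).
        label, score = predict_fn(audio_data=path)
        results[path] = (label, score)
    return results
```

With the predictor created above, this could be called as classify_folder(predictor.predict, "dataset/audio/cat"). Passing the function in as an argument keeps the helper independent of the framework, so it can be tried with a stub before a trained model exists.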
Conclusion
This framework provides various sound classification models, such as EcapaTdnn, PANNS, ResNetSE, CAMPPlus, and ERes2Net, to support different application scenarios.