Introduction and Usage of YeAudio Audio Tool

YeAudio Audio Processing Library User Guide

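YeAudio is a lightweight, high-performance Python library for audio processing. It focuses on audio preprocessing and data augmentation for tasks such as speech recognition and speech synthesis. Its core features and usage are described below.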

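I. Core Components

AudioSegment: the basic audio processing class; supports loading, cropping, format conversion, and more.
VadModel/VadOnlineModel: voice activity detection (VAD) models, used to extract speech segments.
Augmentor family: audio data augmentation tools, including speed, volume, noise, and reverb perturbation.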

II. Installation

pip install yeaudio

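III. AudioSegment: Basic Audio Processing

AudioSegment is the core class for working with audio. It supports audio files in many formats (WAV, MP3, MP4, and others).

1. Loading Audio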

import numpy as np
from yeaudio.audio import AudioSegment

# Load from a file
audio = AudioSegment.from_file("data/test.wav")
# Load from a NumPy array of samples together with its sample rate
samples, sr = np.load("data/speech.npy"), 16000
audio = AudioSegment.from_ndarray(samples, sr)

2. Audio Operations

# Slice (efficiently loads only part of the file)
audio_slice = AudioSegment.slice_from_file("data/test.wav", start=1, end=3)  # seconds 1 to 3

# Concatenate audio
audio1 = AudioSegment.from_file("a.wav")
audio2 = AudioSegment.from_file("b.wav")
combined = AudioSegment.concatenate(audio1, audio2)

# Resample
audio_resampled = audio.resample(target_sample_rate=8000)

# Add noise (requires a prepared noise file)
audio_with_noise = audio.add_noise(noise_file="noise.wav", snr_dB=10)

# Apply reverb
audio_with_reverb = audio.reverb(reverb_file="reverb.wav")

# Pad with silence
audio_padded = audio.pad_silence(duration=2, sides="end")  # append 2 seconds of silence at the end
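
The slicing and padding calls above take times in seconds, while the underlying data is an array of samples. The following is a minimal pure-Python sketch of that conversion, not YeAudio's implementation; the helper names `slice_seconds` and `pad_silence_end` are hypothetical and exist only for illustration.

```python
def slice_seconds(samples, sample_rate, start, end):
    """Return the samples between `start` and `end` (both in seconds)."""
    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return samples[first:last]


def pad_silence_end(samples, sample_rate, duration):
    """Append `duration` seconds of zeros (silence) to the end."""
    return samples + [0.0] * int(round(duration * sample_rate))


# Dummy signal: 5 "seconds" at a toy sample rate of 8 samples per second.
sample_rate = 8
samples = [float(i) for i in range(5 * sample_rate)]

clip = slice_seconds(samples, sample_rate, start=1, end=3)   # seconds 1 to 3
padded = pad_silence_end(clip, sample_rate, duration=2)      # 2 s of trailing silence
```

A time `t` in seconds always maps to sample index `t * sample_rate`, so the same arithmetic applies at realistic rates such as 16000 Hz.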

3. Audio Properties and Export

print("Sample rate:", audio.sample_rate)      # e.g. 16000 Hz
print("Duration:", audio.duration)            # e.g. 5.0 seconds
print("Number of samples:", audio.num_samples)  # e.g. 80000
print("RMS level (dB):", audio.rms_db)        # e.g. -18.5 dB

# Export as WAV
audio.to_wav_file("output.wav", dtype="int16")
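
To make the properties above concrete, here is a small sketch of how duration and an RMS level in dB can be derived from raw samples. It assumes the common definition 10 * log10(mean(x^2)) for samples in the range [-1, 1]; whether YeAudio computes `rms_db` in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def rms_db(samples):
    """Root-mean-square energy in decibels: 10 * log10(mean(x^2))."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Note: math.log10 raises ValueError for an all-zero (silent) signal.
    return 10.0 * math.log10(mean_square)


def duration_seconds(num_samples, sample_rate):
    """Duration is simply the sample count divided by the sample rate."""
    return num_samples / sample_rate


tone = [0.1, -0.1] * 8000                     # 16000 samples of magnitude 0.1
level = rms_db(tone)                          # level of the dummy signal in dB
length = duration_seconds(len(tone), 16000)   # length of the dummy signal in seconds
```

With the illustrative numbers from the listing above, 80000 samples at 16000 Hz correspond to 5.0 seconds.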

IV. Voice Activity Detection (VAD)

1. Non-streaming VAD (VadModel)

from yeaudio.audio import AudioSegment
from yeaudio.vad_model import VadModel

vad = VadModel()  # load the model (CPU by default; pass device_id=0 to use a GPU)
audio = AudioSegment.from_file("data/long_speech.wav")
audio_resampled = audio.resample(target_sample_rate=vad.sample_rate)  # make sure the sample rate matches the model
speech_timestamps = vad(audio_resampled.samples)  # output format: [[start_ms, end_ms], ...]
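
Once the timestamps are available, the speech segments can be cut out of the sample array. The sketch below assumes the `[[start_ms, end_ms], ...]` format stated in the comment above; if your installed version returns a different structure (for example sample indices or dictionaries), the conversion has to be adapted. The function name and the timestamp values are hypothetical.

```python
def extract_segments(samples, sample_rate, timestamps_ms):
    """Cut out one sample slice per [start_ms, end_ms] pair."""
    segments = []
    for start_ms, end_ms in timestamps_ms:
        first = start_ms * sample_rate // 1000   # milliseconds -> sample index
        last = end_ms * sample_rate // 1000
        segments.append(samples[first:last])
    return segments


sample_rate = 16000
samples = [0.0] * (3 * sample_rate)              # 3 seconds of dummy audio
timestamps = [[500, 1200], [1800, 2500]]         # hypothetical VAD output
segments = extract_segments(samples, sample_rate, timestamps)
```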

2. Streaming VAD (VadOnlineModel)

from yeaudio.audio import AudioSegment
from yeaudio.vad_model import VadOnlineModel

vad = VadOnlineModel()
audio = AudioSegment.from_file("data/long_speech.wav")
audio_resampled = audio.resample(target_sample_rate=vad.sample_rate)
samples = audio_resampled.samples

step = 16000  # process 1 second per step (16000 samples at 16 kHz)
for i in range(0, len(samples), step):
    batch = samples[i:i+step]
    result = vad(batch)  # process each chunk as it arrives
    if result:
        print(f"Speech detected: {result}")
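
The chunking pattern in the loop above can be tried without the model. In this sketch a simple energy threshold stands in for the VAD; it is a placeholder for illustration only and does not reflect how VadOnlineModel decides. The names `iter_chunks` and `is_speech` are hypothetical.

```python
def iter_chunks(samples, step):
    """Yield (offset, chunk) pairs; the last chunk may be shorter than `step`."""
    for offset in range(0, len(samples), step):
        yield offset, samples[offset:offset + step]


def is_speech(chunk, threshold=0.01):
    """Placeholder detector: mean energy above a threshold (not a real VAD)."""
    return sum(s * s for s in chunk) / len(chunk) > threshold


sample_rate = 16000
# 1 s of silence, 1 s of "speech", 0.5 s of silence
samples = [0.0] * sample_rate + [0.5] * sample_rate + [0.0] * (sample_rate // 2)

# Start times (in seconds) of the chunks flagged as speech
active = [offset / sample_rate
          for offset, chunk in iter_chunks(samples, sample_rate)
          if is_speech(chunk)]
```

Keeping the offset alongside each chunk makes it possible to map per-chunk results back to absolute positions in the stream.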

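V. Data Augmentation Tools

YeAudio provides several augmentors for expanding datasets (for example, augmenting a training set).

1. Speed Perturbation (SpeedPerturbAugmentor)

from yeaudio.augmentors import SpeedPerturbAugmentor

speed_aug = SpeedPerturbAugmentor(prob=0.8, speed_perturb_3_class=True)  # three-class speed augmentation
augmented_audio = speed_aug(audio)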
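
As a rough illustration of the idea, the sketch below applies speed perturbation with probability `prob` by resampling with linear interpolation. It assumes that "three-class" means choosing among the factors 0.9, 1.0, and 1.1, which is a common convention in speech pipelines but is an assumption here; the function names are hypothetical and this is not YeAudio's code.

```python
import random


def change_speed(samples, speed):
    """Resample by linear interpolation so the audio plays `speed` times faster."""
    if speed == 1.0:
        return list(samples)
    new_length = int(len(samples) / speed)
    out = []
    for n in range(new_length):
        pos = n * speed                           # fractional position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def speed_perturb(samples, prob=0.8, speeds=(0.9, 1.0, 1.1), rng=random):
    """With probability `prob`, apply one randomly chosen speed factor."""
    if rng.random() >= prob:
        return list(samples)                      # leave the audio unchanged
    return change_speed(samples, rng.choice(speeds))


signal = [0.0] * 1000
faster = change_speed(signal, 1.1)   # fewer samples than the input
slower = change_speed(signal, 0.9)   # more samples than the input
```

Note that changing speed this way also shifts pitch, which is part of why it is useful for generating varied training data.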

2. Volume Perturbation (VolumePerturbAugmentor)

from yeaudio.augmentors import VolumePerturbAugmentor

volume_aug = VolumePerturbAugmentor(prob=0.5, min_gain_dBFS=-15, max_gain_dBFS=15)
augmented_audio = volume_aug(audio)
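
The gain range above is given in decibels. A gain of g dB corresponds to multiplying the amplitude by 10^(g/20), as the following sketch shows. The function names are hypothetical, and drawing the gain uniformly from the range is an assumption about how such an augmentor typically behaves.

```python
import random


def apply_gain_db(samples, gain_db):
    """Scale amplitudes by 10^(gain_db / 20)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def volume_perturb(samples, prob=0.5, min_gain_db=-15.0, max_gain_db=15.0, rng=random):
    """With probability `prob`, apply a gain drawn uniformly from the given range."""
    if rng.random() >= prob:
        return list(samples)                      # leave the audio unchanged
    return apply_gain_db(samples, rng.uniform(min_gain_db, max_gain_db))


louder = apply_gain_db([0.1, -0.1], 6.0)     # roughly doubles the amplitude
quieter = apply_gain_db([0.1, -0.1], -6.0)   # roughly halves the amplitude
```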

3. Noise Perturbation (NoisePerturbAugmentor)

from yeaudio.augmentors import NoisePerturbAugmentor

noise_aug = NoisePerturbAugmentor(
    noise_dir="noise_dataset/", 
    prob=0.7, 
    min_snr_dB=10, 
    max_snr_dB=50
)
augmented_audio = noise_aug(audio)
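
Mixing noise at a target signal-to-noise ratio amounts to scaling the noise so that its level sits `snr_dB` below the speech level before adding it. Below is a minimal sketch under the assumptions that levels are measured as 10 * log10(mean(x^2)) and that the noise clip is at least as long as the speech; the helper names are hypothetical and this is not the library's implementation.

```python
import math


def rms_db(samples):
    """Signal level in dB: 10 * log10(mean(x^2))."""
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech level minus noise level equals `snr_db`, then add."""
    gain_db = rms_db(speech) - rms_db(noise) - snr_db
    factor = 10.0 ** (gain_db / 20.0)
    # zip stops at the shorter input, so the noise must cover the whole speech clip
    return [s + factor * n for s, n in zip(speech, noise)]


speech = [0.5, -0.5] * 100
noise = [0.1, 0.1, -0.1, -0.1] * 50
mixed = mix_at_snr(speech, noise, snr_db=10)
```

A lower SNR value produces a noisier mix; an augmentor with `min_snr_dB` and `max_snr_dB` would typically draw the target SNR from that range for each sample.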

4. Spectrogram Augmentation (SpecAugmentor)

from yeaudio.augmentors import SpecAugmentor

spec_aug = SpecAugmentor(
    prob=0.5,
    freq_mask_ratio=0.15,
    n_freq_masks=2,
    time_mask_ratio=0.05,
    n_time_masks=2
)
# Extract spectral features first (requires librosa or torchaudio)
import librosa
spec = librosa.stft(audio.samples, n_fft=512)  # (freq, time)
spec_augmented = spec_aug(spec)  # augmented spectrogram
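
The masking parameters above can be illustrated with a generic SpecAugment-style sketch: a few randomly placed frequency bands and time spans are zeroed out, with each mask at most a given fraction of the axis. The `[freq][time]` layout follows the comment in the example above; the function name and the exact sampling of mask widths are assumptions, not YeAudio's implementation.

```python
import random


def mask_spectrogram(spec, freq_mask_ratio=0.15, n_freq_masks=2,
                     time_mask_ratio=0.05, n_time_masks=2, rng=random):
    """Return a copy of `spec` (indexed [freq][time]) with random bands zeroed."""
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]                # do not modify the input
    max_f = int(n_freq * freq_mask_ratio)         # widest allowed frequency mask
    max_t = int(n_time * time_mask_ratio)         # widest allowed time mask

    for _ in range(n_freq_masks):
        width = rng.randint(0, max_f)
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time               # zero a whole frequency bin

    for _ in range(n_time_masks):
        width = rng.randint(0, max_t)
        start = rng.randint(0, n_time - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = 0.0                      # zero a span of frames
    return out


spec = [[1.0] * 100 for _ in range(40)]           # dummy 40 x 100 spectrogram
masked = mask_spectrogram(spec, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the masks reproducible, which helps when debugging a training pipeline.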

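VI. Key API Quick Reference

Method	Purpose	Parameters
`AudioSegment.from_file`	Load audio from a file	`file` (path or file object)
`AudioSegment.resample`	Resample audio	`target_sample_rate` (target sample rate)
`AudioSegment.add_noise`	Add noise	`noise_file` (noise file), `snr_dB` (signal-to-noise ratio)
`VadModel`/`VadOnlineModel`	Voice activity detection	`device_id` (GPU index such as 0; CPU by default)
`SpeedPerturbAugmentor`	Speed perturbation	`prob` (probability), `speed_perturb_3_class` (three-class speed)
`SpecAugmentor`	Spectrogram augmentation (time/frequency masking)	`freq_mask_ratio`, `time_mask_ratio`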

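VII. FAQ

Q: How do I trim audio to a specific time range?
A: Use slice_from_file(start=1, end=3), which efficiently loads only the requested segment (times in seconds).
Q: How do I handle streaming audio, such as real-time speech?
A: Use VadOnlineModel: process the audio chunk by chunk and accumulate the results.
Q: Does the audio format change after data augmentation?
A: The waveform augmentors (speed, volume, noise) keep the audio as an AudioSegment, so the result can be exported directly. SpecAugmentor operates on spectrogram features instead.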

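VIII. Applications

Speech recognition: use VadModel to extract speech segments and SpecAugmentor to augment spectral features.
Speaker recognition: use SpeedPerturbAugmentor to generate data at three speaking rates.
Speech synthesis: use AudioSegment.pad_silence to adjust audio duration.

See the official documentation for more details.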
