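夜雨飄零 (yeyupiaoling) Audio Toolkit¶

This Python audio processing tool reads audio files in many formats. It supports operations such as cropping, adding reverb, and adding noise, and it is used in projects covering speech recognition, speech synthesis, sound classification, and speaker (voiceprint) recognition.

Installation¶

Install with pip.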

pip install yeaudio -U -i https://pypi.tuna.tsinghua.edu.cn/simple

(Recommended) Install from source.

git clone https://github.com/yeyupiaoling/YeAudio.git
cd YeAudio
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple

Quick start¶

Read an ordinary audio file:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(f'Audio duration: {audio_segment.duration}')
print(f'Audio sample rate: {audio_segment.sample_rate}')
print(f'Audio data: {audio_segment.samples}')

Read the audio track of a video file:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.mp4')
print(f'Audio duration: {audio_segment.duration}')
print(f'Audio sample rate: {audio_segment.sample_rate}')
print(f'Audio data: {audio_segment.samples}')

API documentation¶

AudioSegment
VadModel
VadOnlineModel
SpeedPerturbAugmentor
VolumePerturbAugmentor
ShiftPerturbAugmentor
ResampleAugmentor
NoisePerturbAugmentor
ReverbPerturbAugmentor
SpecAugmentor
SpecSubAugmentor

AudioSegment¶

Basic audio utility class. It reads audio files in many formats and provides basic operations such as cropping, adding reverb, and adding noise.

def __init__(self, samples, sample_rate):

Creates a mono (single-channel) audio segment instance.

Parameters:

samples (ndarray.float32): audio data with shape [num_samples x num_channels]
sample_rate (int): sample rate of the audio

Example:

import soundfile
from yeaudio.audio import AudioSegment

samples, sample_rate = soundfile.read("data/test.wav")
audio_segment = AudioSegment(samples, sample_rate)
print(audio_segment.samples)

def __eq__(self, other):

Returns whether the two instances are equal.

Parameters:

other (AudioSegment): the other audio segment instance to compare with

Example:

from yeaudio.audio import AudioSegment

audio_segment1 = AudioSegment.from_file("data/test.wav")
audio_segment2 = AudioSegment.from_file("data/test.wav")
print(audio_segment1 == audio_segment2)

def __ne__(self, other):

Returns whether the two instances are not equal.

Parameters:

other (AudioSegment): the other audio segment instance to compare with

Example:

from yeaudio.audio import AudioSegment

audio_segment1 = AudioSegment.from_file("data/test.wav")
audio_segment2 = AudioSegment.from_file("data/test.wav")
print(audio_segment1 != audio_segment2)

def __str__(self):

Returns a description of this audio segment.

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
print(str(audio_segment))

@classmethod

def from_file(cls, file):

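Creates an audio segment from an audio file. Formats such as wav, mp3, and mp4 are supported.

Parameters:

file (str|BufferedReader): file path, or a file object

Returns:

AudioSegment: the audio segment instance

Example: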

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.samples)

@classmethod

def slice_from_file(cls, file, start=None, end=None):

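Loads only a short slice of the audio instead of reading the whole file into memory, which would be wasteful.

Parameters:

file (str|file): path of the input audio file, or a file object
start (float): start time in seconds. A negative value is counted from the end of the file. If omitted, reading starts at the beginning.
end (float): end time in seconds. A negative value is counted from the end of the file. If omitted, reading continues to the end of the file.

Returns:

AudioSegment: an AudioSegment instance holding the requested slice of the input file

Raises:

ValueError: raised if start or end is set incorrectly

Example: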

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.slice_from_file('data/test.wav', start=1, end=2)
print(audio_segment.samples)
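The start/end rules described above can be sketched without the library. The helper below, `resolve_slice_bounds`, is a hypothetical name and not part of YeAudio; it only illustrates how optional and negative times could map to absolute positions, and which inputs would plausibly count as "set incorrectly".

```python
def resolve_slice_bounds(duration, start=None, end=None):
    """Map optional, possibly negative start/end times (seconds) to absolute times.

    Hypothetical helper for illustration; not YeAudio's implementation.
    """
    start = 0.0 if start is None else float(start)
    end = float(duration) if end is None else float(end)
    # Negative values count back from the end of the file.
    if start < 0.0:
        start += duration
    if end < 0.0:
        end += duration
    # Reject bounds that fall outside the file or are out of order.
    if start < 0.0 or end > duration or start > end:
        raise ValueError(f"invalid slice: start={start}, end={end}, duration={duration}")
    return start, end


print(resolve_slice_bounds(10.0, start=1, end=2))  # (1.0, 2.0)
print(resolve_slice_bounds(10.0, start=-3))        # (7.0, 10.0)
print(resolve_slice_bounds(10.0, end=-1.5))        # (0.0, 8.5)
```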

@classmethod

def from_bytes(cls, data):

Creates an audio segment from bytes in WAV format.

Parameters:

data (bytes): bytes containing the audio samples

Returns:

AudioSegment: the audio segment instance

Example:

from yeaudio.audio import AudioSegment

with open('data/test.wav', 'rb') as f:
    data = f.read()
    audio_segment = AudioSegment.from_bytes(data)
    print(audio_segment.samples)

@classmethod

def from_pcm_bytes(cls, data, channels=1, samp_width=2, sample_rate=16000):

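Creates an audio segment from bytes containing raw (headerless) PCM audio.

Parameters:

data (bytes): bytes containing the audio samples
channels (int): number of audio channels
samp_width (int): width of one sample in bytes, e.g. 2 for np.int16
sample_rate (int): sample rate of the audio

Returns:

AudioSegment: the audio segment instance

Example: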

from yeaudio.audio import AudioSegment

with open('data/test.wav', 'rb') as f:
    data = f.read()
    audio_segment = AudioSegment.from_pcm_bytes(data[44:], channels=1, samp_width=2, sample_rate=16000)
    print(audio_segment.samples)
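The example above skips the first 44 bytes, which assumes the file starts with a minimal 44-byte WAV header followed by 16-bit samples. As a rough illustration of what "raw PCM with samp_width=2" means, the sketch below decodes 16-bit little-endian bytes into floats using only the standard library. `pcm16_bytes_to_floats` is a hypothetical helper, and the scaling by 32768 is a common convention rather than something confirmed for YeAudio.

```python
import struct


def pcm16_bytes_to_floats(data):
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1.0, 1.0).

    Hypothetical helper for illustration; not YeAudio's implementation.
    """
    count = len(data) // 2                      # two bytes per sample
    ints = struct.unpack(f"<{count}h", data[:count * 2])
    return [value / 32768.0 for value in ints]  # int16 range maps to [-1, 1)


raw = struct.pack("<4h", 0, 16384, -32768, 32767)
print(pcm16_bytes_to_floats(raw))
# [0.0, 0.5, -1.0, 0.999969482421875]
```

For multi-channel data the decoded values would be interleaved (channel 0, channel 1, channel 0, ...), so they would still need to be regrouped per channel.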

@classmethod

def from_ndarray(cls, data, sample_rate=16000):

Creates an audio segment from a numpy.ndarray.

Parameters:

data (ndarray): audio data as a numpy.ndarray
sample_rate (int): sample rate of the audio

Returns:

AudioSegment: the audio segment instance

Example:

import soundfile

from yeaudio.audio import AudioSegment

samples, sample_rate = soundfile.read('data/test.wav')
audio_segment = AudioSegment.from_ndarray(samples, sample_rate=sample_rate)
print(audio_segment.samples)

@classmethod

def concatenate(cls, *segments):

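Concatenates any number of audio segments.

Parameters:

segments (AudioSegment): the audio segments to concatenate

Returns:

AudioSegment: the audio segment instance

Raises:

ValueError: raised if no segments are given or their sample rates differ
TypeError: raised if the input segments are not all of the same type

Example: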

from yeaudio.audio import AudioSegment

audio_segment1 = AudioSegment.from_file('data/test.wav')
audio_segment2 = AudioSegment.from_file('data/test.wav')
audio_segment = AudioSegment.concatenate(audio_segment1, audio_segment2)
print(audio_segment.samples)

@classmethod

def make_silence(cls, duration, sample_rate):

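Creates a silent audio segment with the given duration and sample rate.

Parameters:

duration (float): length of the silence, in seconds
sample_rate (int): sample rate of the audio

Returns:

AudioSegment: a silent AudioSegment instance of the given duration

Example: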

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.make_silence(duration=10, sample_rate=16000)
print(audio_segment.samples)

def to_wav_file(self, filepath, dtype='float32'):

Saves the audio segment to disk as a WAV file.

Parameters:

filepath (str|file): path or file object of the WAV file to write
dtype (str): sample data type, one of 'int16', 'int32', 'float32', 'float64'

Raises:

TypeError: raised if the data type is not supported

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.to_wav_file("output.wav")

def superimpose(self, other):

Adds the samples of another segment to the samples of this segment (sample-wise addition, not concatenation).

Parameters:

other (AudioSegment): the segment whose samples are added to this one

Raises:

ValueError: raised if the two segments differ in sample rate or length
TypeError: raised if the types of the two segments do not match

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
other_segment = AudioSegment.from_file("data/test.wav")
audio_segment.superimpose(other_segment)
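To make the difference from concatenation concrete, here is a small standard-library sketch of sample-wise addition with the two checks listed above. The free function `superimpose` and its argument layout are assumptions made for illustration; the library method works on AudioSegment instances instead.

```python
def superimpose(samples_a, rate_a, samples_b, rate_b):
    """Add two sample sequences element by element (illustrative sketch only)."""
    if rate_a != rate_b:
        raise ValueError("sample rates differ")
    if len(samples_a) != len(samples_b):
        raise ValueError("segment lengths differ")
    # The result has the same length as the inputs; concatenation would double it.
    return [a + b for a, b in zip(samples_a, samples_b)]


mixed = superimpose([0.25, 0.5, -0.125], 16000, [0.5, -0.5, 0.25], 16000)
print(mixed)  # [0.75, 0.0, 0.125]
```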

def to_bytes(self, dtype='float32'):

Returns the audio content as bytes.

Parameters:

dtype (str): data type of the exported samples, one of 'int16', 'int32', 'float32', 'float64'

Returns:

bytes: the audio content as bytes

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
print(audio_segment.to_bytes())

def to_pcm_bytes(self):

Returns the audio as bytes in PCM format.

Returns:

bytes: bytes in PCM format

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
print(audio_segment.to_pcm_bytes())

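def to(self, dtype='int16'):

Converts the samples to another data type.

Parameters:

dtype (str): data type of the exported samples, one of 'int16', 'int32', 'float32', 'float64'

Returns:

ndarray: the converted sample data

Example: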

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
print(audio_segment.to(dtype='int16'))

def gain_db(self, gain):

Applies a gain in decibels to the audio.

Parameters:

gain (float|1darray): gain in decibels to apply to the samples

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.gain_db(gain=-20)
print(audio_segment.samples)
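A decibel gain is conventionally applied by multiplying every sample by 10^(gain / 20). The sketch below shows that arithmetic on a plain list; `apply_gain_db` is a hypothetical name, and whether YeAudio uses exactly this formula is an assumption.

```python
def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in decibels (illustrative sketch only)."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio for the dB value
    return [sample * factor for sample in samples]


# -20 dB scales the amplitude by about 0.1; +6.02 dB roughly doubles it.
quieter = apply_gain_db([1.0, -0.5, 0.25], -20)
louder = apply_gain_db([0.25], 6.0206)
print(quieter, louder)
```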

def change_speed(self, speed_rate):

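Changes the audio speed using linear interpolation.

Parameters:

speed_rate (float): speed factor. speed_rate > 1.0 speeds the audio up; speed_rate = 1.0 leaves the speed unchanged; speed_rate < 1.0 slows the audio down; speed_rate <= 0.0 is invalid.

Raises:

ValueError: raised if the speed rate is less than or equal to 0

Example: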

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.change_speed(speed_rate=1.2)
print(audio_segment.samples)
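As a rough picture of what changing speed by linear interpolation involves, the sketch below resamples a list to `len / speed_rate` points and interpolates between neighbouring samples. It is a standard-library illustration under the assumption that the first and last samples are kept; the exact interpolation grid used by YeAudio is not documented here, and `change_speed` as a free function is hypothetical.

```python
def change_speed(samples, speed_rate):
    """Resample by linear interpolation so playback is speed_rate times faster."""
    if speed_rate <= 0:
        raise ValueError("speed_rate must be greater than 0")
    old_len = len(samples)
    new_len = int(old_len / speed_rate)
    if old_len < 2 or new_len < 2:
        return list(samples[:new_len])
    out = []
    for i in range(new_len):
        # Position of output sample i on the original index axis.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


ramp = [float(i) for i in range(8)]
print(len(change_speed(ramp, 2.0)))  # 4 samples: twice as fast
print(len(change_speed(ramp, 0.5)))  # 16 samples: half speed
```

Because the sample rate stays the same, a shorter output plays back faster and a longer one slower; note that this simple approach also shifts the pitch.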

def normalize(self, target_db=-20, max_gain_db=300.0):

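Normalizes the audio to a target RMS level (in decibels).

Parameters:

target_db (float): target RMS level in decibels. It should be below 0.0, because 0.0 dB corresponds to full-scale audio.
max_gain_db (float): maximum allowed gain in decibels. It guards against NaN values when an all-zero signal is normalized.

Raises:

ValueError: raised if the required gain is greater than max_gain_db

Example: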

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.normalize(target_db=-20)
print(audio_segment.samples)
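The relationship between target_db, the current RMS level, and max_gain_db can be sketched as follows. This is a standard-library illustration, assuming RMS in decibels is computed as 10 * log10(mean(x^2)); the helper names are hypothetical and the code is not taken from YeAudio.

```python
import math


def rms_db(samples):
    """RMS energy in decibels; an all-zero signal gives negative infinity."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def normalize(samples, target_db=-20.0, max_gain_db=300.0):
    """Scale samples so their RMS level equals target_db (illustrative sketch)."""
    gain = target_db - rms_db(samples)
    # An all-zero signal would need infinite gain; the limit catches that case.
    if gain > max_gain_db:
        raise ValueError(f"required gain {gain} dB exceeds max_gain_db {max_gain_db} dB")
    factor = 10.0 ** (gain / 20.0)
    return [s * factor for s in samples]


audio = [0.5, -0.5, 0.25, -0.25]
print(round(rms_db(audio), 2))                 # level before normalization
print(round(rms_db(normalize(audio)), 2))      # -20.0
```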

def resample(self, target_sample_rate, filter='kaiser_best'):

Resamples the audio to the target sample rate.

Parameters:

target_sample_rate (int): target sample rate
filter (str): resampling filter to use; 'kaiser_best' and 'kaiser_fast' are supported

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.resample(target_sample_rate=8000)
print(audio_segment.samples)

def pad_silence(self, duration, sides='both'):

Pads the audio samples with silence.

Parameters:

duration (float): length of the silence, in seconds
sides (str): where to add the silence: 'beginning' adds it before the start; 'end' adds it after the end; 'both' adds it at both the start and the end.

Raises:

ValueError: raised if sides is not 'beginning', 'end', or 'both'

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.pad_silence(duration=2, sides='end')
print(audio_segment.samples)

def pad(self, pad_width, mode='wrap', **kwargs):

Pads the audio samples; equivalent to numpy.pad.

Parameters:

pad_width (sequence|array_like|int): padding width
mode (str|function|optional): padding mode

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.pad(pad_width=(0, 16000 * 2), mode='wrap')
print(audio_segment.samples)

def shift(self, shift_ms):

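Shifts the audio in time. A positive shift_ms advances the audio in time; a negative shift_ms delays it. Silence is padded in so that the duration stays the same.

Parameters:

shift_ms (float): shift amount in milliseconds. Positive values advance the audio in time; negative values delay it.

Raises:

ValueError: raised if the absolute value of shift_ms is greater than the audio duration

Example: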

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.shift(shift_ms=1000)
print(audio_segment.samples)
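The description above leaves some room for interpretation, so here is one plausible reading as a standard-library sketch: a positive shift drops samples from the start and pads zeros at the end (the audio is heard earlier), and a negative shift does the opposite. This direction convention and the free function `shift` are assumptions for illustration, not a confirmed description of YeAudio's behaviour.

```python
def shift(samples, sample_rate, shift_ms):
    """Shift samples in time, padding with zeros to keep the length unchanged."""
    count = int(shift_ms * sample_rate / 1000)  # shift expressed in samples
    if abs(count) > len(samples):
        raise ValueError("shift is longer than the audio")
    if count > 0:
        # Advance: content moves earlier, silence fills the tail.
        return list(samples[count:]) + [0.0] * count
    if count < 0:
        # Delay: silence fills the head, content moves later.
        return [0.0] * (-count) + list(samples[:count])
    return list(samples)


audio = [1.0, 2.0, 3.0, 4.0, 5.0]
print(shift(audio, sample_rate=1000, shift_ms=2))   # [3.0, 4.0, 5.0, 0.0, 0.0]
print(shift(audio, sample_rate=1000, shift_ms=-2))  # [0.0, 0.0, 1.0, 2.0, 3.0]
```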

def subsegment(self, start_sec=None, end_sec=None):

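Cuts the audio segment between the given boundaries.

Parameters:

start_sec (float): position where the cut starts, in seconds; defaults to 0.
end_sec (float): position where the cut ends, in seconds; defaults to the audio duration.

Raises:

ValueError: raised if start_sec or end_sec is out of bounds

Example: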

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.subsegment(start_sec=1, end_sec=3)
print(audio_segment.samples)

def random_subsegment(self, duration):

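Cuts a random sub-segment of the given length from the audio.

Parameters:

duration (float): length of the randomly cut segment, in seconds

Raises:

ValueError: raised if the requested length is greater than the original segment

Example: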

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.random_subsegment(duration=2)
print(audio_segment.samples)

def reverb(self, reverb_file, allow_resample=True):

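Applies reverb to the audio segment.

Parameters:

reverb_file (str): path of the reverb audio file
allow_resample (bool): whether to allow resampling when the two audio segments have different sample rates

Raises:

ValueError: raised if the sample rates of the two audio segments do not match

Example: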

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.reverb(reverb_file='data/reverb.wav')
print(audio_segment.samples)

def reverb_and_normalize(self, reverb_file, allow_resample=True):

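Applies reverb to the audio segment and then normalizes it.

Parameters:

reverb_file (str): path of the reverb audio file
allow_resample (bool): whether to allow resampling when the two audio segments have different sample rates

Raises:

ValueError: raised if the sample rates of the two audio segments do not match

Example: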

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.reverb_and_normalize(reverb_file='data/reverb.wav')
print(audio_segment.samples)

def add_noise(self, noise_file, snr_dB, max_gain_db=300.0, allow_resample=True):

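Adds the given noise at a specific signal-to-noise ratio. If the noise segment is longer than this segment, a random sub-segment of matching length is sampled from the noise segment.

Parameters:

noise_file (str): path of the noise audio file
snr_dB (float): signal-to-noise ratio, in decibels
max_gain_db (float): maximum allowed gain in decibels. It guards against NaN values when an all-zero signal is normalized.
allow_resample (bool): whether to allow resampling when the two audio segments have different sample rates

Raises:

ValueError: raised if the sample rates of the two audio segments do not match

Example: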

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.add_noise(noise_file='data/noise.wav', snr_dB=10)
print(audio_segment.samples)
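Mixing at a target signal-to-noise ratio comes down to choosing a noise gain so that the difference between the two RMS levels equals snr_dB. The sketch below shows that calculation with the standard library. The function names are hypothetical, raising an error for noise shorter than the signal is a choice made only for this sketch, and the max_gain_db limit is left out.

```python
import math
import random


def rms_db(samples):
    """RMS energy in decibels (assumes the signal is not all zeros)."""
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db, rng=None):
    """Add noise to signal so that the resulting SNR is snr_db (sketch only)."""
    rng = rng or random.Random()
    if len(noise) < len(signal):
        raise ValueError("noise is shorter than the signal")  # sketch-only choice
    if len(noise) > len(signal):
        # Sample a random sub-segment of matching length from the noise.
        start = rng.randint(0, len(noise) - len(signal))
        noise = noise[start:start + len(signal)]
    # Gain that places the noise snr_db below the signal level.
    noise_gain_db = rms_db(signal) - rms_db(noise) - snr_db
    factor = 10.0 ** (noise_gain_db / 20.0)
    return [s + n * factor for s, n in zip(signal, noise)]


signal = [math.sin(0.1 * i) for i in range(200)]
noise_rng = random.Random(0)
noise = [noise_rng.uniform(-1.0, 1.0) for _ in range(200)]
mixed = mix_at_snr(signal, noise, snr_db=10.0)
residual = [m - s for m, s in zip(mixed, signal)]    # the scaled noise
print(round(rms_db(signal) - rms_db(residual), 3))   # 10.0
```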

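def crop(self, duration, mode='eval'):

Crops the audio to the given length according to the mode: in 'train' mode the crop position is random; otherwise the audio is cut from the end.

Parameters:

duration (float): length of the cropped audio, in seconds
mode (str): cropping mode, 'train' or 'eval'

Raises:

ValueError: raised if the sample rates of the two audio segments do not match

Example: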

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test.wav")
audio_segment.crop(duration=3, mode='train')
print(audio_segment.samples)

def vad(self, return_seconds=False, **kwargs):

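Runs voice activity detection (VAD) on the audio and returns the speech timestamps.

Parameters:

return_seconds (bool): whether to return times in seconds instead of sample indices
kwargs (dict): arguments passed on to the Silero VAD model

Returns:

List[Dict]: list of speech activity timestamps

Example: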

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file("data/test_long.wav")
speech_timestamps = audio_segment.vad(return_seconds=True)
for speech_timestamp in speech_timestamps:
    print(speech_timestamp)

@property

def samples(self):

Returns the audio samples.

Returns:

ndarray: the audio samples

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.samples)

@property

def sample_rate(self):

Returns the sample rate of the audio.

Returns:

int: the sample rate of the audio

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.sample_rate)

@property

def num_samples(self):

Returns the number of samples.

Returns:

int: the number of samples

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.num_samples)

@property

def duration(self):

Returns the duration of the audio, in seconds.

Returns:

float: the duration of the audio, in seconds

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.duration)

@property

def rms_db(self):

Returns the root-mean-square (RMS) energy of the audio, in decibels.

Returns:

float: the RMS energy of the audio, in decibels

Example:

from yeaudio.audio import AudioSegment

audio_segment = AudioSegment.from_file('data/test.wav')
print(audio_segment.rms_db)

VadModel¶

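Voice activity detection model, offline (non-streaming).

def __init__(self,
             batch_size: int = 1,
             device_id: Union[str, int] = "-1",
             quantize: bool = True,
             intra_op_num_threads: int = 4,
             max_end_sil: int = None):

Parameters:

batch_size (int, optional): batch size. Defaults to 1.
device_id (Union[str, int], optional): device ID that selects where the model runs. Defaults to "-1", which means CPU. To run on a GPU, pass the ID of that GPU.
quantize (bool, optional): whether to use the quantized model. Defaults to True.
intra_op_num_threads (int, optional): number of ONNX Runtime threads. Defaults to 4.
max_end_sil (int, optional): maximum trailing silence duration. If not given, the default from the model configuration is used.

def __call__(self, audio_in: Union[np.ndarray, List[np.ndarray]]) -> List:

Parameters:

audio_in (Union[np.ndarray, List[np.ndarray]]): input audio data, either a single numpy array or a list of numpy arrays, sampled at 16000 Hz

Returns:

List: segments in the form [[start, end], [start, end]…], in milliseconds. A value of -1 stands in for a start or end position; an empty list [] means no speech activity was detected.

Example: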

from yeaudio.audio import AudioSegment
from yeaudio.vad_model import VadModel

vad_model = VadModel()
audio_segment = AudioSegment.from_file("data/test_long.wav")
audio_segment.resample(target_sample_rate=vad_model.sample_rate)
samples = audio_segment.samples

speech_timestamps = vad_model(samples)
for speech_timestamp in speech_timestamps:
    print(speech_timestamp)

VadOnlineModel¶

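Voice activity detection model, online (streaming).

def __init__(self,
             batch_size: int = 1,
             device_id: Union[str, int] = "-1",
             quantize: bool = True,
             intra_op_num_threads: int = 4,
             max_end_sil: int = None):

Parameters:

batch_size (int, optional): batch size. Defaults to 1.
device_id (Union[str, int], optional): device ID that selects where the model runs. Defaults to "-1", which means CPU. To run on a GPU, pass the ID of that GPU.
quantize (bool, optional): whether to use the quantized model. Defaults to True.
intra_op_num_threads (int, optional): number of ONNX Runtime threads. Defaults to 4.
max_end_sil (int, optional): maximum trailing silence duration. If not given, the default from the model configuration is used.

def __call__(self, audio_in: np.ndarray) -> List:

Parameters:

audio_in (np.ndarray): input audio data, sampled at 16000 Hz

Returns:

List: segments in the form [[start, end], [start, end]…], in milliseconds. A value of -1 stands in for a start or end position; an empty list [] means no speech activity was detected.

Example: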

from yeaudio.audio import AudioSegment
from yeaudio.vad_model import VadOnlineModel

vad_model = VadOnlineModel()

audio_segment = AudioSegment.from_file('data/test_long.wav')
audio_segment.resample(target_sample_rate=vad_model.sample_rate)
samples = audio_segment.samples

speech_length = len(samples)
step = 16000
param_dict = {"in_cache": []}
for sample_offset in range(0, speech_length, step):
    is_final = sample_offset + step >= speech_length - 1
    data = samples[sample_offset:sample_offset + step]
    param_dict["is_final"] = is_final
    segments_result = vad_model(audio_in=data, param_dict=param_dict)
    if len(segments_result) > 0:
        print("segments_result", segments_result)
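The chunking pattern in the streaming example can be tried without a model. The generator below is a hypothetical helper, not part of YeAudio; it splits a sample sequence into fixed-size chunks and marks only the last one as final, which is one simple way to drive an `is_final` flag.

```python
def iter_chunks(samples, step):
    """Yield (chunk, is_final) pairs covering samples in steps of `step`."""
    total = len(samples)
    for offset in range(0, total, step):
        chunk = samples[offset:offset + step]
        is_final = offset + step >= total  # true only for the last chunk
        yield chunk, is_final


for chunk, is_final in iter_chunks(list(range(10)), step=4):
    print(chunk, is_final)
# [0, 1, 2, 3] False
# [4, 5, 6, 7] False
# [8, 9] True
```

In a streaming setup each chunk would be passed to the model together with the running cache, as in the example above.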

SpeedPerturbAugmentor¶

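Audio data augmentor that applies a random speed perturbation.

def __init__(self, prob=1.0, speed_perturb_3_class=False, num_speakers=None):

Parameters:

prob (float): probability of applying the augmentation
speed_perturb_3_class (bool): whether to use three-class speed perturbation; only used in speaker recognition projects
num_speakers (int): number of speakers; only used in speaker recognition projects

def __call__(self, audio_segment: AudioSegment, spk_id: int = None) -> AudioSegment or [AudioSegment, int]:

Parameters:

audio_segment: an AudioSegment instance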

VolumePerturbAugmentor¶

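Audio data augmentor that applies a random volume perturbation.

def __init__(self, prob=0.0, min_gain_dBFS=-15, max_gain_dBFS=15):

Parameters:

prob (float): probability of applying the augmentation
min_gain_dBFS (int): minimum volume gain, in decibels.
max_gain_dBFS (int): maximum volume gain, in decibels.

def __call__(self, audio_segment: AudioSegment) -> AudioSegment:

Parameters:

audio_segment: an AudioSegment instance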
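All augmentors on this page take a prob argument. A minimal sketch of how such a probability gate could combine with a random gain is shown below, using only the standard library and plain lists. `VolumePerturbSketch` and its `seed` argument are hypothetical, and drawing the gain uniformly between the two limits is an assumption rather than the documented behaviour.

```python
import random


class VolumePerturbSketch:
    """Apply a random dB gain with probability `prob` (illustrative sketch only)."""

    def __init__(self, prob=0.0, min_gain_dBFS=-15, max_gain_dBFS=15, seed=None):
        self.prob = prob
        self.min_gain_dBFS = min_gain_dBFS
        self.max_gain_dBFS = max_gain_dBFS
        self._rng = random.Random(seed)  # `seed` is a sketch-only convenience

    def __call__(self, samples):
        # random() is in [0, 1), so prob=0.0 never fires and prob=1.0 always does.
        if self._rng.random() >= self.prob:
            return list(samples)
        gain = self._rng.uniform(self.min_gain_dBFS, self.max_gain_dBFS)
        factor = 10.0 ** (gain / 20.0)
        return [s * factor for s in samples]


disabled = VolumePerturbSketch(prob=0.0, seed=1)
enabled = VolumePerturbSketch(prob=1.0, seed=1)
print(disabled([0.5, -0.5]))  # [0.5, -0.5]
print(enabled([0.5, -0.5]))   # scaled by a random factor
```

Under this reading, the default prob=0.0 shown in the signatures would leave the audio unchanged unless a higher probability is passed.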

ShiftPerturbAugmentor¶

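Audio data augmentor that applies a random time-shift perturbation.

def __init__(self, prob=0.0, min_shift_ms=-15, max_shift_ms=15):

Parameters:

prob (float): probability of applying the augmentation
min_shift_ms (int): minimum shift, in milliseconds.
max_shift_ms (int): maximum shift, in milliseconds.

def __call__(self, audio_segment: AudioSegment) -> AudioSegment:

Parameters:

audio_segment: an AudioSegment instance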

ResampleAugmentor¶

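Audio data augmentor that applies random resampling.

def __init__(self, prob=0.0, new_sample_rate=(8000, 16000, 24000)):

Parameters:

prob (float): probability of applying the augmentation
new_sample_rate (list): list of candidate new sample rates

def __call__(self, audio_segment: AudioSegment) -> AudioSegment:

Parameters:

audio_segment: an AudioSegment instance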

NoisePerturbAugmentor¶

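Audio data augmentor that applies a random noise perturbation.

def __init__(self, noise_dir='', prob=0.0, min_snr_dB=10, max_snr_dB=50):

Parameters:

noise_dir (str): path of the noise folder, which contains the noise audio files
prob (float): probability of applying the augmentation
min_snr_dB (int): minimum signal-to-noise ratio
max_snr_dB (int): maximum signal-to-noise ratio

def __call__(self, audio_segment: AudioSegment) -> AudioSegment:

Parameters:

audio_segment: an AudioSegment instance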

ReverbPerturbAugmentor¶

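Audio data augmentor that applies random reverb.

def __init__(self, reverb_dir='', prob=0.0):

Parameters:

reverb_dir (str): path of the reverb folder, which contains the reverb audio files
prob (float): probability of applying the augmentation

def __call__(self, audio_segment: AudioSegment) -> AudioSegment:

Parameters:

audio_segment: an AudioSegment instance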

SpecAugmentor¶

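Audio feature augmentor that applies frequency masking and time masking.

Paper: https://arxiv.org/abs/1904.08779

Paper: https://arxiv.org/abs/1912.05533

def __init__(self, prob=0.0,
             freq_mask_ratio=0.15,
             n_freq_masks=2,
             time_mask_ratio=0.05,
             n_time_masks=2,
             inplace=True,
             max_time_warp=5,
             replace_with_zero=False):

Parameters:

prob (float): probability of applying the augmentation
freq_mask_ratio (float): ratio of the frequency axis to mask
n_freq_masks (int): number of frequency masks
time_mask_ratio (float): ratio of the time axis to mask
n_time_masks (int): number of time masks
inplace (bool): whether to overwrite the input with the result
max_time_warp (int): maximum time warp
replace_with_zero (bool): whether to mask with 0; otherwise the mean value is used

def __call__(self, x) -> np.ndarray:

Parameters:

x: audio features with shape (time, freq)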
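To make the masking parameters more concrete, the sketch below applies frequency and time masks to a (time, freq) feature matrix stored as a list of lists. It is a simplified standard-library illustration: it treats each ratio as the maximum mask width relative to the axis length (an assumption), always masks with zeros (as with replace_with_zero=True), and leaves out time warping. `spec_mask` is a hypothetical name.

```python
import random


def spec_mask(feat, freq_mask_ratio=0.15, n_freq_masks=2,
              time_mask_ratio=0.05, n_time_masks=2, rng=None):
    """Return a copy of feat with random frequency and time bands set to zero."""
    rng = rng or random.Random()
    out = [row[:] for row in feat]           # copy, so the input stays untouched
    n_time, n_freq = len(out), len(out[0])
    max_freq_width = int(n_freq * freq_mask_ratio)
    max_time_width = int(n_time * time_mask_ratio)
    for _ in range(n_freq_masks):
        width = rng.randint(0, max_freq_width)
        start = rng.randint(0, n_freq - width)
        for row in out:                      # zero the same bins in every frame
            for f in range(start, start + width):
                row[f] = 0.0
    for _ in range(n_time_masks):
        width = rng.randint(0, max_time_width)
        start = rng.randint(0, n_time - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_freq          # zero whole frames
    return out


features = [[1.0] * 40 for _ in range(100)]  # 100 frames, 40 frequency bins
masked = spec_mask(features, rng=random.Random(0))
print(sum(value == 0.0 for row in masked for value in row), "values masked")
```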

SpecSubAugmentor¶

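Randomly replaces some frames with frames from the original audio, to simulate time shifts in speech.

Paper: https://arxiv.org/abs/2106.05642

def __init__(self, prob=0.0, max_time=20, num_time_sub=3):

Parameters:

prob (float): probability of applying the augmentation
max_time (int): maximum width of a time substitution
num_time_sub (int): number of time substitutions

def __call__(self, x) -> np.ndarray:

Parameters:

x: audio features with shape (time, freq)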
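One way to picture time substitution is to overwrite a short run of frames with a run of the same length taken from earlier in the original features. The standard-library sketch below follows that idea on a list of frames. The sampling of start, length, and look-back distance is an assumption made for illustration, and `spec_sub` is a hypothetical name rather than the library's code.

```python
import random


def spec_sub(feat, max_time=20, num_time_sub=3, rng=None):
    """Return a copy of feat where short frame runs are replaced by earlier frames."""
    rng = rng or random.Random()
    out = [row[:] for row in feat]
    n_frames = len(feat)
    for _ in range(num_time_sub):
        start = rng.randint(0, n_frames - 1)
        length = rng.randint(1, max_time)
        end = min(n_frames, start + length)
        back = rng.randint(0, start)          # how far back the source run lies
        for t in range(start, end):
            out[t] = feat[t - back][:]        # always copied from the original
    return out


# Frame t holds the value t, so substituted frames are easy to spot.
features = [[float(t)] * 4 for t in range(50)]
result = spec_sub(features, rng=random.Random(0))
changed = [t for t in range(50) if result[t][0] != float(t)]
print(changed)
```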
