Articles tagged "Audio and Video"

Introduction and Usage of YeAudio Audio Tool

2024-08-29 445 views Speech Audio and Video Speech Recognition Python FFmpeg

These classes define various audio data augmentation techniques. Each class is responsible for a specific data augmentation operation and can control the degree and type of augmentation by setting different parameters. The following is a detailed description of each class:

### 1. **SpecAugmentor**

- **Function**: Frequency domain masking and time domain masking
- **Main Parameters**:
  - `prob`: Probability of data augmentation.
  - `freq_mask_ratio`: Ratio of frequency domain masking (e.g., 0.15 means randomly selecting...
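The frequency and time masking that the excerpt attributes to `SpecAugmentor` can be illustrated with a minimal, library-free sketch. This is not YeAudio's implementation: the function name `spec_augment`, the `time_mask_ratio` parameter, and the reading of each ratio as the fraction of bins or frames zeroed in one contiguous band are all assumptions made for illustration.

```python
import random


def spec_augment(spec, prob=1.0, freq_mask_ratio=0.15, time_mask_ratio=0.05, rng=None):
    """Zero one random frequency band and one random time span.

    spec is a list of rows, one row per frequency bin, one value per frame.
    time_mask_ratio is a hypothetical parameter, not taken from the excerpt.
    """
    rng = rng or random.Random()
    out = [list(row) for row in spec]  # work on a copy, leave the input untouched
    if not out or rng.random() >= prob:
        return out  # augmentation skipped with probability 1 - prob

    n_freq, n_frames = len(out), len(out[0])

    # Frequency masking: zero a contiguous band of frequency bins.
    f_width = int(n_freq * freq_mask_ratio)
    if f_width > 0:
        f_start = rng.randint(0, n_freq - f_width)
        for f in range(f_start, f_start + f_width):
            out[f] = [0.0] * n_frames

    # Time masking: zero a contiguous span of frames in every bin.
    t_width = int(n_frames * time_mask_ratio)
    if t_width > 0:
        t_start = rng.randint(0, n_frames - t_width)
        for row in out:
            row[t_start:t_start + t_width] = [0.0] * t_width

    return out


if __name__ == "__main__":
    features = [[1.0] * 40 for _ in range(20)]  # 20 bins x 40 frames
    masked = spec_augment(features, rng=random.Random(0))
    print(sum(v == 0.0 for row in masked for v in row), "values masked")
```

Passing a seeded `random.Random` makes the masks reproducible, which helps when comparing augmentation settings.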

HarmonyOS Application Development - Recording, Saving, and Playing Audio

2024-03-26 332 views HarmonyOS Application Development HarmonyOS Audio and Video Huawei HarmonyOS

Your code example demonstrates how to implement audio recording and playback in HarmonyOS. Below is a summary of the code and some suggested improvements:

### Summary

1. **Permission Request**:
   - User authorization is required before starting audio recording.
   - The `requestPermissionsFromUser` method is used to obtain the user's permission.
2. **Recording Function**:
   - Use `startRecord` to begin audio recording and save the file to the specified path.

Easily Recognize Speech in Hours-Long Audio/Video Files

2024-01-07 232 views Speech Pytorch Audio and Video Speech Recognition Pytorch Artificial Intelligence

This article introduces a method for building a long-speech recognition service that can process audio or video files lasting tens of minutes or even several hours. First, the folder is uploaded to the server; then commands for compilation, permission modification, and starting the Docker container are executed to deploy the service. Once the service has been tested and confirmed available, clients can interact with it through the WebSocket interface or the HTTP service. The HTTP service provides a web interface that supports uploading and recording audio and video in multiple formats for recognition, and returns text results containing the start and end timestamps of each sentence. This service simplifies the long-audio recognition process and improves user...
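As a rough illustration of working with per-sentence results like those described above, the sketch below renders sentences with their start and end timestamps. The excerpt does not show the service's actual response format, so the field names `start`, `end`, and `text` (with times in seconds) and both helper functions are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_sentences(sentences):
    """Join recognized sentences into one line per sentence with its time span.

    Each item is assumed to be a dict with hypothetical keys
    'start' and 'end' (seconds) and 'text'.
    """
    return "\n".join(
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text']}"
        for s in sentences
    )


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.5, "text": "First sentence."},
        {"start": 3725.5, "end": 3728.0, "text": "A sentence over an hour in."},
    ]
    print(format_sentences(demo))
```

Working in integer milliseconds avoids floating-point drift when splitting long durations into hours, minutes, and seconds.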