Deploying the Baidu Wenxin (ERNIE) 4.5 Open-Source Large Model on AiStudio for Android Calls
In the previous article "Deploying the ERNIE 4.5 Open-Source Model for Android Device Calls", the blogger introduced how to deploy the ERNIE 4.5 open-source large language model on one's own server. However, for readers without a GPU server, that approach is out of reach. This article therefore introduces how to use AiStudio's free computing power to deploy the ERNIE 4.5 open-source large model for personal use.
Read More

Deploying Baidu Wenxin 4.5 Open-Source Model for Android Device Calls
In the previous article "Usage and Deployment of the ERNIE 4.5 Open-Source Large Model", we introduced how to use FastDeploy to deploy the ERNIE 4.5 open-source large model and briefly called its interface. This article describes how an Android app can call that deployed interface to implement conversations.
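For reference, here is a minimal Python sketch of the request/response exchange the Android client needs to reproduce, assuming the FastDeploy service exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the address, port, and model name are placeholders for illustration, not values taken from the article.

```python
import requests

# Placeholder address of the deployed service; replace with your own host and port.
API_URL = "http://127.0.0.1:8180/v1/chat/completions"

payload = {
    # Model name is illustrative; use the name reported by your own deployment.
    "model": "ERNIE-4.5-0.3B",
    "messages": [
        {"role": "user", "content": "你好，请介绍一下你自己。"}
    ],
}

# Send the same JSON body an Android client would POST, then read the reply text.
response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()
reply = response.json()["choices"][0]["message"]["content"]
print(reply)
```

The Android code in the article only needs to issue this same JSON over HTTP and parse the `choices[0].message.content` field from the response.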
Read More

Usage and Deployment of ERNIE 4.5 Open-Source Large Model
The ERNIE 4.5 series consists of 10 open-source models, covering Mixture-of-Experts (MoE) models with activated parameter scales of 47B and 3B (the largest model has a total parameter count of 424B), as well as a dense model with 0.3B parameters. Below, we introduce how to quickly run inference with ERNIE 4.5 models and deploy an interface for client-side calls from platforms such as Android and WeChat Mini Programs. Note that only the text models are covered here; ERNIE 4.5 also includes multimodal models.
Read More

Deploying Custom Gesture Recognition Models with MediaPipe on Android
This project implements a high-performance real-time gesture recognition Android application based on the Google MediaPipe and Android CameraX technology stacks. It adopts MediaPipe's latest Gesture Recognition API and supports recognizing various gesture types, including common gestures such as thumbs-up, the victory sign, and an open palm. It also features real-time hand key-point detection and drawing.
Read More

Custom Gesture Recognition Training Model with MediaPipe
MediaPipe is an open-source framework developed by Google for building perception pipelines to process time-series data such as video and audio. Among its components, MediaPipe Hands is a high-performance hand key-point detection solution capable of real-time hand key-point detection on mobile devices.
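As a rough illustration of the custom-gesture training workflow that the article's title refers to, the sketch below uses MediaPipe Model Maker's gesture recognizer API; the dataset path and export directory are placeholders, whether the article uses this exact API is not confirmed, and option names should be checked against the installed mediapipe-model-maker version.

```python
from mediapipe_model_maker import gesture_recognizer

# Assumed folder layout: one sub-folder per gesture label, plus a "none" class.
DATASET_DIR = "gesture_dataset"  # placeholder path

# Load images and extract hand landmarks as training features.
data = gesture_recognizer.Dataset.from_folder(
    dirname=DATASET_DIR,
    hparams=gesture_recognizer.HandDataPreprocessingParams(),
)
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Train a small classifier on top of the hand-landmark embeddings.
options = gesture_recognizer.GestureRecognizerOptions(
    hparams=gesture_recognizer.HParams(export_dir="exported_model")
)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

# Export a .task bundle that the Android Gesture Recognition API can load.
model.export_model()
```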
Read More

A Tool Website Developed with Python
This article introduces a feature-rich tool website developed with Python. It includes a variety of tools commonly used in work or study, such as document, PDF, image, audio, video, speech, and programming tools.
Read More

Quickly Deploy a DeepSeek-R1 Service from Scratch
This article uses the simplest possible commands to show how to deploy a DeepSeek-R1 service. Anaconda is assumed to be already installed, and the vLLM framework is used, which makes deployment straightforward even from within China.
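As a minimal sketch of what the deployment boils down to, the snippet below uses vLLM's offline Python API with a distilled DeepSeek-R1 checkpoint; the model name is an assumption for illustration, and the article itself may instead serve the model as an HTTP service.

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; substitute the DeepSeek-R1 variant you actually downloaded.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

sampling_params = SamplingParams(temperature=0.6, max_tokens=512)

# Generate a completion for a single prompt and print the text.
outputs = llm.generate(["请简单介绍一下大语言模型。"], sampling_params)
print(outputs[0].outputs[0].text)
```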
Read More

Rapid Training of Cat and Dog Sound Classification Model
This article introduces how to quickly train and run inference for sound classification using PyTorch and the macls library. First, create a Python 3.11 virtual environment via Anaconda and install the PyTorch 2.5.1 GPU version along with the macls library. Next, prepare the dataset; download links are provided, and custom formats are supported. Training takes just three lines of code covering model training, optimization, and saving. The inference phase loads the trained model for prediction. The framework supports multiple sound classification models to suit different scenarios.
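The "three lines of code" for training look roughly like the sketch below, based on the macls project's documented usage; the trainer/predictor class names, config file, and model paths are assumptions that should be checked against the installed macls version.

```python
from macls.trainer import MAClsTrainer
from macls.predict import MAClsPredictor

# Training: the config path is a placeholder taken from the project's examples.
trainer = MAClsTrainer(configs="configs/cam++.yml", use_gpu=True)
trainer.train()

# Inference: load the exported model and classify one audio file (placeholder paths).
predictor = MAClsPredictor(configs="configs/cam++.yml",
                           model_path="models/CAMPPlus_Fbank/best_model/",
                           use_gpu=True)
label, score = predictor.predict(audio_data="dataset/test.wav")
print(f"predicted: {label}, score: {score}")
```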
Read More

Quick Deployment of Speech Recognition Framework Using MASR V3
This framework appears to be very comprehensive and user-friendly, covering multiple stages from data preparation to model training and inference. To help readers better understand and utilize this framework, I will provide detailed explanations for each part along with some sample code.

### 1. Environment Setup

First, you need to install the necessary dependency packages. Assuming you have already created and activated a virtual environment:

```sh
pip install paddlepaddle==2.4.0 -i https://mirror.baidu.com/pypi/
```
Read More

Quick Deployment of Speech Recognition Framework Using PPASR V3
This article walks through the process of developing and deploying speech recognition tasks using the PaddleSpeech framework. Below are some supplements and suggestions to the information provided:

1. **Installation Environment**: Ensure your environment has the necessary dependencies installed, including libraries such as PaddlePaddle and PaddleSpeech. These can be installed via pip.
2. **Data Preprocessing**:
   - You may need to perform preprocessing steps on the raw audio, such as sample rate adjustment and noise removal.
Read More

Text Endpoint Detection Based on Large Language Models
This article introduces a method for detecting text endpoints with large language models (LLMs) to improve Voice Activity Detection (VAD) in voice conversations. By fine-tuning a model to predict whether a sentence is complete, the user's intent can be judged more accurately. The specific steps include:

1. **Principle and Data Preparation**: Leverage the text generation capabilities of large language models and fine-tune them on a predefined dataset in a specific format, as sketched below.
2. **Fine-tuning the Model**: Use the LLaMA-Factory tool for training, selecting an appropriate prompt template and an optimized data format.
3. ...
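As a rough illustration of the data-preparation step, the snippet below writes a few training samples in the alpaca-style JSON format that LLaMA-Factory accepts; the instruction wording and labels are hypothetical examples, not the article's actual dataset.

```python
import json

# Hypothetical samples: the model should answer whether the user's utterance is complete.
samples = [
    {
        "instruction": "判断下面这句话是否已经说完，只回答“是”或“否”。",
        "input": "帮我把客厅的灯",
        "output": "否",
    },
    {
        "instruction": "判断下面这句话是否已经说完，只回答“是”或“否”。",
        "input": "帮我把客厅的灯关掉。",
        "output": "是",
    },
]

# LLaMA-Factory reads custom datasets as a JSON list registered in data/dataset_info.json.
with open("endpoint_detection.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```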
Read More

Speaker Diarization Implementation Based on PyTorch (Speaker Separation)
This article introduces the speaker diarization feature of the PyTorch-based VoiceprintRecognition_Pytorch framework, which supports various advanced models and data preprocessing methods. By running the `infer_speaker_diarization.py` script or the GUI program, audio can be separated by speaker and the results displayed. The output includes the start and end times of each speaker segment and the speaker's identity (speakers must be registered first). Additionally, the article provides a solution for handling Chinese names on Ubuntu systems...
Read More

Introduction and Usage of YeAudio Audio Tool
These classes define various audio data augmentation techniques. Each class is responsible for a specific data augmentation operation and can control the degree and type of augmentation by setting different parameters. The following is a detailed description of each class:

### 1. **SpecAugmentor**

- **Function**: Frequency-domain masking and time-domain masking
- **Main Parameters**:
  - `prob`: Probability of data augmentation.
  - `freq_mask_ratio`: Ratio of frequency-domain masking (e.g., 0.15 means randomly selecting...
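To make the masking idea concrete, here is a minimal NumPy sketch of frequency- and time-domain masking on a spectrogram. It illustrates the general SpecAugment technique, not YeAudio's actual SpecAugmentor implementation, and the parameter names only loosely mirror the ones above.

```python
import numpy as np

def spec_augment(spec, prob=1.0, freq_mask_ratio=0.15, time_mask_ratio=0.05, rng=None):
    """Apply one frequency mask and one time mask to a (freq_bins, time_steps) spectrogram."""
    rng = rng or np.random.default_rng()
    if rng.random() > prob:
        return spec
    spec = spec.copy()
    n_freq, n_time = spec.shape

    # Frequency-domain masking: zero out a random band of frequency bins.
    f_width = int(n_freq * freq_mask_ratio * rng.random())
    f_start = rng.integers(0, max(1, n_freq - f_width))
    spec[f_start:f_start + f_width, :] = 0.0

    # Time-domain masking: zero out a random span of time steps.
    t_width = int(n_time * time_mask_ratio * rng.random())
    t_start = rng.integers(0, max(1, n_time - t_width))
    spec[:, t_start:t_start + t_width] = 0.0
    return spec

# Example: augment a fake 80-band, 300-frame log-mel spectrogram.
augmented = spec_augment(np.random.randn(80, 300).astype(np.float32))
```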
Read More

Installing Docker on Ubuntu with GPU Support
This article introduces how to install and configure Docker using the Alibaba Cloud mirror source, with support for NVIDIA GPUs. First, add the Alibaba Cloud GPG key and set up the repository, then update the apt sources and install Docker. Next, add a domestic mirror address in `/etc/docker/daemon.json` and restart the Docker service for the configuration to take effect. Then download and install nvidia-container-toolkit via curl, configure it as the Docker runtime, and finally test GPU support. Key steps...
Read More

Starting Programs with /etc/rc.local on Ubuntu 22.04
This article introduces how to start programs at boot on Ubuntu 20.04 or 22.04 systems using `/etc/rc.local`. It requires editing the `/lib/systemd/system/rc-local.service` file to add configuration, creating `/etc/rc.local` and granting it execute permission, creating a soft link for the service, and enabling the relevant service. After these steps, reboot the device to check whether startup at boot works. If a log file containing "Test Successful" is generated at the specified path, the setup...
Read More

Night Rain Drifting · A Thousand Questions: Answering Your Endless Queries
Night Rain Drifting · Qianwen Launcher is an efficient and convenient LLM (Large Language Model) launching tool. It supports Windows and requires an NVIDIA graphics card with a driver version above 516.01. The launcher comes pre-installed with multiple model specifications suitable for different scenarios, with a minimum requirement of only 1 GB of video memory. The interface is divided into three parts: the Launch Page, the Chat Page, and the Log Page. The Launch Page is used to select and load model files (they are downloaded automatically if not available locally); after clicking "Load", it switches seamlessly to the Chat Page for interaction. The Chat Page supports asking questions at any time, and the model responds instantly for an intelligent dialogue experience. The Log Page records the usage...
Read More

HarmonyOS Application Development - Recording, Saving, and Playing Audio
Your code example demonstrates how to implement audio recording and playback functions in HarmonyOS. Below is a summary of the code and some improvement suggestions:

### Summary

1. **Permission Application**:
   - User authorization is required before starting audio recording.
   - The `requestPermissionsFromUser` method is used to obtain the user's permission.
2. **Recording Function**:
   - Use `startRecord` to begin audio recording and save the file to the specified path.
Read More

HarmonyOS Application Development - Recording Audio and Implementing Real-time Speech Recognition with WebSocket
Your code implements a complete example of real-time speech recognition using WebSocket. The following are some supplementary and optimization suggestions for the entire project to ensure robustness and maintainability.

### 1. Permission Check and Prompt

When requesting permissions, more detailed prompt information can be provided, and reasonable operational suggestions can be given after the user refuses authorization, or the user can be guided to the settings page for manual authorization.

```javascript
reqPermissionsAndRecord(permissions: Ar
```
Read More

HarmonyOS App Development - Customizable Deletable List Popup
This application implements a custom list popup window, supporting task addition, deletion, and confirmation. The specific implementation is as follows:

1. **Entity Class**: The `Intention` class defines task items.
2. **Data Source Class** (`IntentionDataSource`): Manages data operations for the task list, including CRUD operations and notifying listeners of updates.
3. **Custom Popup Component** (`AddIntentionDialog`): Displays the current task list and provides delete and confirm buttons...
Read More

HarmonyOS Application Development - Imitating WeChat Chat Message List
This example demonstrates how to create a chat application interface similar to WeChat using ArkTS. The page structure includes a scrollable message list and a button to dynamically add new messages. The core code is as follows:

1. The `Msg` class defines the message type (sent or received).
2. The `MsgDataSource` class implements the data source interface, manages the message list, and provides add/delete operations.
3. The page uses the `List` component to display the message list, with `LazyForEach` to dynamically load new messages as the user scrolls.
Read More

HarmonyOS Application Development - Sending POST Request and Obtaining Result
This code sends data to the server via a POST request and parses the JSON response. The core functionality includes:

1. Using the `http.createHttp().request()` method to send asynchronous POST requests.
2. Setting the request headers and the data to be sent.
3. Obtaining the response result and parsing it as JSON.
4. Extracting valid information from the JSON data to update the interface text.

The code structure clearly demonstrates how to implement HTTP requests in a HarmonyOS application and update the UI by setting state variables.
Read More

HarmonyOS Application Development - Playing Local Audio Files
This document introduces how to implement audio playback on HarmonyOS using the AVPlayer audio/video player. The main steps include:

1. Creating an `AVPlayer` instance and registering callback functions to handle state changes and errors;
2. Obtaining the local audio file path, opening the audio file through file-system operations to get a file descriptor, and setting it on the `AVPlayer` to trigger resource initialization;
3. Implementing the state-machine transition logic, from resource initialization to playback completion.

This code snippet demonstrates how to implement audio playback using the ArkTS language under the Stage model.
Read More

HarmonyOS Application Development - Requesting Voice Synthesis Service to Obtain Audio File
This document describes a text-to-speech service used from HarmonyOS: the app uploads text data and requests the server to return audio data. Key steps include creating the HTTP request, setting the request headers and data body, processing the response data, and saving it to a local file. The code example demonstrates how to integrate this functionality in an Ability, specifically downloading and saving a .wav voice file after the user inputs text. Note that the service response type must be `application/octet-stream` to correctly obtain the audio stream, and this service is only applicable to...
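For reference, a minimal Python sketch of the same request flow is shown below, assuming a placeholder TTS endpoint that accepts text and streams back a .wav file as `application/octet-stream`; the URL and request field are illustrative, not the article's actual service.

```python
import requests

# Placeholder endpoint and request body; adjust to your own TTS service.
TTS_URL = "http://127.0.0.1:5000/tts"
payload = {"text": "你好，欢迎使用语音合成服务。"}

# The server is expected to reply with Content-Type: application/octet-stream.
response = requests.post(TTS_URL, json=payload, timeout=60)
response.raise_for_status()

# Save the raw audio bytes to a local .wav file.
with open("output.wav", "wb") as f:
    f.write(response.content)
print("saved output.wav,", len(response.content), "bytes")
```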
Read More

Easily Identify Long Audio/Video Files with Hours-Long Duration
This article introduces how to build a long-speech recognition service capable of processing audio or video files that last tens of minutes or even several hours. First, upload the project folder to the server, then run the commands for compilation, permission changes, and starting the Docker container to deploy the service. After confirming the service is available, interaction is possible through either the WebSocket interface or the HTTP service. The HTTP service provides a web interface that supports uploading or recording audio and video in multiple formats, and it returns text results containing the start and end timestamps of each sentence. This service simplifies the long-audio recognition process and improves the user...
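As a hypothetical sketch of calling such a service from Python, the snippet below uploads a long recording to an assumed file-upload recognition endpoint and prints the returned segments; the URL, file path, form field, and response keys are placeholders, not the service's documented API.

```python
import requests

# Hypothetical endpoint; replace with the address and route of your deployed service.
RECOGNIZE_URL = "http://127.0.0.1:5000/recognition"

# Upload a long audio/video file and read back sentence-level results with timestamps.
with open("meeting_recording.mp4", "rb") as f:
    response = requests.post(RECOGNIZE_URL, files={"audio": f}, timeout=3600)
response.raise_for_status()

for segment in response.json().get("results", []):
    # Each segment is assumed to carry start/end times and the recognized text.
    print(segment)
```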
Read More

Real-time Command Wake-up
This article introduces the development and usage of a real-time command wake-up program, covering environment installation, command wake-up, and model fine-tuning. The project runs on Anaconda 3 and Python 3.11, with dependencies on PyTorch 2.1.0 and CUDA 12.1. Users can customize the recording time and buffer length by adjusting the `sec_time` and `last_len` parameters, and add commands in `instruct.txt` for personalized settings. The program can be executed via `infer_pytorch.py` or `infer_on...
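To illustrate what parameters like `sec_time` and `last_len` typically control, here is a rough sketch of a rolling recording buffer using the sounddevice library; it is a generic illustration of the idea, not the project's actual code, and the recognition step is left as a placeholder.

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000
sec_time = 0.5   # length of each recording chunk, in seconds
last_len = 3.0   # how many seconds of recent audio to keep for recognition

buffer = np.zeros(0, dtype=np.float32)

for _ in range(20):  # loop a fixed number of times for the sketch
    # Record one short chunk and append it to the rolling buffer.
    chunk = sd.rec(int(sec_time * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    buffer = np.concatenate([buffer, chunk.ravel()])

    # Keep only the most recent `last_len` seconds of audio.
    buffer = buffer[-int(last_len * SAMPLE_RATE):]

    # Placeholder: pass `buffer` to the wake/command recognition model here.
    print("buffer seconds:", len(buffer) / SAMPLE_RATE)
```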
Read More