Deploying the Baidu Wenxin (ERNIE) 4.5 Open-Source Large Model on AiStudio for Android Calls
In the previous article "Deploying the ERNIE 4.5 Open-Source Model for Android Device Calls", the blogger introduced how to deploy the ERNIE 4.5 open-source large language model on one's own server. However, for readers without a GPU server, that approach is out of reach. This article therefore introduces how to use AiStudio's free computing power to deploy the ERNIE 4.5 open-source large model for personal use.
Read More

Deploying Baidu Wenxin 4.5 Open-Source Model for Android Device Calls
In the previous article "Usage and Deployment of the ERNIE 4.5 Open-Source Large Model", we introduced how to use FastDeploy to deploy the ERNIE 4.5 open-source large model and briefly called its interface. This article describes how an Android app can call that deployed interface to implement conversations.
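For reference, here is a minimal Python sketch of the request/response exchange the Android client needs to reproduce, assuming the FastDeploy service exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the address, port, and model name are placeholders for illustration, not values taken from the article.

```python
import requests

# Placeholder address of the deployed service; replace with your own host and port.
API_URL = "http://127.0.0.1:8180/v1/chat/completions"

payload = {
    # Model name is illustrative; use the name reported by your own deployment.
    "model": "ERNIE-4.5-0.3B",
    "messages": [
        {"role": "user", "content": "你好，请介绍一下你自己。"}
    ],
}

# Send the same JSON body an Android client would POST, then read the reply text.
response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()
reply = response.json()["choices"][0]["message"]["content"]
print(reply)
```

The Android code in the article only needs to issue this same JSON over HTTP and parse the `choices[0].message.content` field from the response.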
Read More

Usage and Deployment of ERNIE 4.5 Open-Source Large Model
The ERNIE 4.5 series consists of 10 open-source models, covering Mixture-of-Experts (MoE) models with activated parameter scales of 47B and 3B (the largest model has a total parameter count of 424B), as well as a dense model with 0.3B parameters. Below, we introduce how to quickly run inference with ERNIE 4.5 models and deploy an interface for client-side calls from platforms such as Android and WeChat Mini Programs. Note that only the text models are covered here; ERNIE 4.5 also includes multimodal models.
Read More

Deploying Custom Gesture Recognition Models with MediaPipe on Android
This project implements a high-performance real-time gesture recognition Android application based on the Google MediaPipe and Android CameraX technology stacks. It adopts MediaPipe's latest Gesture Recognition API and supports recognizing various gesture types, including common gestures such as thumbs-up, the victory sign, and an open palm. It also features real-time hand key-point detection and drawing.
Read More

Custom Gesture Recognition Training Model with MediaPipe
MediaPipe is an open-source framework developed by Google for building perception pipelines to process time-series data such as video and audio. Among its components, MediaPipe Hands is a high-performance hand key-point detection solution capable of real-time hand key-point detection on mobile devices.
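As a rough illustration of the custom-gesture training workflow that the article's title refers to, the sketch below uses MediaPipe Model Maker's gesture recognizer API; the dataset path and export directory are placeholders, whether the article uses this exact API is not confirmed, and option names should be checked against the installed mediapipe-model-maker version.

```python
from mediapipe_model_maker import gesture_recognizer

# Assumed folder layout: one sub-folder per gesture label, plus a "none" class.
DATASET_DIR = "gesture_dataset"  # placeholder path

# Load images and extract hand landmarks as training features.
data = gesture_recognizer.Dataset.from_folder(
    dirname=DATASET_DIR,
    hparams=gesture_recognizer.HandDataPreprocessingParams(),
)
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Train a small classifier on top of the hand-landmark embeddings.
options = gesture_recognizer.GestureRecognizerOptions(
    hparams=gesture_recognizer.HParams(export_dir="exported_model")
)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

# Export a .task bundle that the Android Gesture Recognition API can load.
model.export_model()
```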
Read More

A Tool Website Developed with Python
This article introduces a feature-rich tool website developed with Python. It includes a variety of tools commonly used in work or study, such as document, PDF, image, audio, video, speech, and programming tools.
Read More

Quickly Deploy a DeepSeek-R1 Service from Scratch
This article uses the simplest possible commands to show how to deploy a DeepSeek-R1 service. Anaconda is assumed to be already installed, and the vLLM framework is used, which makes deployment straightforward even from within China.
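As a minimal sketch of what the deployment boils down to, the snippet below uses vLLM's offline Python API with a distilled DeepSeek-R1 checkpoint; the model name is an assumption for illustration, and the article itself may instead serve the model as an HTTP service.

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; substitute the DeepSeek-R1 variant you actually downloaded.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

sampling_params = SamplingParams(temperature=0.6, max_tokens=512)

# Generate a completion for a single prompt and print the text.
outputs = llm.generate(["请简单介绍一下大语言模型。"], sampling_params)
print(outputs[0].outputs[0].text)
```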
Read More

Rapid Training of Cat and Dog Sound Classification Model
This article introduces how to quickly train and run inference for sound classification using PyTorch and the macls library. First, create a Python 3.11 virtual environment via Anaconda and install the PyTorch 2.5.1 GPU version along with the macls library. Next, prepare the dataset; download links are provided, and custom formats are supported. Training takes just three lines of code covering model training, optimization, and saving. The inference phase loads the trained model for prediction. The framework supports multiple sound classification models to suit different scenarios.
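The "three lines of code" for training look roughly like the sketch below, based on the macls project's documented usage; the trainer/predictor class names, config file, and model paths are assumptions that should be checked against the installed macls version.

```python
from macls.trainer import MAClsTrainer
from macls.predict import MAClsPredictor

# Training: the config path is a placeholder taken from the project's examples.
trainer = MAClsTrainer(configs="configs/cam++.yml", use_gpu=True)
trainer.train()

# Inference: load the exported model and classify one audio file (placeholder paths).
predictor = MAClsPredictor(configs="configs/cam++.yml",
                           model_path="models/CAMPPlus_Fbank/best_model/",
                           use_gpu=True)
label, score = predictor.predict(audio_data="dataset/test.wav")
print(f"predicted: {label}, score: {score}")
```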
Read More

Quick Deployment of Speech Recognition Framework Using MASR V3
This framework appears to be very comprehensive and user-friendly, covering multiple stages from data preparation to model training and inference. To help readers better understand and utilize this framework, I will provide detailed explanations for each part along with some sample code.

### 1. Environment Setup

First, you need to install the necessary dependency packages. Assuming you have already created and activated a virtual environment:

```sh
pip install paddlepaddle==2.4.0 -i https://mirror.baidu.com/pypi/
```
Read More

Quick Deployment of Speech Recognition Framework Using PPASR V3
This article walks through the process of developing and deploying speech recognition tasks using the PaddleSpeech framework. Below are some supplements and suggestions to the information provided:

1. **Installation Environment**: Ensure your environment has the necessary dependencies installed, including libraries such as PaddlePaddle and PaddleSpeech. These can be installed via pip.
2. **Data Preprocessing**:
   - You may need to perform preprocessing steps on the raw audio, such as sample rate adjustment and noise removal.
Read More

Text Endpoint Detection Based on Large Language Models
This article introduces a method for detecting text endpoints with large language models (LLMs) to improve Voice Activity Detection (VAD) in voice conversations. By fine-tuning a model to predict whether a sentence is complete, the user's intent can be judged more accurately. The specific steps include:

1. **Principle and Data Preparation**: Leverage the text generation capabilities of large language models and fine-tune them on a predefined dataset in a specific format, as sketched below.
2. **Fine-tuning the Model**: Use the LLaMA-Factory tool for training, selecting an appropriate prompt template and an optimized data format.
3. ...
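As a rough illustration of the data-preparation step, the snippet below writes a few training samples in the alpaca-style JSON format that LLaMA-Factory accepts; the instruction wording and labels are hypothetical examples, not the article's actual dataset.

```python
import json

# Hypothetical samples: the model should answer whether the user's utterance is complete.
samples = [
    {
        "instruction": "判断下面这句话是否已经说完，只回答“是”或“否”。",
        "input": "帮我把客厅的灯",
        "output": "否",
    },
    {
        "instruction": "判断下面这句话是否已经说完，只回答“是”或“否”。",
        "input": "帮我把客厅的灯关掉。",
        "output": "是",
    },
]

# LLaMA-Factory reads custom datasets as a JSON list registered in data/dataset_info.json.
with open("endpoint_detection.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```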
Read More

Speaker Diarization Implementation Based on PyTorch (Speaker Separation)
This article introduces the speaker diarization feature of the PyTorch-based VoiceprintRecognition_Pytorch framework, which supports various advanced models and data preprocessing methods. By running the `infer_speaker_diarization.py` script or the GUI program, audio can be separated by speaker and the results displayed. The output includes the start and end times of each speaker segment and the speaker's identity (speakers must be registered first). Additionally, the article provides a solution for handling Chinese names on Ubuntu systems...
Read More

Introduction and Usage of YeAudio Audio Tool
These classes define various audio data augmentation techniques. Each class is responsible for a specific data augmentation operation and can control the degree and type of augmentation by setting different parameters. The following is a detailed description of each class:

### 1. **SpecAugmentor**

- **Function**: Frequency-domain masking and time-domain masking
- **Main Parameters**:
  - `prob`: Probability of data augmentation.
  - `freq_mask_ratio`: Ratio of frequency-domain masking (e.g., 0.15 means randomly selecting...
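To make the masking idea concrete, here is a minimal NumPy sketch of frequency- and time-domain masking on a spectrogram. It illustrates the general SpecAugment technique, not YeAudio's actual SpecAugmentor implementation, and the parameter names only loosely mirror the ones above.

```python
import numpy as np

def spec_augment(spec, prob=1.0, freq_mask_ratio=0.15, time_mask_ratio=0.05, rng=None):
    """Apply one frequency mask and one time mask to a (freq_bins, time_steps) spectrogram."""
    rng = rng or np.random.default_rng()
    if rng.random() > prob:
        return spec
    spec = spec.copy()
    n_freq, n_time = spec.shape

    # Frequency-domain masking: zero out a random band of frequency bins.
    f_width = int(n_freq * freq_mask_ratio * rng.random())
    f_start = rng.integers(0, max(1, n_freq - f_width))
    spec[f_start:f_start + f_width, :] = 0.0

    # Time-domain masking: zero out a random span of time steps.
    t_width = int(n_time * time_mask_ratio * rng.random())
    t_start = rng.integers(0, max(1, n_time - t_width))
    spec[:, t_start:t_start + t_width] = 0.0
    return spec

# Example: augment a fake 80-band, 300-frame log-mel spectrogram.
augmented = spec_augment(np.random.randn(80, 300).astype(np.float32))
```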
Read More

Installing Docker on Ubuntu with GPU Support
This article introduces how to install and configure Docker using the Alibaba Cloud mirror source, with support for NVIDIA GPUs. First, add the Alibaba Cloud GPG key and set up the repository, then update the apt sources and install Docker. Next, add a domestic mirror address in `/etc/docker/daemon.json` and restart the Docker service for the configuration to take effect. Then download and install nvidia-container-toolkit via curl, configure it as the Docker runtime, and finally test GPU support. Key steps...
Read More

Starting Programs with /etc/rc.local on Ubuntu 22.04
This article introduces how to start programs at boot on Ubuntu 20.04 or 22.04 systems using `/etc/rc.local`. It requires editing the `/lib/systemd/system/rc-local.service` file to add configuration, creating `/etc/rc.local` and granting it execute permission, creating a soft link for the service, and enabling the relevant service. After these steps, reboot the device to check whether startup at boot works. If a log file containing "Test Successful" is generated at the specified path, the setup...
Read More

Night Rain Drifting · A Thousand Questions: Answering Your Endless Queries
Night Rain Drifting · Qianwen Launcher is an efficient and convenient LLM (Large Language Model) launching tool. It supports Windows and requires an NVIDIA graphics card with a driver version above 516.01. The launcher comes pre-installed with multiple model specifications suitable for different scenarios, with a minimum requirement of only 1 GB of video memory. The interface is divided into three parts: the Launch Page, the Chat Page, and the Log Page. The Launch Page is used to select and load model files (they are downloaded automatically if not available locally); after clicking "Load", it switches seamlessly to the Chat Page for interaction. The Chat Page supports asking questions at any time, and the model responds instantly for an intelligent dialogue experience. The Log Page records the usage...
Read More

HarmonyOS Application Development - Recording, Saving, and Playing Audio
Your code example demonstrates how to implement audio recording and playback functions in HarmonyOS. Below is a summary of the code and some improvement suggestions:

### Summary

1. **Permission Application**:
   - User authorization is required before starting audio recording.
   - The `requestPermissionsFromUser` method is used to obtain the user's permission.
2. **Recording Function**:
   - Use `startRecord` to begin audio recording and save the file to the specified path.
Read More

HarmonyOS Application Development - Recording Audio and Implementing Real-time Speech Recognition with WebSocket
Your code implements a complete example of real-time speech recognition using WebSocket. The following are some supplementary and optimization suggestions for the entire project to ensure robustness and maintainability.

### 1. Permission Check and Prompt

When requesting permissions, more detailed prompt information can be provided, and reasonable operational suggestions can be given after the user refuses authorization, or the user can be guided to the settings page for manual authorization.

```javascript
reqPermissionsAndRecord(permissions: Ar
```
Read More

HarmonyOS App Development - Customizable Deletable List Popup
This application implements a custom list popup window, supporting task addition, deletion, and confirmation. The specific implementation is as follows:

1. **Entity Class**: The `Intention` class defines task items.
2. **Data Source Class** (`IntentionDataSource`): Manages data operations for the task list, including CRUD operations and notifying listeners of updates.
3. **Custom Popup Component** (`AddIntentionDialog`): Displays the current task list and provides delete and confirm buttons...
Read More

HarmonyOS Application Development - Imitating WeChat Chat Message List
This example demonstrates how to create a chat application interface similar to WeChat using ArkTS. The page structure includes a scrollable message list and a button to dynamically add new messages. The core code is as follows:

1. The `Msg` class defines the message type (sent or received).
2. The `MsgDataSource` class implements the data source interface, manages the message list, and provides add/delete operations.
3. The page uses the `List` component to display the message list, with `LazyForEach` to dynamically load new messages as the user scrolls.
Read More

HarmonyOS Application Development - Sending POST Request and Obtaining Result
This code sends data to the server via a POST request and parses the JSON response. The core functionality includes:

1. Using the `http.createHttp().request()` method to send asynchronous POST requests.
2. Setting the request headers and the data to be sent.
3. Obtaining the response result and parsing it as JSON.
4. Extracting valid information from the JSON data to update the interface text.

The code structure clearly demonstrates how to implement HTTP requests in a HarmonyOS application and update the UI by setting state variables.
Read More

HarmonyOS Application Development - Playing Local Audio Files
This document introduces how to implement audio playback on HarmonyOS using the AVPlayer audio/video player. The main steps include:

1. Creating an `AVPlayer` instance and registering callback functions to handle state changes and errors;
2. Obtaining the local audio file path, opening the audio file through file-system operations to get a file descriptor, and setting it on the `AVPlayer` to trigger resource initialization;
3. Implementing the state-machine transition logic, from resource initialization to playback completion.

This code snippet demonstrates how to implement audio playback using the ArkTS language under the Stage model.
Read More

HarmonyOS Application Development - Requesting Voice Synthesis Service to Obtain Audio File
This document describes a text-to-speech service used from HarmonyOS: the app uploads text data and requests the server to return audio data. Key steps include creating the HTTP request, setting the request headers and data body, processing the response data, and saving it to a local file. The code example demonstrates how to integrate this functionality in an Ability, specifically downloading and saving a .wav voice file after the user inputs text. Note that the service response type must be `application/octet-stream` to correctly obtain the audio stream, and this service is only applicable to...
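For reference, a minimal Python sketch of the same request flow is shown below, assuming a placeholder TTS endpoint that accepts text and streams back a .wav file as `application/octet-stream`; the URL and request field are illustrative, not the article's actual service.

```python
import requests

# Placeholder endpoint and request body; adjust to your own TTS service.
TTS_URL = "http://127.0.0.1:5000/tts"
payload = {"text": "你好，欢迎使用语音合成服务。"}

# The server is expected to reply with Content-Type: application/octet-stream.
response = requests.post(TTS_URL, json=payload, timeout=60)
response.raise_for_status()

# Save the raw audio bytes to a local .wav file.
with open("output.wav", "wb") as f:
    f.write(response.content)
print("saved output.wav,", len(response.content), "bytes")
```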
Read More

Easily Identify Long Audio/Video Files with Hours-Long Duration
This article introduces how to build a long-speech recognition service capable of processing audio or video files that last tens of minutes or even several hours. First, upload the project folder to the server, then run the commands for compilation, permission changes, and starting the Docker container to deploy the service. After confirming the service is available, interaction is possible through either the WebSocket interface or the HTTP service. The HTTP service provides a web interface that supports uploading or recording audio and video in multiple formats, and it returns text results containing the start and end timestamps of each sentence. This service simplifies the long-audio recognition process and improves the user...
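As a hypothetical sketch of calling such a service from Python, the snippet below uploads a long recording to an assumed file-upload recognition endpoint and prints the returned segments; the URL, file path, form field, and response keys are placeholders, not the service's documented API.

```python
import requests

# Hypothetical endpoint; replace with the address and route of your deployed service.
RECOGNIZE_URL = "http://127.0.0.1:5000/recognition"

# Upload a long audio/video file and read back sentence-level results with timestamps.
with open("meeting_recording.mp4", "rb") as f:
    response = requests.post(RECOGNIZE_URL, files={"audio": f}, timeout=3600)
response.raise_for_status()

for segment in response.json().get("results", []):
    # Each segment is assumed to carry start/end times and the recognized text.
    print(segment)
```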
Read More

Real-time Command Wake-up
This article introduces the development and usage of a real-time command wake-up program, covering environment installation, command wake-up, and model fine-tuning. The project runs on Anaconda 3 and Python 3.11, with dependencies on PyTorch 2.1.0 and CUDA 12.1. Users can customize the recording time and buffer length by adjusting the `sec_time` and `last_len` parameters, and add commands in `instruct.txt` for personalized settings. The program can be executed via `infer_pytorch.py` or `infer_on...
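To illustrate what parameters like `sec_time` and `last_len` typically control, here is a rough sketch of a rolling recording buffer using the sounddevice library; it is a generic illustration of the idea, not the project's actual code, and the recognition step is left as a placeholder.

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000
sec_time = 0.5   # length of each recording chunk, in seconds
last_len = 3.0   # how many seconds of recent audio to keep for recognition

buffer = np.zeros(0, dtype=np.float32)

for _ in range(20):  # loop a fixed number of times for the sketch
    # Record one short chunk and append it to the rolling buffer.
    chunk = sd.rec(int(sec_time * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    buffer = np.concatenate([buffer, chunk.ravel()])

    # Keep only the most recent `last_len` seconds of audio.
    buffer = buffer[-int(last_len * SAMPLE_RATE):]

    # Placeholder: pass `buffer` to the wake/command recognition model here.
    print("buffer seconds:", len(buffer) / SAMPLE_RATE)
```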
Read More