Preface

This article introduces a chat service built on a locally deployed large language model. Both the model and the source code are provided, and the service runs entirely offline. The project uses the Qwen-7B-Int4 model, which runs smoothly on a GPU with only 8 GB of video memory, and it supports both Windows and Linux.

Installation Environment

  1. Install the GPU version of PyTorch.

```shell
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```

  2. Install the other dependency libraries.

```shell
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
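After installing, it can help to confirm that the key dependencies are actually importable before starting the service. A minimal check script (the package names listed are assumptions based on a typical Qwen deployment; adjust them to match the project's requirements.txt):

```python
import importlib.util

# Packages a typical Qwen-7B-Int4 deployment needs; adjust this list
# to match the actual requirements.txt of the project.
REQUIRED = ["torch", "transformers"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```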

Start the Service

  1. Execute the server.py program to start the large language model service.

```shell
python server.py
```
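The exact API that server.py exposes is not shown here, so the endpoint path, port, and request fields below are assumptions for illustration only; check server.py for the real interface. With that caveat, a sketch of how a client might call the running service over HTTP:

```python
import json
import urllib.request

# NOTE: the port (5000), endpoint path ("/chat"), and request field
# ("prompt") are hypothetical; consult server.py for the actual API.
def build_chat_request(host, prompt, port=5000, path="/chat"):
    """Build an HTTP POST request carrying a chat prompt as JSON."""
    url = f"http://{host}:{port}{path}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# Example (requires the service to be running):
# req = build_chat_request("192.168.1.10", "Hello")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```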

Android Application

Open the AndroidClient directory of the source code in Android Studio; it contains the source code of the Android application. After the project opens, first change the service address CHAT_HOST to the IP address of the server you started above, then click Run to install the app on an Android phone.
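Before building the app, it can save time to confirm that the server is reachable from the local network at the address you put in CHAT_HOST. A small sketch for checking this from any machine on the same network (the port 5000 is an assumption; use whatever port server.py actually listens on):

```python
import socket

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: replace with the server IP you set as CHAT_HOST
# and the port server.py listens on (5000 is hypothetical).
# print(is_reachable("192.168.1.10", 5000))
```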

The application in action:

![](/static/files/2023-10-23/1a8ff92618dd4c45baa1af64407278f6.gif)

Scan the QR code to join the Knowledge Planet community and search for "Chat Application Based on Large Language Model" to obtain the source code.

![](/static/files/2023-10-23/a7b76c37706e4bcfa203e7aa89b1354d.png)
Xiaoye