Introduction

This article shows how to deploy the DeepSeek-R1 service with the simplest possible commands. It assumes Anaconda is already installed and uses the vLLM framework; because the model weights can be downloaded through ModelScope, the whole setup works well from within China.

Deployment

  1. Create a virtual environment
conda create -n vllm python=3.11 -y
  2. Activate the virtual environment
conda activate vllm
  3. Install the PyTorch framework
pip install torch torchvision torchaudio
  4. Install vLLM and ModelScope
pip install vllm
pip install modelscope
  5. Tell vLLM to download the model through ModelScope (an optional pre-download sketch follows this list)
export VLLM_USE_MODELSCOPE=True
  6. Start the service. You can substitute a different model as needed (see the DeepSeek-R1 model page on ModelScope). The --tensor-parallel-size parameter specifies the number of GPUs to use; a quick smoke test of the running server follows this list.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --tensor-parallel-size 1 --max-model-len 32768 --enforce-eager
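Optionally, before starting the service you can pre-download the model weights, which helps on machines with slow or unstable connections. A minimal sketch using ModelScope's snapshot_download; the weights land in the local ModelScope cache, where vllm serve will pick them up instead of downloading at startup:

from modelscope import snapshot_download

# Download the weights into the local ModelScope cache so that
# vllm serve can reuse them rather than fetching them at startup.
model_dir = snapshot_download("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(model_dir)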
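Once the service is running, a quick smoke test is to list the models it serves. This assumes the server is on vLLM's default port 8000 on the local machine; vLLM does not validate the API key, but the client requires a non-empty value:

from openai import OpenAI

# vllm serve exposes an OpenAI-compatible API on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="key")
for model in client.models.list():
    print(model.id)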

Invocation

To call the service from Python, use the OpenAI-compatible client:

from openai import OpenAI

# vllm serve listens on port 8000 by default (11434 is Ollama's port);
# replace the host with your server's address. vLLM does not check the
# API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://192.168.0.12:8000/v1",
                api_key="key")

messages = [{"role": "user", "content": "你好"}]  # "Hello"
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=messages,
    stream=True
)

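# Print the streamed reply token by token as chunks arrive.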
for chunk in response:
    delta = chunk.choices[0].delta
    delta_content = delta.content
    if delta_content is not None:
        print(delta_content, end='')
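If you do not need token-by-token output, the same call works without streaming; reusing the client and messages defined above, the full reply is then available directly on the response object:

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=messages,
    stream=False
)
print(response.choices[0].message.content)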