In today’s fast-paced web application development, API performance directly impacts user experience and system stability. As a high-performance Python web framework, FastAPI has become the preferred choice for many projects due to its asynchronous support and automatic documentation generation. However, even with FastAPI, performance bottlenecks may still occur if code and deployment are not optimized. This article will systematically break down the core optimization ideas and practical techniques for FastAPI performance, from code and database to caching and deployment, helping beginners quickly enhance application efficiency.
I. Basic Code-Level Optimization
FastAPI itself leverages asynchronous support and efficient parsers (e.g., Uvicorn) for inherent performance advantages, but code implementation remains a critical optimization area. Below are key directions for beginners:
1. Prefer async functions for IO-intensive tasks
FastAPI supports async def for defining asynchronous path operations, ideal for IO-intensive tasks (e.g., database queries, API calls, file I/O). Critical note: Avoid CPU-intensive operations (e.g., complex loops, mathematical calculations) in async functions, as they block the event loop and degrade performance.
# Anti-pattern: synchronous database query (blocks the event loop)
def get_user_sync(user_id: int):
    db = create_db_connection()  # synchronous connection
    result = db.query("SELECT * FROM users WHERE id = ?", user_id)
    db.close()
    return result
# Good: asynchronous database query (non-blocking)
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine  # async ORM

app = FastAPI()
engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/db")  # async connection

async def get_db():
    # Minimal session-per-request dependency (the User model is assumed to be defined elsewhere)
    async with AsyncSession(engine) as session:
        yield session

@app.get("/users/{user_id}")
async def get_user(user_id: int, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(User).where(User.id == user_id))  # await the async query
    return result.scalars().first()
2. Avoid over-computation and redundant data
When returning large datasets (e.g., lists), constructing the full list directly can cause excessive memory usage. Use generators or pagination instead to reduce memory consumption.
# Anti-pattern: return a full large list (high memory usage)
@app.get("/products")
async def get_all_products():
    return [{"id": i, "name": f"Product {i}"} for i in range(10000)]  # builds 10,000 records at once
# Good: pagination via SQL limit/offset (return only the current page)
from fastapi import Query

@app.get("/products")
async def get_products(page: int = Query(1, ge=1), page_size: int = Query(20, ge=1, le=100)):
    offset = (page - 1) * page_size
    # Page through the table with limit/offset instead of materializing everything
    async with engine.connect() as conn:
        result = await conn.execute(
            User.__table__.select().offset(offset).limit(page_size)
        )
        return [dict(row._mapping) for row in result.all()]
3. Use parameter validation to cut downstream work
FastAPI’s parameter validation (e.g., type hints, Query/Path parameters) not only auto-generates API docs but also filters invalid requests early, avoiding unnecessary resource waste.
# Good: type annotations plus range validation
from fastapi import Path, Query

@app.get("/items/{item_id}")
async def get_item(item_id: int = Path(..., ge=1), limit: int = Query(10, le=100)):  # enforce item_id >= 1, limit <= 100
    return {"item_id": item_id, "limit": limit}
II. Correct Practices for Asynchronous Programming
Asynchronous programming is a core strength of FastAPI, but misuse can harm performance. Key principles:
1. Distinguish IO-intensive from CPU-intensive tasks
- IO-intensive (e.g., network requests, database queries): asynchronous code is ideal, since idle wait time lets the event loop switch to other tasks.
- CPU-intensive (e.g., large data processing, AI inference): offload the work to a thread pool with loop.run_in_executor, or switch to multiprocessing.
# Anti-pattern: CPU-intensive work inside an async function (blocks the event loop)
import time

async def process_data(data: list):
    result = []
    for item in data:
        time.sleep(0.1)  # simulates CPU-intensive computation
        result.append(item * 2)
    return result
# Good: hand the CPU-intensive work to a thread pool (non-blocking)
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()  # reuse one pool rather than creating one per request

def process_data_sync(data: list):
    # Synchronous function holding the CPU-intensive logic
    return [item * 2 for item in data]

@app.get("/process")
async def process_endpoint():
    data = list(range(1, 1001))  # large data list
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(executor, process_data_sync, data)
    return result
2. Pair async code with async libraries
Use async ORMs (e.g., SQLAlchemy 1.4+ async, Tortoise-ORM) or async HTTP clients (e.g., aiohttp), avoiding synchronous libraries (e.g., requests) in async functions.
# Wrong: calling a synchronous HTTP library inside an async function (blocks the event loop)
import requests

async def get_remote_data():
    response = requests.get("https://api.example.com/data")  # synchronous request; this blocks!
    return response.json()
# Correct: use an async HTTP client
import aiohttp

async def get_remote_data():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/data") as response:
            return await response.json()  # await the async response
III. Database Query Optimization
The database is often the primary bottleneck. Key optimization directions:
1. Connection pooling and connection reuse
Frequent connection creation/closure wastes resources. Use connection pooling with reasonable sizing (typically CPU cores × 2).
# SQLAlchemy async connection-pool configuration
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/db",
    pool_size=10,      # pool size (tune to your concurrency)
    max_overflow=20,   # extra connections allowed beyond pool_size
    pool_recycle=300,  # recycle connections after 300s (avoids stale database connections)
)
SessionLocal = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
2. Indexes and query optimization
- Add indexes on fields used in WHERE/JOIN conditions to avoid full table scans.
- Optimize SQL: avoid SELECT * (fetch only the fields you need), and use LIMIT/OFFSET for pagination.
-- Before: full table scan with no index
SELECT * FROM users WHERE name LIKE '%test%';  -- slow! a leading % wildcard defeats btree indexes

-- After: add an index and query it with a prefix match
CREATE INDEX idx_users_name ON users(name);  -- index the name column
SELECT id, email FROM users WHERE name LIKE 'test%' LIMIT 100;  -- only needed fields + pagination
3. Lazy loading and batch operations
- Related data: use selectinload to batch-load related rows with one extra IN query, rather than triggering a lazy load (and a separate query) for each parent row; joinedload is the alternative that folds everything into a single JOIN.
- Bulk operations: use bulk inserts (e.g., bulk_insert_mappings, or one multi-row INSERT) instead of inserting rows one at a time, to minimize SQL round-trips.
IV. Caching Strategies: Reduce Redundant Computation
Caching significantly reduces database load and redundant computations, ideal for frequently accessed, rarely changing data (e.g., popular product lists, configuration info).
1. In-memory caching (simple scenarios)
Use cachetools for in-memory caching (single-instance deployment):
from cachetools import TTLCache
from sqlalchemy import text

# Cache up to 100 entries, each expiring after 10 minutes
# (LRUCache has no expiry; TTLCache evicts by age as well as by size)
cache = TTLCache(maxsize=100, ttl=600)

@app.get("/hot-products")
async def get_hot_products():
    # Try the cache first
    cached_result = cache.get("hot_products")
    if cached_result is not None:
        return cached_result  # return cached data directly
    # Cache miss: query the database
    async with SessionLocal() as session:
        result = await session.execute(text("SELECT * FROM products ORDER BY sales DESC LIMIT 20"))
        hot_products = [dict(row._mapping) for row in result.all()]
    # Store in the cache (expires per the TTL above)
    cache["hot_products"] = hot_products
    return hot_products
2. Redis distributed caching (multi-instance scenarios)
For multi-server deployments, use Redis as a shared cache. Install the client first with pip install redis:
import json
import redis.asyncio as redis

r = redis.Redis(host="localhost", port=6379, db=0)  # async Redis client, so calls don't block the event loop

@app.get("/product/{product_id}")
async def get_product(product_id: int):
    cache_key = f"product:{product_id}"
    cached_data = await r.get(cache_key)
    if cached_data:
        return json.loads(cached_data)  # return cached data
    # Query the database (get_product_from_db is assumed to be defined elsewhere)
    product = await get_product_from_db(product_id)
    # Store in Redis with a 30-minute expiry
    await r.setex(cache_key, 60 * 30, json.dumps(product))
    return product
V. Deployment and Scaling: Raising Concurrency
After optimizing code and database, deployment configurations are critical for handling high concurrency:
1. Multi-process/multi-thread deployment
FastAPI runs on an ASGI server such as Uvicorn; in production, pair it with a process manager like Gunicorn:
# Launch command: Gunicorn managing Uvicorn workers (4 workers, 120s timeout)
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -t 120
- Worker count: typically CPU cores × 2 + 1, to make full use of multiple cores.
- Threads: each UvicornWorker runs its own event loop, which already serves many concurrent IO-bound requests per process; Gunicorn's --threads option only applies to its synchronous (gthread) worker class.
2. Reverse proxy and load balancing
Use Nginx as a reverse proxy to handle static assets, SSL termination, and distribute traffic to multiple FastAPI instances:
# Example Nginx configuration
upstream fastapi_backend {
    server 127.0.0.1:8000;  # list more instances here to load-balance across them
    server 127.0.0.1:8001;
}
server {
    listen 80;
    location / {
        proxy_pass http://fastapi_backend;  # forward to the FastAPI instances
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
    location /static/ {
        alias /path/to/static/files/;  # serve static assets directly
    }
}
3. Containerization and auto-scaling
Containerize with Docker and use Kubernetes for auto-scaling based on CPU/memory metrics:
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
Summary: Core Ideas of Performance Optimization
- Identify bottlenecks: use tools like cProfile and asyncio debug mode to pinpoint slow code paths.
- Iterate in order: start with async code, then the database, then caching, then deployment.
- Prioritize high-impact fixes: indexing and caching often yield better returns than complex architecture changes.
- Monitor and iterate: track metrics with Prometheus or an APM tool and refine continuously.
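The "identify bottlenecks first" step can be as simple as the standard-library cProfile. The sketch below profiles a stand-in function (slow_handler is illustrative, not from the article) and prints the five most expensive calls by cumulative time:

```python
import cProfile
import io
import pstats

def slow_handler():
    # Stand-in workload; substitute the code path you suspect
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_handler()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())  # top 5 calls by cumulative time
```

The report attributes time to individual functions, so you know whether to reach for an index, a cache, or a thread pool before changing any code.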
By following these steps, even beginners can systematically enhance FastAPI performance, building efficient and stable API services.