Why Learn Query Optimization?¶
Imagine writing an SQL query, clicking “execute,” and waiting over ten seconds for the page to load—by then, the user has likely given up. The core goal of query optimization is to make SQL execute faster, reduce database load, and improve overall system response speed. For beginners, mastering basic techniques can significantly enhance query efficiency and avoid “overcomplicating simple problems.”
1. 3 Common Mistakes Beginners Make¶
Before optimization, check if you’ve fallen into these traps:
1. Full Table Scan: No index, forcing the database to traverse the entire table (e.g., SELECT * FROM users WHERE name='小明' on a large table with no name index).
2. SELECT : Fetching all columns, leading to unnecessary data transfer and memory usage.
3. Overusing JOINs or Complex Conditions*: Too many JOINs or wrapping indexed fields in functions (e.g., YEAR(create_time)=2023).
2. 5 Essential Simple Optimization Tips¶
Tip 1: Add Indexes to Query Conditions (Avoid Full Table Scans)¶
Problem: Without an index, the database “checks each row individually,” causing slow performance for large datasets.
Solution: Add indexes to frequently queried fields.
Example:
-- Unoptimized: Full table scan (100k rows, no index on `name`)
SELECT * FROM users WHERE name='小明';
-- Optimized: Add index on `name` (single creation, benefits future queries)
CREATE INDEX idx_name ON users(name);
-- After optimization: Database locates data via index (type changes from ALL to ref in EXPLAIN)
Notes:
- Don’t over-index: More indexes slow down inserts/updates/deletes. Use existing primary keys (e.g., id).
- Prioritize low-cardinality, high-query fields: user_id (unique) is better than username (may repeat).
Tip 2: Avoid SELECT *—Only Fetch Needed Columns¶
Problem: SELECT * returns all columns, including large fields (e.g., text), causing poor IO efficiency.
Solution: Explicitly list required columns.
Example:
-- Inefficient: Returns all columns, including large text fields
SELECT * FROM orders;
-- Optimized: Only select core fields (id, order_no, amount)
SELECT id, order_no, amount FROM orders;
Effect: Reduces network transfer and memory usage, especially with large datasets.
Tip 3: Optimize JOINs: Small Table Drives Large Table¶
Problem: Incorrect JOIN order causes large tables to be fully scanned.
Solution: Drive large tables with small tables (filter small tables first, then join with large tables). Prefer INNER JOIN (only matching data).
Example:
-- Inefficient: Large table `orders` scanned first, then joined with small table
SELECT o.id, oi.product_id FROM orders o
LEFT JOIN order_items oi ON o.id=oi.order_id
WHERE o.status='已完成';
-- Optimized: Filter small table first, then join with large table
SELECT o.id, oi.product_id FROM
(SELECT id FROM orders WHERE status='已完成') o -- Small table
LEFT JOIN order_items oi ON o.id=oi.order_id; -- Join with large table
Principle: The database uses the left table as the base; smaller tables reduce traversal.
Tip 4: Avoid Functions on Indexed Fields¶
Problem: Wrapping indexed fields in functions (e.g., YEAR(), DATE()) invalidates the index, triggering a full table scan.
Bad Example:
SELECT * FROM sales WHERE YEAR(sale_date)=2023; -- Index on `sale_date` is ignored
Optimized:
-- Use BETWEEN to preserve index usage
SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';
Tip 5: Use EXPLAIN to Analyze Query Plans (Critical!)¶
Essential Tool: EXPLAIN reveals execution details. Check type and Extra columns:
- type: ALL (full scan, bad), ref (index used, good), eq_ref (primary key join, best).
- Extra: Using filesort (no index for sorting), Using temporary (temporary table created).
Example:
EXPLAIN SELECT * FROM users WHERE name='小明';
If type is ALL, optimize; if ref, the index works.
3. 3 Common Misconceptions (Avoid These Pitfalls)¶
-
“More indexes = faster queries”
Indexes speed up reads but slow down writes. For fields with low cardinality (e.g.,statuswith 3 values), indexes may harm performance. -
“OR conditions break indexes”
OR conditions may prevent multi-index usage.
Bad:
SELECT * FROM users WHERE id=1 OR name='张三'; -- May ignore `id` index
Better:
(SELECT * FROM users WHERE id=1) UNION ALL (SELECT * FROM users WHERE name='张三');
- “COUNT(*) is always fast”
COUNT(*)is fast (MySQL uses indexes), butCOUNT(DISTINCT name)is slow (requires deduplication). UseCOUNT(1)orCOUNT(*)if duplicates are unnecessary.
4. Summary: Optimize Step by Step¶
- Diagnose first: Use
EXPLAINto spot full scans or index failures. - Start with basics: Add indexes,
SELECTonly needed columns, and optimize JOIN order. - Avoid reinventing the wheel: Search for “EXPLAIN analysis” or “index failure” online for existing solutions.
Remember: Query optimization is not “voodoo”—it’s a science based on “reducing data processing + leveraging indexes.” Master these basics, and you’ll write “naturally fast” SQL!