MongoDB is a popular document-oriented database that stores data in BSON, a binary JSON-like format. It is widely used in web applications and data analysis. As the data volume grows, query efficiency gradually becomes a bottleneck: slow queries make the system sluggish and degrade the user experience. This is where indexes, the core tool for query optimization in MongoDB, come in.
Why do queries become slow?
Suppose we have a collection of student information with 100,000 documents, and each document contains fields such as name, age, and score. If we want to query “students aged 20”, how will MongoDB do it?
- Without an index: MongoDB starts from the first document in the collection and checks each document in turn to see whether the condition (age = 20) is met. This is called a full collection scan (COLLSCAN in MongoDB's query plans), and its time complexity is O(n), where n is the total number of documents. When the data volume is large (millions of documents, for example), this becomes very time-consuming.
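To make the O(n) cost concrete, here is a minimal JavaScript sketch (runnable with Node.js) that models a full collection scan over an in-memory array. It is a toy model, not MongoDB's implementation: the student data and the helper names `makeStudents` and `fullScan` are hypothetical.

```javascript
// Toy model of a full collection scan; NOT MongoDB's implementation.
// makeStudents and fullScan are hypothetical helpers for illustration.

// Build n hypothetical student documents whose ages cycle through 18..27.
function makeStudents(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ _id: i, name: `student${i}`, age: 18 + (i % 10), score: i % 100 });
  }
  return docs;
}

// Check every document in turn, as a query without an index must.
function fullScan(docs, predicate) {
  let docsExamined = 0;
  const results = [];
  for (const doc of docs) {
    docsExamined++; // every document is read, whether it matches or not
    if (predicate(doc)) results.push(doc);
  }
  return { results, docsExamined };
}

const students = makeStudents(100000);
const scan = fullScan(students, (doc) => doc.age === 20);
console.log(`matched ${scan.results.length}, examined ${scan.docsExamined}`);
```

With this data only every tenth document matches, yet all 100,000 documents are examined. Doubling the collection doubles the work.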
What is an index?
An index in MongoDB is a special data structure that works like a book's “table of contents”: it records the mapping between field values and document positions. For example, when we create an index on the age field, the index keeps its entries sorted by age and records, for each age value, where the corresponding documents are stored in the collection.
A real-life analogy:
- A book without a table of contents: To find a chapter related to “Python”, you have to flip through the pages one by one.
- A book with a table of contents: You can directly check the table of contents to find the page number and then flip to the corresponding page.
Through this “table of contents” mechanism, an index turns a query from a full collection scan into a direct lookup. MongoDB indexes are B-trees, so finding the first matching entry takes O(log n) steps instead of O(n), plus one step per matching entry. On large collections this is a significant improvement.
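The “table of contents” idea can be sketched in a few lines of JavaScript. This toy model stores the index as a sorted array of [age, position] pairs and finds a key by binary search, counting comparisons along the way. MongoDB actually uses B-trees, which are organized differently but have the same logarithmic behavior. `buildIndex` and `lowerBound` are hypothetical names.

```javascript
// Toy index: sorted [key, position] pairs searched with binary search.
// MongoDB uses B-trees; this sorted array only illustrates the O(log n) lookup.

// Build the "table of contents": one [fieldValue, documentPosition] entry per document.
function buildIndex(docs, field) {
  const entries = docs.map((doc, pos) => [doc[field], pos]);
  entries.sort((a, b) => a[0] - b[0] || a[1] - b[1]); // by key, then by position
  return entries;
}

// Binary search for the first entry whose key is >= target, counting comparisons.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  let comparisons = 0;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    comparisons++;
    if (entries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return { position: lo, comparisons };
}

for (const n of [1000, 100000]) {
  // Hypothetical documents whose ages cycle through 18..27.
  const docs = Array.from({ length: n }, (_, i) => ({ _id: i, age: 18 + (i % 10) }));
  const index = buildIndex(docs, "age");
  const { comparisons } = lowerBound(index, 20);
  console.log(`n=${n}: full scan reads ${n} documents; index lookup uses ${comparisons} comparisons`);
}
```

Going from 1,000 to 100,000 documents makes a full scan 100 times more expensive, but it adds only a few comparisons to the lookup. Each comparison at least halves the remaining range, so 100,000 entries need at most ⌊log2 100000⌋ + 1 = 17 comparisons.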
How do indexes improve query efficiency?
Suppose we create an index for the age field in the students collection:
db.students.createIndex({age: 1}) // 1 means ascending order, -1 means descending order
Now, when we query for “students aged 20”:
- Without an index: Traverse 100,000 documents and check whether the age of each document is 20.
- With an index: Directly find the document positions corresponding to age = 20 in the index, and then jump to these positions to read the data.
In this process, MongoDB walks only a few nodes of the index tree and then reads the matching documents, instead of reading the entire collection. The query is therefore much faster.
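The two-step process (find the entries in the index, then jump to the documents) can be sketched as follows. Again this is a toy in-memory model with hypothetical helper names. It mirrors the idea behind MongoDB's IXSCAN and FETCH plan stages without reproducing their implementation.

```javascript
// Toy "index scan + fetch"; NOT MongoDB's implementation.

// Sorted [fieldValue, documentPosition] entries.
function buildIndex(docs, field) {
  const entries = docs.map((doc, pos) => [doc[field], pos]);
  entries.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  return entries;
}

// First index position whose key is >= target (binary search).
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Step 1: walk the contiguous run of index entries equal to value.
// Step 2: "jump" to each recorded position and read that document.
function findWithIndex(docs, index, value) {
  let docsExamined = 0;
  const results = [];
  for (let i = lowerBound(index, value); i < index.length && index[i][0] === value; i++) {
    docsExamined++;
    results.push(docs[index[i][1]]);
  }
  return { results, docsExamined };
}

// 100,000 hypothetical students whose ages cycle through 18..27.
const students = Array.from({ length: 100000 }, (_, i) => ({ _id: i, age: 18 + (i % 10) }));
const ageIndex = buildIndex(students, "age");
const found = findWithIndex(students, ageIndex, 20);
console.log(`matched ${found.results.length}, documents examined ${found.docsExamined}`);
```

Only the 10,000 matching documents are read. A full scan would read all 100,000.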
How to create an index in MongoDB?
MongoDB provides the createIndex() method to create indexes. The syntax is:
db.collection.createIndex({<field>: <1 or -1>})
Sort order: 1 means ascending and -1 means descending. The value must be given explicitly. For a single-field index the direction rarely matters, because MongoDB can traverse the index in either direction.
Examples:
1. Create a single-field index on the name field:
db.students.createIndex({name: 1})
2. Create a compound index on age and score (sorted by age ascending, then by score descending):
db.students.createIndex({age: 1, score: -1})
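(The order of fields in a compound index matters. For a query on age = 20 and score > 90, for example, age should come first. When the equality-matched field leads, all matching entries sit next to each other in the index and can be read as one contiguous range.)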
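A toy model shows why field order matters. It builds the same compound index twice, once as {age: 1, score: -1} and once as {score: -1, age: 1}, and looks at where the entries matching age = 20 and score > 90 end up. The data and helper names are hypothetical. The “span” is the number of entries a plain range scan would cover between the first and the last match. MongoDB's planner can sometimes skip between runs, so read the numbers as an illustration of index layout, not as real executionStats.

```javascript
// Toy comparison of two compound index key orders; hypothetical data.
// Query: age == 20 AND score > 90.

// 100,000 documents: ages 18..27, scores 0..99, every (age, score) pair 100 times.
const docs = Array.from({ length: 100000 }, (_, i) => ({
  _id: i,
  age: 18 + (i % 10),
  score: Math.floor(i / 10) % 100,
}));

// Build index entries sorted by the given comparator (the key order).
function buildCompoundIndex(source, compare) {
  return source.map((d, pos) => ({ age: d.age, score: d.score, pos })).sort(compare);
}

const ageAscScoreDesc = (a, b) => a.age - b.age || b.score - a.score || a.pos - b.pos; // {age: 1, score: -1}
const scoreDescAgeAsc = (a, b) => b.score - a.score || a.age - b.age || a.pos - b.pos; // {score: -1, age: 1}

const matches = (e) => e.age === 20 && e.score > 90;

// Describe where the matching entries sit inside one index order.
function layout(entries) {
  let first = -1;
  let last = -1;
  let runs = 0; // number of separate contiguous groups of matches
  entries.forEach((e, i) => {
    if (!matches(e)) return;
    if (first === -1) first = i;
    if (i === 0 || !matches(entries[i - 1])) runs++;
    last = i;
  });
  return { runs, span: last - first + 1 };
}

const ageFirst = layout(buildCompoundIndex(docs, ageAscScoreDesc));
const scoreFirst = layout(buildCompoundIndex(docs, scoreDescAgeAsc));
console.log("{age: 1, score: -1}:", ageFirst);
console.log("{score: -1, age: 1}:", scoreFirst);
```

With age first, all 900 matches form one contiguous run. With score first, they split into nine runs spread over 8,100 entries, because each score value contains every age.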
Common index types (must-know for beginners)
Besides the most commonly used single-field index, MongoDB offers several other practical index types:
- Unique Index: Ensures that field values are unique and prevents duplicate data.
db.students.createIndex({email: 1}, {unique: true}) // The email cannot be repeated
- Compound Index: An index composed of multiple fields, suitable for multi-condition queries. For example:
db.orders.createIndex({user_id: 1, order_date: -1}) // First sorted by user ID in ascending order, then by order date in descending order
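- Text Index: Used for keyword search over string content. It matches whole words, applying language-specific stemming and stop-word removal. It does not match arbitrary substrings or fuzzy spellings.
db.books.createIndex({title: "text", author: "text"}) // Enables $text keyword searches over title and author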
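The effect of {unique: true} can be modeled with a tiny in-memory index that refuses a second entry for an existing key. This is a hypothetical sketch: `ToyUniqueIndex` and `insertStudent` are made-up names. MongoDB enforces the constraint inside its B-tree index and reports a duplicate key error (code 11000) rather than throwing this JavaScript error. The sketch also mirrors a documented MongoDB rule: a document that lacks the indexed field is indexed as null, so only one such document is allowed.

```javascript
// Toy unique index; NOT MongoDB's implementation.
class ToyUniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Map(); // key -> document position
  }

  insert(doc, pos) {
    // A missing field is indexed as null, so at most one document may lack it.
    const key = doc[this.field] ?? null;
    if (this.keys.has(key)) {
      throw new Error(`duplicate key: { ${this.field}: ${JSON.stringify(key)} }`);
    }
    this.keys.set(key, pos);
  }
}

const students = [];
const emailIndex = new ToyUniqueIndex("email");

// Check the index before storing, so a rejected document is never written.
function insertStudent(doc) {
  emailIndex.insert(doc, students.length);
  students.push(doc);
}

insertStudent({ name: "Ann", email: "ann@example.com" });
try {
  insertStudent({ name: "Bob", email: "ann@example.com" });
} catch (err) {
  console.log(err.message); // the duplicate email is rejected
}
```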
How to verify whether an index is effective?
MongoDB provides the explain() method, which can be used to view the execution plan of the query and determine whether the index is used.
Example:
Query “students aged 20” and view the execution plan:
db.students.find({age: 20}).explain("executionStats")
In the output, focus on the following fields:
- queryPlanner.winningPlan: the plan MongoDB chose. An IXSCAN stage means an index is used; a COLLSCAN stage means a full collection scan.
- executionStats.executionTimeMillis: the query time in milliseconds; smaller is better.
- executionStats.totalDocsExamined (and totalKeysExamined): the number of documents (and index keys) actually examined. Compare these with nReturned, the number of results. If 5 documents are returned and totalDocsExamined = 5, the index located exactly the matching documents. If totalDocsExamined equals the size of the collection (for example, 100,000), the query was a full collection scan.
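Because mongosh runs JavaScript, these checks can be scripted. The sketch below defines a hypothetical helper, `summarizeExplain` (not part of MongoDB), that walks the winning plan's stages and reports the counts. It assumes the classic explain layout, in which each stage has a `stage` name and `inputStage`/`inputStages` children. Some newer servers nest the plan under `winningPlan.queryPlan`, so the helper tries that as well. The two sample objects are hand-written and abbreviated, reusing the 5-result and 100,000-document figures from the text. They are not real server output.

```javascript
// Hypothetical helper for explain("executionStats") output; not a MongoDB API.

// Collect stage names from a plan tree (stage -> inputStage / inputStages).
function planStages(node, stages = []) {
  if (!node) return stages;
  stages.push(node.stage);
  planStages(node.inputStage, stages);
  for (const child of node.inputStages || []) planStages(child, stages);
  return stages;
}

function summarizeExplain(explain) {
  const winning = explain.queryPlanner.winningPlan;
  // Assumption: some newer explain layouts nest the plan under queryPlan.
  const stages = planStages(winning.queryPlan ?? winning);
  const stats = explain.executionStats;
  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    fullScan: stages.includes("COLLSCAN"),
    nReturned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// Hand-written, abbreviated samples shaped like explain output (not real results).
const withIndex = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "age_1" } } },
  executionStats: { nReturned: 5, totalKeysExamined: 5, totalDocsExamined: 5 },
};
const withoutIndex = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 5, totalKeysExamined: 0, totalDocsExamined: 100000 },
};

console.log(summarizeExplain(withIndex));
console.log(summarizeExplain(withoutIndex));
```

After pasting the functions into mongosh, a call such as summarizeExplain(db.students.find({age: 20}).explain("executionStats")) should work. The IXSCAN check is a simplification, because other index-based stages also exist. If usesIndex comes back false, read the full plan before drawing conclusions.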
Pitfalls of indexes: More is not always better!
Although indexes speed up queries, creating too many of them has side effects:
- Occupying storage space: Each index requires additional storage. As the data volume increases, the space occupied by indexes also increases.
- Slowing down write operations: When inserting, updating, or deleting documents, MongoDB needs to maintain indexes at the same time. The more indexes there are, the slower the write operations will be.
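The write-side cost can be illustrated with a toy counter. Every inserted document adds one entry to every index on the collection, including the _id index that MongoDB always creates, and each entry also takes up extra storage. `insertWithIndexes` is a hypothetical helper. Real index maintenance (B-tree inserts, page splits, disk writes) costs more than the single array push modeled here.

```javascript
// Toy model of index maintenance on insert; NOT MongoDB's implementation.
function insertWithIndexes(docs, indexedFields) {
  const indexes = indexedFields.map((field) => ({ field, entries: [] }));
  let indexWrites = 0;
  docs.forEach((doc, pos) => {
    for (const index of indexes) {
      // A real index inserts into a sorted B-tree; here we only record the entry.
      index.entries.push([doc[index.field] ?? null, pos]);
      indexWrites++;
    }
  });
  return { indexWrites };
}

// 1,000 hypothetical student documents.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  name: `student${i}`,
  age: 18 + (i % 10),
  score: i % 100,
}));

const lean = insertWithIndexes(docs, ["_id"]); // only the default _id index
const heavy = insertWithIndexes(docs, ["_id", "name", "age", "score"]); // three extra indexes
console.log("only _id:", lean);
console.log("_id + 3 indexes:", heavy);
```

In this model, three extra indexes make every insert do four index writes instead of one.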
Best practices:
- Prioritize creating indexes for frequently queried fields (such as age and name).
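- Avoid indexing rarely queried fields and fields with low selectivity, that is, fields with many repeated values. A gender field where “male” accounts for 90% of documents is a typical case: an index on it still leads MongoDB to read most of the collection.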
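- Order the fields of a compound index to match your queries. A compound index also serves queries on its leading fields (its prefixes). If user_id is queried more often than order_date, put user_id first: {user_id: 1, order_date: -1} supports queries on user_id alone, but not efficient queries on order_date alone.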
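Selectivity, the fraction of documents that match a given value, is a quick way to judge whether a field is worth indexing. The sketch below computes it for two hypothetical fields: a gender field where “male” covers 90% of documents, following the example above, and an email field with a distinct value per document. `selectivity` is a made-up helper, not a MongoDB function.

```javascript
// Fraction of documents whose field equals value; lower means an index helps more.
function selectivity(docs, field, value) {
  let matches = 0;
  for (const doc of docs) {
    if (doc[field] === value) matches++;
  }
  return matches / docs.length;
}

// 100,000 hypothetical documents: 90% "male", and a distinct email per document.
const people = Array.from({ length: 100000 }, (_, i) => ({
  _id: i,
  gender: i % 10 === 0 ? "female" : "male",
  email: `user${i}@example.com`,
}));

const genderShare = selectivity(people, "gender", "male");
const emailShare = selectivity(people, "email", "user42@example.com");
console.log(`gender = "male" matches ${(genderShare * 100).toFixed(1)}% of documents`);
console.log(`one email matches ${(emailShare * 100).toFixed(3)}% of documents`);
```

An index on gender would still send this query to 90% of the collection, barely better than a full scan. An email lookup touches a single document.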
Summary
Indexes are the core tool for query optimization in MongoDB. Acting as a “table of contents”, they turn a full collection scan into a fast lookup and greatly improve efficiency. Beginners should:
1. Understand the essence of the index: the mapping relationship between field values and document positions.
2. Master the basic syntax for creating indexes: createIndex({field: 1}).
3. Select the appropriate index type according to query requirements (single-field, compound, unique, etc.).
4. Use explain() to verify that an index is actually used, and avoid creating indexes that queries never use.
Used judiciously, indexes can speed up MongoDB queries dramatically and keep your application responsive as the data volume grows.