Hash Table: How Does a Hash Table Store Data? A Diagram of Collision Resolution Methods

1. What is a Hash Table?

Imagine you’re looking for a book in a library. The librarian tells you the book’s “call number,” and you can directly locate the shelf (similar to an “array position”) without flipping through every book. A hash table works exactly like this: using a “key” (via a hash function) to quickly find the “location” (array index) of data, making operations like search, insertion, and deletion extremely efficient.

In simple terms, a hash table is a key-value pair storage structure. It uses a “hash function” to convert a “key” (such as a student ID or name) into an array index, then stores the “value” (such as a score or phone number) at the corresponding array position.

2. How Does a Hash Table Store Data?

1. Core Structure of a Hash Table

The underlying structure of a hash table is an array, where each position in the array is called a “bucket.” Each bucket can store one value. When collisions occur, multiple values may be stored in the same bucket (depending on the collision resolution method).

2. Storage Steps: Key → Position → Store Value

  1. Identify the key-value pair: For example, storing student information: ID=123, score=90.
  2. Compute the hash value: Use the “hash function” to convert the key into an array index. The simplest hash function is hash_value = key % array_length (e.g., if the array length is 10, 123 % 10 = 3, so the hash value is 3).
  3. Store the value in the array: Store the “value” at the array position corresponding to the hash value (i.e., the 3rd bucket).

Example: Storing 3 Students’ Scores in a Hash Table

Assume the array length is 10 (buckets numbered 0-9), and the hash function is key % 10:
- Student A: ID=12 → Hash value 12%10=2 → Store score 85 at array[2].
- Student B: ID=23 → Hash value 23%10=3 → Store score 76 at array[3].
- Student C: ID=34 → Hash value 34%10=4 → Store score 92 at array[4].

Current array state: [ , ,85,76,92, , , , , ] (empty positions separated by commas).
When querying, directly compute the hash value using the student ID (e.g., query ID=12 → hash value=2), then access array[2] to get the score 85. No need to traverse the entire array!

3. What is a Collision?

A collision occurs when two different keys produce the same hash value through the hash function. For example:
- IDs 12 and 22, with array length 10 and hash function %10: 12%10=2 and 22%10=2.
Here, both students would need to be stored at array[2], but the array position is unique, causing a collision!

4. Collision Resolution Methods (Diagram)

Collisions are a “hurdle” for hash tables, but two classic methods solve them: Chaining (Zipping Method) and Open Addressing (Linear Probing).

Method 1: Chaining (Zipping Method)

Principle: Each bucket in the array is the “head node” of a linked list. Colliding elements are directly appended to the tail of the corresponding linked list.
Example: Storing student ID=22 (score=70) after storing ID=12 (score=85):
- Array[2] becomes a linked list: 85 → 70 (Student A’s score is at the head, Student E’s score is at the tail).
- Other students are stored in their respective hash value buckets.

Chaining Diagram (Text Simulation):

Array Index → Bucket → Linked List (Colliding Elements)
0 → []
1 → []
2 → [85] → [70]  (IDs 12 and 22 collide, both stored in the linked list)
3 → [76]  (ID=23)
4 → [92]  (ID=34)
...

Advantages: Simple to implement, efficient for insertion/deletion, no clustering issues.
Disadvantages: Requires additional linked list space, beginner-friendly.

Method 2: Open Addressing (Linear Probing)

Principle: When a collision occurs, sequentially “probe” the next empty bucket. Linear probing follows: collision position h → h+1 → h+2 → … until an empty bucket is found.
Example: Array length 10, storing IDs 12 (85) and 22 (70):
- ID=12: Hash value=2 → array[2] is empty, store 85.
- ID=22: Hash value=2 → array[2] is occupied, probe h+1=3 (empty), store 70.

Linear Probing Diagram (Text Simulation Table):
| Array Position | 0 | 1 | 2 | 3 | 4 | … |
|----------------|—|—|—|—|—|-----|
| Status | | | 85 | 70 | 92 | … |

Advantages: No additional data structures needed, pure array operations.
Disadvantages: May form “clusters” (consecutive empty buckets filled), affecting efficiency.

5. Summary

Hash tables use a hash function to map keys to array positions, enabling O(1) level fast operations. Collisions are inevitable but can be resolved using:
- Chaining: Intuitive, beginner-friendly, with colliding elements linked in a list.
- Linear Probing: Uses array space directly to resolve collisions, simple but prone to clustering.

Mastering the core of hash tables requires understanding the logic of “hash function → collision → resolution methods,” which is foundational for learning more complex data structures!

Xiaoye