Hash tables are the “search experts” among data structures. They use a hash function to map keys to positions in an array, achieving O(1) search time on average. However, if two different keys are mapped to the same position by the hash function, a hash collision occurs. So how do we resolve these collisions? Today, let’s walk through the most common hash collision resolution methods in the simplest way possible, to help you easily understand the hashing logic inside these data structures.

1. What is a Hash Collision?

First, recall the basic principle of hash tables: A hash table is essentially an array, where each position in the array is called a “slot”. We map a key (such as a student’s name or product ID) to an index in the array using a hash function (e.g., key % array length), and this index is the “home” of the key in the array.

But here’s the problem: If two different keys result in the same index (e.g., keys 1 and 6 with an array length of 5 and a hash function of key % 5, then 1%5=1 and 6%5=1—a collision!), they will “fight” for the same “home”. This is a hash collision.
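To see this concretely, here is a tiny Python sketch (using the same toy table size of 5 and the modulo hash function from the example above):

```python
def simple_hash(key, table_size=5):
    """Toy hash function: map an integer key to a slot index."""
    return key % table_size

# Keys 1 and 6 are different, yet they land on the same slot -- a collision.
print(simple_hash(1))  # 1
print(simple_hash(6))  # 1, same slot as key 1
```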

2. Core Idea of Resolving Collisions

When a collision occurs, we need to find a “new home” for the conflicting elements. The common ideas are:
1. “Hang” conflicting elements in the same slot (e.g., multiple elements share a linked list);
2. “Squeeze” into other empty slots (e.g., search the array sequentially for the next empty position).

3. Two Most Common Methods

1. Chaining (Separate Chaining): Build a Linked List for Conflicting Elements

Principle: Each slot does not store just one element but the head of a linked list (or a small array). All conflicting elements are “hung” on this list.
Analogy: It’s like a delivery locker. If two packages have the same hash value (e.g., both correspond to “Locker 0”), put them on a “temporary shelf” (the linked list) next to the locker. When picking up, first find the locker, then search the shelf.

Example:
Suppose the hash table array length is 5 (slots 0-4) and the hash function is key % 5. Insert keys 1, 6, 11:
- Key 1: 1%5=1 → Slot 1 is empty, place it directly. Linked list[1] = [1];
- Key 6: 6%5=1 → Slot 1 has an element, append 6 to the end of the linked list. Linked list[1] = [1, 6];
- Key 11: 11%5=1 → Append to the linked list. Linked list[1] = [1, 6, 11].

When searching: For key 6, first compute 6%5=1 to locate slot 1, then traverse the linked list [1, 6, 11] to find 6.
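Here is a minimal separate-chaining sketch in Python (the class name ChainedHashTable and the fixed size of 5 are illustrative choices for this article, not a standard API; each slot uses a Python list to play the role of the linked list):

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions with separate chaining."""

    def __init__(self, size=5):
        self.size = size
        # Each slot holds a list that acts as the "chain" of colliding keys.
        self.slots = [[] for _ in range(size)]

    def insert(self, key):
        index = key % self.size           # hash function: key % array length
        if key not in self.slots[index]:  # skip duplicates
            self.slots[index].append(key)

    def search(self, key):
        index = key % self.size           # locate the slot first...
        return key in self.slots[index]   # ...then scan its chain


table = ChainedHashTable()
for key in (1, 6, 11):
    table.insert(key)

print(table.slots[1])   # [1, 6, 11] -- all three keys share slot 1
print(table.search(6))  # True
print(table.search(2))  # False
```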

Advantages: Simple to implement, suitable for dynamic data (add/remove anytime), and space grows only as needed (a slot’s list stays empty when nothing collides there).
Disadvantages: Requires additional space for list nodes and pointers; searches degrade to O(n) in the worst case when many keys pile into one slot (but remain O(1) on average).

2. Open Addressing: “Squeeze” to Find an Empty Slot

Principle: If slot i is occupied, keep probing other slots according to a fixed rule until an empty one is found. Depending on the probing rule, it is divided into:
- Linear Probing: when a collision occurs, probe i+1, i+2, ..., m-1, 0, 1, ... (wrapping around circularly);
- Quadratic Probing: when a collision occurs, probe i+1², i+2², i+3², ... (all taken mod m), which avoids the long consecutive runs linear probing produces. A small sketch comparing the two probe sequences follows this list.
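As a quick sketch of the difference between the two rules (the helper names are made up for illustration, and a table size of 11 is used here just so the quadratic jumps are visible), the first few positions probed after a collision at slot 1 would be:

```python
def linear_probes(i, m, count=4):
    """First few slots tried by linear probing after a collision at slot i."""
    return [(i + step) % m for step in range(1, count + 1)]

def quadratic_probes(i, m, count=4):
    """First few slots tried by quadratic probing after a collision at slot i."""
    return [(i + step ** 2) % m for step in range(1, count + 1)]

print(linear_probes(1, 11))     # [2, 3, 4, 5]  -- adjacent slots, one by one
print(quadratic_probes(1, 11))  # [2, 5, 10, 6] -- jumps spread further apart
```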

Analogy: Buying milk tea in line. The first person (key) stands at window 1 (slot 1). The second person (key) is also assigned window 1, so they move to window 2 (linear probing: 1+1=2). With quadratic probing, if window 2 (1+1²=2) is also occupied, the next attempt jumps to window 5 (1+2²=5) instead of the adjacent window, which spreads people out.

Example:
Array length 5, hash function key%5. Insert keys 1 (slot 1) and 6 (slot 1 collision):
- Key 1: 1%5=1 → Empty slot, place it;
- Key 6: 6%5=1 → Collision, linear probe to 1+1=2 (empty), place 6.

Insert key 11: 11%5=1 → Collision, linear probe to 2 (occupied by 6), then 3 (empty), place 11.

When searching: For key 6, compute 6%5=1 → slot 1 (not 6), linear probe to 2 (6), found.
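Below is a minimal open-addressing sketch using linear probing (the class name and the fixed size of 5 are illustrative; deletion handling and resizing, which a real implementation needs, are deliberately left out):

```python
class LinearProbingHashTable:
    """Toy hash table that resolves collisions with linear probing."""

    def __init__(self, size=5):
        self.size = size
        self.slots = [None] * size  # None marks an empty slot

    def insert(self, key):
        index = key % self.size
        for step in range(self.size):           # probe at most `size` slots
            probe = (index + step) % self.size  # i, i+1, i+2, ... (wrapping)
            if self.slots[probe] is None or self.slots[probe] == key:
                self.slots[probe] = key
                return True
        return False                            # the table is full

    def search(self, key):
        index = key % self.size
        for step in range(self.size):
            probe = (index + step) % self.size
            if self.slots[probe] is None:       # hit a hole: key is absent
                return False
            if self.slots[probe] == key:
                return True
        return False


table = LinearProbingHashTable()
for key in (1, 6, 11):
    table.insert(key)

print(table.slots)       # [None, 1, 6, 11, None] -- 6 and 11 squeezed into slots 2 and 3
print(table.search(11))  # True
```

Switching to quadratic probing would only change the probe line to something like probe = (index + step ** 2) % self.size; the insert/search structure stays the same.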

Advantages: High space utilization (no extra space for linked lists); simple to implement (everything lives in a single array).
Disadvantages: Linear probing causes “clustering” (conflicting elements pile up next to each other), and performance degrades sharply as the array fills up; quadratic probing reduces clustering, but the table must be kept sufficiently empty (a low load factor), or a probe sequence may fail to find a free slot.

4. Other Methods (A Brief Overview)

  • Rehashing: When a collision occurs, use another hash function to recalculate the position until an empty slot is found. E.g., f(key, i) = (f1(key) + i*f2(key)) % m (i = 0, 1, 2, …), a form commonly known as double hashing. Suitable for scenarios with few collisions, but it requires multiple hash functions (see the sketch after this list).
  • Public Overflow Area: Split the hash table into a “primary table” and an “overflow table”. The primary table stores non-conflicting elements, and all conflicting elements go into the overflow table. To search, check the primary table first, then the overflow table. Suitable for static data with few collisions, though managing the overflow table adds complexity.
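For the rehashing (double-hashing) formula in the first bullet, a minimal sketch looks like this (the concrete functions f1 and f2 and the table size of 7 are made-up examples; the only real requirement is that f2 never returns 0):

```python
def f1(key, m=7):
    """Primary hash function."""
    return key % m

def f2(key, m=7):
    """Secondary hash function; 1 + key % (m - 1) guarantees a non-zero step."""
    return 1 + key % (m - 1)

def probe_sequence(key, m=7, count=4):
    """Positions tried for `key`: f(key, i) = (f1(key) + i * f2(key)) % m."""
    return [(f1(key, m) + i * f2(key, m)) % m for i in range(count)]

# Keys 1 and 8 collide on f1 (both start at slot 1), but their probe
# sequences quickly diverge because f2 gives them different step sizes.
print(probe_sequence(1))  # [1, 3, 5, 0]
print(probe_sequence(8))  # [1, 4, 0, 3]
```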

5. Summary: Which Method to Choose?

  • Chaining: Used in Java’s HashMap. Simple to implement and suitable for dynamic data with many collisions (e.g., when the data size is variable).
  • Open Addressing: Used in Python’s dictionaries. Works best when the table is kept sparse (few collisions, low load factor), for example when the data range is known in advance. Linear probing is the simplest but causes clustering; quadratic probing spreads entries out more.
  • Other Methods: Rehashing (double hashing) suits scenarios with few collisions and strict anti-clustering requirements; a public overflow area suits static data with extremely few collisions.

The essence of hash collision resolution is “how to find a new position for conflicting elements”. Understanding the two core methods (chaining and open addressing) will help you handle most scenarios. Next time you implement a hash table, you’ll know how to resolve collisions!

Xiaoye