How Does a Hash Table Store Data? Hash Functions Simplify Lookup

Have you ever encountered a situation where you wanted to find a book among hundreds on a shelf, but had to flip through each one, which was extremely slow? There’s a similar problem in data structures—if you store data in an array, finding a specific element might require searching from the beginning to the end, which is the “sequential search” method, and it’s very slow. Is there a way to make searching as fast as “finding a drawer by its number”? The answer is the hash table!

What is a Hash Table?¶

A hash table (also called a hash map) is a tool that allows faster storage and retrieval of data. Its core is using a “hash function” to “translate” the data to be stored into a fixed position (we call it a “bucket”). This way, when looking up data next time, you can directly calculate the position without flipping through all the data.

Core Tool: The Hash Function¶

A hash function is like a “translator”: the input is the data, and the output is a number (the hash value). This number corresponds to a “bucket” in the hash table (think of it as a slot in an array). For example, if you need to store a student’s ID, the hash function could be “take the last two digits of the ID” or “the remainder when the ID is divided by 100,” and then use this remainder as the bucket number.

For example:
Suppose we want to store student ID 12345. If the hash function is “take the last two digits,” the last two digits of 12345 are 45, so the hash value is 45, corresponding to bucket 45 in the hash table.

How Data is Stored in a Hash Table?¶

Storing data is like “putting a key into the corresponding drawer.” The steps are simple:

1. Prepare the Data¶

First, determine the data to be stored. For example, a data entry might include a “student ID (key)” and “name (value)”. We use the student ID as the “key” for lookup.

2. Calculate the Hash Value¶

Input the student ID into the hash function to get the hash value. For example, for ID 23456, using the “last two digits” hash function: 23456 % 100 = 45, so the hash value is 45.

3. Locate the Bucket Position¶

The hash value 45 is the bucket number (assuming the hash table is an array, the bucket number is the array index).

4. Handle Collisions¶

If multiple data entries calculate the same hash value (e.g., student IDs 12345 and 23456 both result in hash value 45), a “collision” occurs (multiple data want to use the same bucket). Common methods to resolve collisions are:
- Chaining: Each bucket holds a linked list. New data is simply “appended” to the end of the linked list (e.g., bucket 45 first contains 12345, then 23456 is added after it).
- Open Addressing: If bucket 45 is full, look for the next empty bucket (e.g., bucket 46) and store the data there.

Why is Hash Table Lookup Fast?¶

For example, if you want to find the student with ID 12345:
1. Use the hash function: 12345 % 100 = 45;
2. Directly locate bucket 45;
3. Search within the bucket for ID 12345 (if the bucket has a linked list, traverse the list; if empty, the data is found immediately).

The entire process does not require traversing all data. You only need to calculate the hash value once and then search within the bucket, making the lookup as fast as “opening a drawer by its number”!

Summary¶

A hash table uses a hash function to “translate” data into bucket positions, transforming lookup from “flipping through all data” to “direct positioning.” It’s like a library’s “book classification number”—you only need the number (hash value) to quickly find the book (data). From mobile phone contacts to website caches, hash tables are ubiquitous, with their core being the use of hash functions to simplify data lookup.