Introduction

This chapter introduces several sorting algorithms implemented in Python. These include Bubble Sort, Quick Sort, Insertion Sort, Shell Sort, Selection Sort, Heap Sort, Merge Sort, Counting Sort, Bucket Sort, and Radix Sort.

Create a relatively large list for testing the sorting algorithms:

import numpy as np
import time

src_list = np.random.randint(1, 10000000, (100000)).tolist()

Bubble Sort

Bubble Sort is a simple and intuitive sorting algorithm. It repeatedly traverses the list to be sorted, comparing two elements at a time and swapping them if their order is incorrect. The traversal continues until no more swaps are needed, indicating the list is sorted. The name comes from the fact that smaller elements “bubble” to the top of the list through repeated swaps.

def bubble_sort(arr):
    n = len(arr)
    for i in range(n-1):
        for j in range(0, n-1-i):
            if arr[j] > arr[j+1] :
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr
start = time.time()
result = bubble_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Quick Sort

Quick Sort selects a pivot element from the dataset, places elements greater than the pivot to its right, and elements smaller than the pivot to its left. This divides the dataset into two parts, which are then recursively sorted until all elements are ordered.

def quick_sort(listt, left, right):
    if left >= right:
        return listt

    # Select the pivot as the first element in the current range
    pivot = listt[left]
    low = left  
    high = right

    while left < right:
        # Start from the right to find elements greater than the pivot
        while left < right and listt[right] >= pivot:
            right -= 1
        # Move this element to the left position
        listt[left] = listt[right] 

        # Start from the left to find elements smaller than the pivot
        while left < right and listt[left] <= pivot:
            left += 1
        # Move this element to the right position
        listt[right] = listt[left]

    # Write back the pivot value
    listt[left] = pivot

    # Recursively sort the left and right parts
    quick_sort(listt, low, left - 1)    # Recursively sort the left part
    quick_sort(listt, left + 1, high)   # Recursively sort the right part
    return listt
start = time.time()
result = quick_sort(src_list, 0, 1000)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Insertion Sort

Insertion Sort is a simple and intuitive sorting algorithm. It works by building an ordered sequence, where for each unsorted element, it scans the sorted sequence from the end to find the appropriate position for insertion.

def insert_sort(alist):
    length = len(alist)
    for i in range(1,length):
        for j in range(i, 0, -1):
            if alist[j] < alist[j - 1]:
                alist[j], alist[j - 1] = alist[j - 1], alist[i]
            else:
                break
    return alist
start = time.time()
result = insert_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Shell Sort

Shell Sort, also known as the decreasing increment sorting algorithm, is a more efficient improvement over Insertion Sort. However, Shell Sort is a non-stable sorting algorithm. The basic idea is: first, split the entire unsorted record sequence into several subsequences for direct insertion sorting. When the records in the entire sequence are “basically sorted”, perform direct insertion sorting on all records again.

def shell_sort(alist):
    n = len(alist)
    gap = n // 2
    while gap >= 1:
        for i in range(gap,n):
            while (i - gap) >= 0:
                if alist[i] < alist[i-gap]:
                    alist[i], alist[i-gap] = alist[i-gap], alist[i]
                    i = i - gap
                else:
                    break
        gap = gap // 2
    return alist
start = time.time()
result = shell_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Selection Sort

Selection Sort is a simple and intuitive sorting algorithm. Its working principle: first, find the smallest (largest) element in the unsorted sequence and place it at the beginning of the sorted sequence. Then, continue to find the smallest (largest) element from the remaining unsorted elements and place it at the end of the sorted sequence. Repeat this process until all elements are sorted.

def select_sort(alist):
    n = len(alist)
    for i in range(n):
    # Set index i as the index of the minimum value
        min_idx = i
        # By comparison, continuously adjust the position of the minimum value
        for j in range(i,n):
            if alist[j] < alist[min_idx]:
                min_idx = j
        # Swap the i-th value with the minimum value
        alist[i], alist[min_idx] = alist[min_idx], alist[i]
    return alist
start = time.time()
result = select_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Heap Sort

Heap Sort is a sorting algorithm designed using the heap data structure. A heap is a nearly complete binary tree structure that satisfies the heap property: the key value or index of child nodes is always less than (or greater than) that of its parent node. Heap Sort can be considered a selection sort that utilizes the concept of heaps.

# Adjust the heap structure so that the parent node value is greater than the child node values
def max_heap(heap, heapsize, root):
    left = 2*root+1
    right = left + 1
    large = root
    if left < heapsize and heap[large] < heap[left]:
        large = left
    if right < heapsize and heap[large] < heap[right]:
        large = right
    # If large is right or left, a child node larger than the parent is found; swap them to make the child the parent
    if large != root:
        heap[large], heap[root] = heap[root], heap[large]
        max_heap(heap, heapsize, large)

# Construct a heap and reorder the data in the heap
def build_max_heap(heap):
    length = len(heap)
    # Adjust the structure from the back forward
    for i in range((length-2)//2,-1,-1):
        max_heap(heap, length, i)

# Swap the root node with the last element and continue the adjustment process for the first len-1 nodes
def heap_sort(heap):
    build_max_heap(heap)
    for i in range(len(heap)-1,-1,-1):
        heap[0], heap[i] = heap[i], heap[0]
        max_heap(heap,i,0)
    return heap
start = time.time()
result = heap_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Merge Sort

Merge Sort is an efficient sorting algorithm based on the merge operation. This algorithm is a very typical application of the divide-and-conquer strategy.

Divide-and-Conquer:
- Divide: Recursively split the current sequence into two equal halves
- Conquer: Integrate the subsequences obtained in the previous step while maintaining element order (merge)

def merge_sort(alist):
    if len(alist) < 2:
        return alist
    # Split the list into smaller two lists
    mid = int(len(alist)/2)
    # Process the left and right lists separately, returning two sorted lists
    left = merge_sort(alist[:mid])
    right = merge_sort(alist[mid:])
    # Merge the two sorted lists to produce a new sorted list
    return merge(left,right)

def merge(left,right):
    i = 0
    j = 0
    result = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result += left[i:]
    result += right[j:]
    return result
start = time.time()
result = merge_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Counting Sort

The core of Counting Sort is to convert input data values into keys stored in an additionally allocated array space. As a linear time complexity sorting algorithm, Counting Sort requires the input data to be integers with a known range.

For each input element a[i], determine the number of elements less than a[i]. Thus, a[i] can be directly placed at its position in the output array. If there are 5 elements less than a[i], a[i] should be placed at the 6th position in the array.

def count_sort(alist):
    # Find the minimum and maximum values
    min_num = min(alist)
    max_num = max(alist)
    # Initialize the counting list
    count_list = [0]*(max_num-min_num+1)
    # Count each element in the list
    for num in alist:
        count_list[num-min_num] += 1
    alist.clear()
    # When the count of an element is not zero, fill the element back into the alist
    for cur_num, count in enumerate(count_list):
        while count != 0:
            alist.append(cur_num+min_num)
            count -= 1
    return alist
start = time.time()
result = count_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Bucket Sort

Divide the array a into n sub-intervals (buckets) of the same size, sort each sub-interval individually, and finally merge. Bucket Sort requires the data distribution to be uniform, otherwise it may fail. Counting Sort is a special case of Bucket Sort, where each bucket can be considered to contain only one element.

Principle:
(1) Set a fixed-size array as empty buckets
(2) Traverse the input data and place each data into the corresponding bucket
(3) Sort each non-empty bucket
(4) Concatenate the sorted data from non-empty buckets

def bucket_sort(alist):
    min_num = min(alist)
    max_num = max(alist)
    # Set the bucket size
    bucket_size = (max_num - min_num)/len(alist)
    # Create a bucket array
    bucket_list = [[]for i in range(len(alist)+1)]
    # Fill numbers into buckets
    for num in alist:
        bucket_list[int((num-min_num)//bucket_size)].append(num)
    alist.clear()
    #回填,这里桶内部排序直接调用了sorted
    for i in bucket_list:
        for j in sorted(i):
            alist.append(j)
    return alist
start = time.time()
result = bucket_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

Radix Sort

The overall idea of Radix Sort is to split the data to be sorted into multiple keywords for sorting. In other words, the essence of Radix Sort is multi-keyword sorting. For example, when sorting scores, if two people have the same total score, the one with a higher Chinese score comes first; if the Chinese score is also the same, the one with a higher Math score comes first. For numbers, units, tens, hundreds digits are different priority keywords.

Principle:
(1) Find the maximum number in the array and determine its number of digits
(2) Create a bucket array
(3) Place numbers into different buckets according to their digit values
(4) Clear the original array and add data from each bucket to the original array in sequence
(5) Perform sorting for the previous digit, and loop until the sorting digit exceeds the maximum number of digits

def radix_sort(alist):
    # Record which digit we are sorting currently, starting with the least significant digit (units place)
    i = 0
    # Number of digits in the maximum number
    max_num = max(alist)
    j = len(str(max_num))
    while i < j:
        # Create a bucket array with 10 buckets (digits 0-9)
        bucket_list = [[]for i in range(10)]
        # Place numbers into different buckets based on the current digit
        for num in alist:
            bucket_list[int(num/(10**i)%10)].append(num)
        # Clear the original list and add data from each bucket
        alist.clear()
        for l in bucket_list:
            for b in l:
                alist.append(b)
        # Move to the next higher digit
        i += 1
    return alist
start = time.time()
result = radix_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))

For a better reading experience, visit Kesci Lab: https://www.kesci.com/home/project/5ebf61f52d8821002dc425d0

Xiaoye