Introduction¶
This chapter introduces several sorting algorithms implemented in Python. These include Bubble Sort, Quick Sort, Insertion Sort, Shell Sort, Selection Sort, Heap Sort, Merge Sort, Counting Sort, Bucket Sort, and Radix Sort.
Create a relatively large list for testing the sorting algorithms:
import numpy as np
import time
src_list = np.random.randint(1, 10000000, (100000)).tolist()
Bubble Sort¶
Bubble Sort is a simple and intuitive sorting algorithm. It repeatedly traverses the list to be sorted, comparing two elements at a time and swapping them if their order is incorrect. The traversal continues until no more swaps are needed, indicating the list is sorted. The name comes from the fact that smaller elements “bubble” to the top of the list through repeated swaps.
def bubble_sort(arr):
n = len(arr)
for i in range(n-1):
for j in range(0, n-1-i):
if arr[j] > arr[j+1] :
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
start = time.time()
result = bubble_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Quick Sort¶
Quick Sort selects a pivot element from the dataset, places elements greater than the pivot to its right, and elements smaller than the pivot to its left. This divides the dataset into two parts, which are then recursively sorted until all elements are ordered.
def quick_sort(listt, left, right):
if left >= right:
return listt
# Select the pivot as the first element in the current range
pivot = listt[left]
low = left
high = right
while left < right:
# Start from the right to find elements greater than the pivot
while left < right and listt[right] >= pivot:
right -= 1
# Move this element to the left position
listt[left] = listt[right]
# Start from the left to find elements smaller than the pivot
while left < right and listt[left] <= pivot:
left += 1
# Move this element to the right position
listt[right] = listt[left]
# Write back the pivot value
listt[left] = pivot
# Recursively sort the left and right parts
quick_sort(listt, low, left - 1) # Recursively sort the left part
quick_sort(listt, left + 1, high) # Recursively sort the right part
return listt
start = time.time()
result = quick_sort(src_list, 0, 1000)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Insertion Sort¶
Insertion Sort is a simple and intuitive sorting algorithm. It works by building an ordered sequence, where for each unsorted element, it scans the sorted sequence from the end to find the appropriate position for insertion.
def insert_sort(alist):
length = len(alist)
for i in range(1,length):
for j in range(i, 0, -1):
if alist[j] < alist[j - 1]:
alist[j], alist[j - 1] = alist[j - 1], alist[i]
else:
break
return alist
start = time.time()
result = insert_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Shell Sort¶
Shell Sort, also known as the decreasing increment sorting algorithm, is a more efficient improvement over Insertion Sort. However, Shell Sort is a non-stable sorting algorithm. The basic idea is: first, split the entire unsorted record sequence into several subsequences for direct insertion sorting. When the records in the entire sequence are “basically sorted”, perform direct insertion sorting on all records again.
def shell_sort(alist):
n = len(alist)
gap = n // 2
while gap >= 1:
for i in range(gap,n):
while (i - gap) >= 0:
if alist[i] < alist[i-gap]:
alist[i], alist[i-gap] = alist[i-gap], alist[i]
i = i - gap
else:
break
gap = gap // 2
return alist
start = time.time()
result = shell_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Selection Sort¶
Selection Sort is a simple and intuitive sorting algorithm. Its working principle: first, find the smallest (largest) element in the unsorted sequence and place it at the beginning of the sorted sequence. Then, continue to find the smallest (largest) element from the remaining unsorted elements and place it at the end of the sorted sequence. Repeat this process until all elements are sorted.
def select_sort(alist):
n = len(alist)
for i in range(n):
# Set index i as the index of the minimum value
min_idx = i
# By comparison, continuously adjust the position of the minimum value
for j in range(i,n):
if alist[j] < alist[min_idx]:
min_idx = j
# Swap the i-th value with the minimum value
alist[i], alist[min_idx] = alist[min_idx], alist[i]
return alist
start = time.time()
result = select_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Heap Sort¶
Heap Sort is a sorting algorithm designed using the heap data structure. A heap is a nearly complete binary tree structure that satisfies the heap property: the key value or index of child nodes is always less than (or greater than) that of its parent node. Heap Sort can be considered a selection sort that utilizes the concept of heaps.
# Adjust the heap structure so that the parent node value is greater than the child node values
def max_heap(heap, heapsize, root):
left = 2*root+1
right = left + 1
large = root
if left < heapsize and heap[large] < heap[left]:
large = left
if right < heapsize and heap[large] < heap[right]:
large = right
# If large is right or left, a child node larger than the parent is found; swap them to make the child the parent
if large != root:
heap[large], heap[root] = heap[root], heap[large]
max_heap(heap, heapsize, large)
# Construct a heap and reorder the data in the heap
def build_max_heap(heap):
length = len(heap)
# Adjust the structure from the back forward
for i in range((length-2)//2,-1,-1):
max_heap(heap, length, i)
# Swap the root node with the last element and continue the adjustment process for the first len-1 nodes
def heap_sort(heap):
build_max_heap(heap)
for i in range(len(heap)-1,-1,-1):
heap[0], heap[i] = heap[i], heap[0]
max_heap(heap,i,0)
return heap
start = time.time()
result = heap_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Merge Sort¶
Merge Sort is an efficient sorting algorithm based on the merge operation. This algorithm is a very typical application of the divide-and-conquer strategy.
Divide-and-Conquer:
- Divide: Recursively split the current sequence into two equal halves
- Conquer: Integrate the subsequences obtained in the previous step while maintaining element order (merge)
def merge_sort(alist):
if len(alist) < 2:
return alist
# Split the list into smaller two lists
mid = int(len(alist)/2)
# Process the left and right lists separately, returning two sorted lists
left = merge_sort(alist[:mid])
right = merge_sort(alist[mid:])
# Merge the two sorted lists to produce a new sorted list
return merge(left,right)
def merge(left,right):
i = 0
j = 0
result = []
while i < len(left) and j < len(right):
if left[i] < right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result += left[i:]
result += right[j:]
return result
start = time.time()
result = merge_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Counting Sort¶
The core of Counting Sort is to convert input data values into keys stored in an additionally allocated array space. As a linear time complexity sorting algorithm, Counting Sort requires the input data to be integers with a known range.
For each input element a[i], determine the number of elements less than a[i]. Thus, a[i] can be directly placed at its position in the output array. If there are 5 elements less than a[i], a[i] should be placed at the 6th position in the array.
def count_sort(alist):
# Find the minimum and maximum values
min_num = min(alist)
max_num = max(alist)
# Initialize the counting list
count_list = [0]*(max_num-min_num+1)
# Count each element in the list
for num in alist:
count_list[num-min_num] += 1
alist.clear()
# When the count of an element is not zero, fill the element back into the alist
for cur_num, count in enumerate(count_list):
while count != 0:
alist.append(cur_num+min_num)
count -= 1
return alist
start = time.time()
result = count_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Bucket Sort¶
Divide the array a into n sub-intervals (buckets) of the same size, sort each sub-interval individually, and finally merge. Bucket Sort requires the data distribution to be uniform, otherwise it may fail. Counting Sort is a special case of Bucket Sort, where each bucket can be considered to contain only one element.
Principle:
(1) Set a fixed-size array as empty buckets
(2) Traverse the input data and place each data into the corresponding bucket
(3) Sort each non-empty bucket
(4) Concatenate the sorted data from non-empty buckets
def bucket_sort(alist):
min_num = min(alist)
max_num = max(alist)
# Set the bucket size
bucket_size = (max_num - min_num)/len(alist)
# Create a bucket array
bucket_list = [[]for i in range(len(alist)+1)]
# Fill numbers into buckets
for num in alist:
bucket_list[int((num-min_num)//bucket_size)].append(num)
alist.clear()
#回填,这里桶内部排序直接调用了sorted
for i in bucket_list:
for j in sorted(i):
alist.append(j)
return alist
start = time.time()
result = bucket_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
Radix Sort¶
The overall idea of Radix Sort is to split the data to be sorted into multiple keywords for sorting. In other words, the essence of Radix Sort is multi-keyword sorting. For example, when sorting scores, if two people have the same total score, the one with a higher Chinese score comes first; if the Chinese score is also the same, the one with a higher Math score comes first. For numbers, units, tens, hundreds digits are different priority keywords.
Principle:
(1) Find the maximum number in the array and determine its number of digits
(2) Create a bucket array
(3) Place numbers into different buckets according to their digit values
(4) Clear the original array and add data from each bucket to the original array in sequence
(5) Perform sorting for the previous digit, and loop until the sorting digit exceeds the maximum number of digits
def radix_sort(alist):
# Record which digit we are sorting currently, starting with the least significant digit (units place)
i = 0
# Number of digits in the maximum number
max_num = max(alist)
j = len(str(max_num))
while i < j:
# Create a bucket array with 10 buckets (digits 0-9)
bucket_list = [[]for i in range(10)]
# Place numbers into different buckets based on the current digit
for num in alist:
bucket_list[int(num/(10**i)%10)].append(num)
# Clear the original list and add data from each bucket
alist.clear()
for l in bucket_list:
for b in l:
alist.append(b)
# Move to the next higher digit
i += 1
return alist
start = time.time()
result = radix_sort(src_list)
end = time.time()
print ("耗时:%d 毫秒" % int(round((end - start) * 1000)))
For a better reading experience, visit Kesci Lab: https://www.kesci.com/home/project/5ebf61f52d8821002dc425d0