Bucket Sort (Bucket Sort) in C++ Implementation

Bucket Sort is a non-comparison-based sorting algorithm that works by distributing elements to be sorted into multiple “buckets”, sorting each bucket individually, and then merging the results of all buckets to achieve overall sorting. The core of this method is reasonably dividing the range of buckets, ensuring that the number of elements in each bucket is as small as possible to reduce sorting costs.

Algorithm Idea (with Floating-Point Numbers as an Example)

Assume the elements of the array to be sorted are floating-point numbers in the range [0, 1), and the array length is n. The steps are as follows:
1. Create Buckets: Based on the array length n, create n empty buckets (each bucket is a container for storing elements in the same interval).
2. Distribute Elements to Buckets: Traverse the array and place each element x into the corresponding bucket using the rule: bucket index = static_cast<int>(x * n). For example, if the element is 0.42 and n=7, the bucket index is 0.42*7=2.94, and the integer part (2) is taken, so it is placed into the 2nd bucket.
3. Sort Elements Within Each Bucket: Use a built-in sorting function (e.g., std::sort in C++) to sort elements within each bucket individually.
4. Merge Results: Traverse all buckets in order, take elements from each bucket sequentially, and merge them to obtain the final sorted array.

C++ Code Implementation

#include <iostream>
#include <vector>
#include <algorithm>  // For std::sort

using namespace std;

// Bucket Sort function: Input is an array of floating-point numbers in [0,1)
vector<double> bucketSort(vector<double>& arr) {
    int n = arr.size();
    if (n == 0) return arr;  // Return empty array directly

    // Step 1: Create n empty buckets (each bucket is a vector<double>)
    vector<vector<double>> buckets(n);

    // Step 2: Distribute elements into corresponding buckets
    for (double x : arr) {
        // Calculate bucket index: x is in [0,1), so bucket index = x * n (take integer part)
        int bucketIndex = static_cast<int>(x * n);
        buckets[bucketIndex].push_back(x);
    }

    // Step 3: Sort elements within each bucket
    for (auto& bucket : buckets) {
        sort(bucket.begin(), bucket.end());  // Use built-in sort for simplicity and efficiency
    }

    // Step 4: Merge elements from all buckets
    vector<double> result;
    for (auto& bucket : buckets) {
        result.insert(result.end(), bucket.begin(), bucket.end());
    }

    return result;
}

// Test function
int main() {
    vector<double> arr = {0.42, 0.32, 0.33, 0.52, 0.37, 0.47, 0.51};
    cout << "Array before sorting: ";
    for (double num : arr) cout << num << " ";
    cout << endl;

    vector<double> sortedArr = bucketSort(arr);

    cout << "Array after sorting: ";
    for (double num : sortedArr) cout << num << " ";
    cout << endl;

    return 0;
}

Code Explanation

  • Creating Buckets: vector<vector<double>> buckets(n) initializes n empty buckets, each used to store elements in the same interval.
  • Distributing Elements: The bucket index is calculated by x * n, ensuring each element falls into the correct bucket. For example, with n=7, 0.42*7=2.94 results in a bucket index of 2, so 0.42 is placed in the 2nd bucket.
  • Sorting Within Buckets: std::sort is used to sort elements in each bucket, avoiding complex sorting logic.
  • Merging Results: Traverse all buckets, insert elements from each bucket into the result array in order, and finally obtain a sorted array in ascending order.

Complexity Analysis

  • Time Complexity:
  • Average case: O(n) (when elements are evenly distributed across buckets, the total time to sort n buckets is O(n)).
  • Worst case: O(n log n) (when all elements fall into one bucket, sorting that bucket takes O(n log n) time).
  • Space Complexity: O(n + k) (where k is the number of buckets; typically k = n, so space complexity is O(n)).

Applicable Scenarios

  • Uniform Data Distribution: Suitable for arrays with a clear range and dense elements (e.g., interval data in statistical analysis).
  • Avoiding Extreme Cases: If data distribution is extremely uneven (e.g., all elements fall into one bucket), Bucket Sort degrades to O(n log n), making Quick Sort or other comparison-based sorts more appropriate.

Bucket Sort efficiently completes sorting when data distribution is reasonable, especially for numerical data with a clear range.

Xiaoye