The function is defined as
void bucketsort(Array& A){
    size_t numBuckets = A.size();
    iarray<List> buckets(numBuckets);
    // put in buckets
    for(size_t i = 0; i != A.size(); i++){
        buckets[int(numBuckets * A[i])].push_back(A[i]);
    }
    //// get back from buckets
    //for(size_t i = 0, head = 0; i != numBuckets; i++){
    //    size_t bucket_size = buckets[i].size();
    //    for(size_t j = 0; j != bucket_size; j++){
    //        A[head + j] = buckets[i].front();
    //        buckets[i].pop_front();
    //    }
    //    head += bucket_size;
    //}
    for(size_t i = 0, head = 0; i != numBuckets; i++){
        while(!buckets[i].empty()){
            A[head] = buckets[i].back();
            buckets[i].pop_back();
            head++;
        }
    }
    // insertion sort
    insertionsort(A);
}
where List is just std::list<double> from the STL.
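The other type aliases are not shown in the question; presumably they are something like the sketch below (Array, IndexArray, and iarray are guesses, not confirmed by the post):

#include <list>
#include <vector>

typedef std::list<double> List;           // stated in the question
typedef std::vector<double> Array;        // assumption: not shown in the post
typedef std::vector<size_t> IndexArray;   // assumption: not shown in the post
// iarray<T> looks like a custom array template, also not shown;
// std::vector<List> would play the same role here.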
The contents of the array are generated randomly in [0, 1). Theoretically, bucket sort should be faster than quicksort for large sizes, since it is O(n), but it fails, as shown in the following graph.
I used google-perftools to profile it on an array of 10,000,000 doubles. The report shows most of the time going into std::_List_node_base::_M_hook. It seems I should not use the STL list, but I wonder why. What does std::_List_node_base::_M_hook do? Should I write a list class myself?
PS: the experiment and improvements. I have tried running only the code that puts elements into buckets, and this confirmed that most of the time is spent building up the buckets. The following improvements were made:
- Use an STL vector as each bucket and reserve a reasonable amount of space for it.
- Use two helper arrays to store the information needed to build the buckets, thus avoiding the linked list entirely, as in the following code:
void bucketsort2(Array& A){
    size_t numBuckets = ceil(A.size() / 1000.0); // divide as double, otherwise ceil is a no-op
    Array B(A.size());
    // the extra entry at the end of head avoids a bounds check against i == A.size()-1
    IndexArray head(numBuckets + 1, 0), offset(numBuckets, 0);
    for(size_t i = 0; i != A.size(); i++){
        head[int(numBuckets * A[i]) + 1]++; // note the +1
    }
    for(size_t i = 2; i < numBuckets; i++){ // head[1] is right already
        head[i] += head[i - 1];
    }
    for(size_t i = 0; i < A.size(); i++){
        size_t bucket_num = int(numBuckets * A[i]);
        B[head[bucket_num] + offset[bucket_num]] = A[i];
        offset[bucket_num]++;
    }
    A.swap(B);
    //insertionsort(A);
    for(size_t i = 0; i < numBuckets; i++)
        quicksort_range(A, head[i], head[i] + offset[i]);
}
The results are shown in the following graph, where lines starting with "list" use a list as the bucket, lines starting with "vector" use a vector as the bucket, and lines starting with "2" use the helper arrays. By default insertion sort is used at the end; some runs use quicksort instead, because the bucket size is big. Note that "list" and "list, only put in", as well as "vector, reserve 8" and "vector, reserve 2", nearly overlap. I will also try small sizes with enough memory reserved.
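For anyone who wants to reproduce the measurements, a minimal timing harness might look like the sketch below; it assumes Array is std::vector<double> and that bucketsort2 above is available under that assumption.

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

void bucketsort2(std::vector<double>& A); // from above, assuming Array = std::vector<double>

// Fill an array with uniform doubles in [0, 1) and time one sort call.
template <typename Sort>
double time_sort(Sort sort, std::size_t n) {
    std::mt19937_64 rng(42);                                // fixed seed for repeatable runs
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // produces values in [0, 1)
    std::vector<double> A(n);
    for (double& x : A) x = dist(rng);
    auto t0 = std::chrono::steady_clock::now();
    sort(A);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    for (std::size_t n : {100000ul, 1000000ul, 10000000ul})
        std::printf("n=%zu: %.3fs\n", n,
                    time_sort([](std::vector<double>& A) { bucketsort2(A); }, n));
}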
Time complexity analysis: the time complexity of bucket sort largely depends on the number of buckets and on the range over which the elements of the array/list are distributed.
Limitations of bucket sort:
- It is not suitable for sorting arbitrary strings; it is only good for sorting data uniformly distributed over a range such as [0, 1).
- It is not an in-place sort, because the buckets require additional space.
- It becomes expensive for huge arrays, since the auxiliary space grows with the input.
Bucket sort can be exceptionally fast because of the way elements are assigned to buckets, typically using an array index computed directly from the value. The trade-off is that it needs more auxiliary memory for the buckets than comparison sorts do, in exchange for the better running time.
As a result, bucket sort works best when the data are more or less uniformly distributed, or when there is an intelligent way to choose the buckets from a quick set of heuristics based on the input array. Bucket sort also works well if you have a large degree of parallelism available.
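To illustrate the parallelism point: the per-bucket sorts are independent, so they can run concurrently. Below is a minimal sketch (not from the answer above) using std::async, with std::sort standing in for the per-bucket sort; a real implementation would cap the number of threads rather than launch one task per bucket.

#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Sketch: distribute sequentially, then sort each bucket concurrently.
// Assumes values in [0, 1); sensible only for a modest number of buckets,
// since one task is launched per bucket.
void parallel_bucketsort(std::vector<double>& A, std::size_t numBuckets) {
    std::vector<std::vector<double>> buckets(numBuckets);
    for (double x : A)
        buckets[static_cast<std::size_t>(numBuckets * x)].push_back(x);

    std::vector<std::future<void>> jobs;
    for (auto& b : buckets)
        jobs.push_back(std::async(std::launch::async,
                                  [&b] { std::sort(b.begin(), b.end()); }));
    for (auto& j : jobs) j.get(); // wait for every bucket to finish

    std::size_t head = 0; // concatenate the sorted buckets in order
    for (const auto& b : buckets)
        for (double x : b) A[head++] = x;
}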
In my opinion, the biggest bottleneck here is the memory management functions (such as new and delete).
Quicksort (of which STL probably uses an optimized version) can sort an array in-place, meaning it requires absolutely no heap allocations. That is why it performs so well in practice.
Bucket sort relies on additional working space, which is assumed to be readily available in theory (i.e. memory allocation is assumed to take no time at all). In practice, memory allocation can take anywhere from (large) constant time to linear time in the size of memory requested (Windows, for example, will take time to zero the contents of pages when they are allocated). This means standard linked list implementations are going to suffer, and dominate the running time of your sort.
Try using a custom list implementation that pre-allocates memory for a large number of items, and you should see your sort running much faster.
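For illustration, here is a minimal sketch of that idea, assuming Array is std::vector<double>: a singly linked list whose nodes all come from one pre-allocated pool, so filling the buckets performs a single heap allocation instead of one per element. NodePool and PooledList are hypothetical names, not from the question.

#include <cstddef>
#include <vector>

// Hypothetical sketch: all bucket nodes live in one pre-allocated pool,
// so pushing an element never calls operator new individually.
struct NodePool {
    struct Node { double value; int next; }; // next: index into pool, -1 = end
    std::vector<Node> nodes;                 // one allocation up front
    explicit NodePool(std::size_t capacity) { nodes.reserve(capacity); }
    int make(double v, int next) {           // returns the index of the new node
        nodes.push_back(Node{v, next});
        return static_cast<int>(nodes.size() - 1);
    }
};

struct PooledList {                          // singly linked list of pool indices
    int head = -1;
    void push_front(NodePool& pool, double v) { head = pool.make(v, head); }
};

void bucketsort_pooled(std::vector<double>& A) {
    std::size_t numBuckets = A.size();
    NodePool pool(A.size());                 // exactly one big allocation
    std::vector<PooledList> buckets(numBuckets);
    for (double x : A)                       // assumes x in [0, 1)
        buckets[static_cast<std::size_t>(numBuckets * x)].push_front(pool, x);
    // Drain buckets back into A; order within a bucket is reversed by
    // push_front, which is fine since an insertion sort follows anyway.
    std::size_t head = 0;
    for (const auto& b : buckets)
        for (int n = b.head; n != -1; n = pool.nodes[n].next)
            A[head++] = pool.nodes[n].value;
    // insertionsort(A); // as in the question; not defined in this sketch
}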
With
iarray<List> buckets(numBuckets);
you are basically creating a LOT of lists, and that can cost you a lot, especially in memory access, which is theoretically linear but is not in practice.
Try to reduce the number of buckets. To verify my assertion, profile your code with only the creation of the lists.
Also, to iterate over the elements of the lists you should not use .size(), but rather:
// get back from buckets
for(size_t i = 0, head = 0; i != numBuckets; i++)
    while(!buckets[i].empty())
    {
        A[head++] = buckets[i].front();
        buckets[i].pop_front();
    }
In some implementations .size() can be O(n); before C++11, std::list::size() was in fact allowed to be linear. Unlikely to matter here, but...
As for std::_List_node_base::_M_hook: it seems to do nothing more than insert a node at a given place in a list. That by itself shouldn't cost a lot.
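For reference, the pointer surgery is roughly the following (simplified from libstdc++; names approximate):

struct ListNodeBase {
    ListNodeBase* next;
    ListNodeBase* prev;
    // Hook *this into a doubly linked list just before position:
    // four pointer writes, no allocation. The allocation that actually
    // hurts happens earlier, when the node itself is created.
    void hook(ListNodeBase* position) {
        next = position;
        prev = position->prev;
        position->prev->next = this;
        position->prev = this;
    }
};

So if the profiler attributes a lot of time to it, much of that is likely cache misses on freshly allocated, cold nodes rather than the pointer updates themselves.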