I have a buckets of numbers e.g. - 1 to 4, 5 to 15, 16 to 21, 22 to 34,.... I have roughly 600,000 such buckets. The range of numbers that fall in each of the bucket varies. I need to store these buckets in a suitable data structure so that the lookups for a number is as fast as possible.
So my question is what is the suitable data structure and a sorting mechanism for this type of problem.
Thanks in advance
A bucket data structure is a data structure that uses the key values as the indices of the buckets, and store items of the same key value in the corresponding bucket. Naturally it makes the most sense to use the bucket data structure with integer key values.
Bucketing builds, the hash table as a 2D array instead of a single dimensional array. Every entry in the array is big, sufficient to hold M items (M is not amount of data. Just a constant). Problems. Lots of wasted space are created.
Bucket sort is a useful technique that is commonly used when inputs are uniformly distributed. In computer science, it is a very simple way for comparison, which is based on the algorithm of sorting. Bucket sort can also be used for partitioning an array into buckets where each bucket is individually sorted.
If the buckets are contiguous and disjoint, as in your example, you need to store in a vector just the left bound of each bucket (i.e. 1, 5, 16, 22) plus, as the last element, the first number that doesn't fall in any bucket (35). (I assume, of course, that you are talking about integer numbers.)
Keep the vector sorted. You can search the bucket in O(log n), with kind-of-binary search. To search which bucket does a number x belong to, just go for the only index i such that vector[i] <= x < vector[i+1]. If x is strictly less than vector[0], or if it is greater than or equal to the last element of vector, then no bucket contains it.
EDIT. Here is what I mean:
#include <stdio.h>
// ~ Binary search. Should be O(log n)
int findBucket(int aNumber, int *leftBounds, int left, int right)
{
int middle;
if(aNumber < leftBounds[left] || leftBounds[right] <= aNumber) // cannot find
return -1;
if(left + 1 == right) // found
return left;
middle = left + (right - left)/2;
if( leftBounds[left] <= aNumber && aNumber < leftBounds[middle] )
return findBucket(aNumber, leftBounds, left, middle);
else
return findBucket(aNumber, leftBounds, middle, right);
}
#define NBUCKETS 12
int main(void)
{
int leftBounds[NBUCKETS+1] = {1, 4, 7, 15, 32, 36, 44, 55, 67, 68, 79, 99, 101};
// The buckets are 1-3, 4-6, 7-14, 15-31, ...
int aNumber;
for(aNumber = -3; aNumber < 103; aNumber++)
{
int index = findBucket(aNumber, leftBounds, 0, NBUCKETS);
if(index < 0)
printf("%d: Bucket not found\n", aNumber);
else
printf("%d belongs to the bucket %d-%d\n", aNumber, leftBounds[index], leftBounds[index+1]-1);
}
return 0;
}
You will probably want some kind of sorted tree, like a B-Tree, B+ Tree, or Binary Search tree.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With