I have a set of integer ranges, which represent lower and upper bounds of classes. For example:
0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large
In my case there can be over 500 classes. These classes do not overlap, but they can differ in size.
I can implement finding the matching range as a simple linear search through a list, for example
class Range
{
int lower;
int upper;
String category;
boolean contains(int val)
{
return lower <= val && val < upper;
}
}
public String getMatchingCategory(int val)
{
for (Range r : listOfRanges)
{
if (r.contains(val))
{
return r.category;
}
}
return null;
}
However, this seems slow; as I need on average N/2 look-ups. If the classes were equally sized, I could use division. Is there a standard technique to find the correct range faster?
What you are looking for is a SortedMap
and its methods tailMap
and firstKey
. Check out the documentation for full details.
The advantage of this approach over plain arrays is in the ease of maintaining your ranges: you can insert/remove new boundaries at any point with almost no runtime cost; with arrays it means copying both parallel arrays in full.
I've written code for both variants and benchmarked it:
@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class BinarySearch
{
static final int ARRAY_SIZE = 128, INCREMENT = 1000;
static final int[] arrayK = new int[ARRAY_SIZE];
static final String[] arrayV = new String[ARRAY_SIZE];
static final SortedMap<Integer,String> map = new TreeMap<>();
static {
for (int i = 0, j = 0; i < arrayK.length; i++) {
arrayK[i] = j; arrayV[i] = String.valueOf(j);
map.put(j, String.valueOf(j));
j += INCREMENT;
}
}
final Random rnd = new Random();
int rndInt;
@Setup(Level.Invocation) public void nextInt() {
rndInt = rnd.nextInt((ARRAY_SIZE-1)*INCREMENT);
}
@GenerateMicroBenchmark
public String array() {
final int i = Arrays.binarySearch(arrayK, rndInt);
return arrayV[i >= 0? i : -(i+1)];
}
@GenerateMicroBenchmark
public String sortedMap() {
return map.tailMap(rndInt).values().iterator().next();
}
}
Benchmark results:
Benchmark Mode Thr Cnt Sec Mean Mean error Units
array thrpt 1 5 5 10.948 0.033 ops/usec
sortedMap thrpt 1 5 5 5.752 0.070 ops/usec
Interpretation: array search is only twice as fast and this factor is quite stable across array sizes. In the presented code the array size is 1024 and the factor is 1.9. I've also tested with array size 128, where the factor is 2.05.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With