Once again, I have a problem for which I would like to shave off nanoseconds. I have a small, constant array and I would like to search it to see if a given number is a member*.
Input: A 64-bit number n.
Output: True if n is in the array, false if n is not.
What are good techniques for making binary searches fast, given the possibility to optimize for the specific elements and their distribution.
I have an array with around 136 members (though see below: there's some flexibility) to search. The members are not equally distributed through the range: they cluster toward the beginning and end of the range. The input numbers can be assumed to the chosen with uniform probability. It's probably worthwhile to take advantage of this irregularity.
Here's a sample picture of the distribution for the 136-element array. Note that only 12 of the 136 elements are between 1% and 99% of the range; the balance are below 1% or over 99%.
(source: crg4.com)
I assume that branch misprediction will be the largest cost of any implementation. I'd be happy to be proved wrong.
* Actually, I have two arrays. Actually actually, I have a choice of what arrays to use: efficiency suggests that the first should have perhaps 10-40 members, while the second can have no more than (exactly) 136 members. My problem gives real flexibility in selecting sizes, along with limited freedom to decide precisely which members to use. If a method performs better with certain sizes or restrictions, please mention this because I may be able to use it. All things equal, I'd prefer to have the second array as large as possible. For reasons unrelated to the binary search I may need to reduce the size of the second array to <= 135 or <= 66 (this is related to the difficulty of determining the input number, which depends on the array selected).
Here's one of the possible arrays, if it helps in testing ideas. (This pretty well reveals my purpose...!) Don't jump to unwarranted conclusions on the basis of the first few members, though.
0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041, 1548008755920, 2504730781961, 4052739537881, 6557470319842, 10610209857723, 17167680177565, 27777890035288, 44945570212853, 72723460248141, 117669030460994, 190392490709135, 308061521170129, 498454011879264, 806515533049393, 1304969544928657, 2111485077978050, 3416454622906707, 5527939700884757, 8944394323791464, 14472334024676221, 23416728348467685, 37889062373143906, 61305790721611591, 99194853094755497, 160500643816367088, 259695496911122585, 420196140727489673, 679891637638612258, 1100087778366101931, 1779979416004714189, 2880067194370816120, 4660046610375530309, 7540113804746346429, 9320093220751060618, 9999984858389672876, 10259680355300795461, 10358875208395550958, 10396764270768694864, 10411236604793371085, 10416764544494255842, 10418876029572233892, 10419682545105283285, 10419990606626453414, 10420108275656914408, 10420153221227127261, 10420170388907304826, 10420176946377624668, 10420179451108406629, 10420180407830432670, 10420180773265728832, 10420180912849591277, 10420180966165882450, 10420180986530893524, 10420180994309635573, 10420180997280850646, 10420180998415753816, 10420180998849248253, 10420180999014828394, 10420180999078074380, 10420180999102232197, 10420180999111459662, 10420180999114984240, 10420180999116330509, 10420180999116844738, 10420180999117041156, 10420180999117116181, 10420180999117144838, 10420180999117155784, 10420180999117159965, 10420180999117161562, 10420180999117162172, 10420180999117162405, 10420180999117162494, 10420180999117162528, 10420180999117162541, 10420180999117162546, 10420180999117162548
I will initially run the program on a Phenom II x4, but optimizations for other architectures are welcome.
If all you're interested in is member/not-member, rather than location, you could eliminate some conditional branches with the following arrangement:
bool b = false;
b |= (n == x[i]);
b |= (n == x[i+1]);
// ... etc. ...
Clearly, you probably don't want to do this for all 136 entries. But there may be a sweet spot where you are able to mix a coarse-grained binary search to first locate which batch of e.g. 4 elements n
could be in, and then switch to the above approach.
Since you know the array data at compile time, you might consider using a hash instead of a binary search. A carefully chosen hash could be faster, especially if you can find a simple hash function that is collision-free for your data, and fits within your memory constraints.
Edit: to further explain...
You hash the 64-bit value to a smaller value, which is an index into an array. In your case, you would aim to have a collision-free hash, so your array would just be an array of a) for hits, the 1 valid value that hashes to that array index, or b) for misses, an invalid value.
You pick the hash function that suits your purposes. In your case, the main parameters are:
Assuming no collisions, you use it at run-time as follows:
If your hash function produces collisions, your choices are:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With