This is part of a search function on a website. So im trying to find a way to get to the end result as fast as possible.
Have a binary number where digit order matters.
Input Number = 01001
Have a database of other binary numbers all the same length.
01000, 10110, 00000, 11111
I dont know how to write what im doing, so im going to do it more visually below.
// Zeros mean nothing & the location of a 1 matters, not the total number of 1's.
input num > 0 1 0 0 1 = 2 possible matches
number[1] > 0 1 0 0 0 = 1 match = 50% match
number[2] > 1 0 1 1 0 = 0 match = 0% match
number[3] > 0 0 0 0 0 = 0 match = 0% match
number[4] > 1 1 1 1 1 = 2 match = 100% match
Now obviously, you could go digit by digit, number by number and compare it that way (using a loop and what not). But I was hoping there might be an algorithm or something that will help. Mostly because in the above example I only used 5 digit numbers. But im going to be routinely comparing around 100,000 numbers with 200 digits each, that's a lot of calculating.
I usually deal with php and MySQL. But if something spectacular comes up I could always learn.
If it's possible to somehow chop up your bitstrings in integer-size chunks some elementary boolean arithmetic would do, and that kind of instructions is generally pretty fast
$matchmask = ~ ($inputval ^ $tomatch) & $inputval
What this does:
Then count the number of bits set in the result, look at How to count the number of set bits in a 32-bit integer? for an optimal solution, easily translated into php
Instead of checking each bit, you could pre-process the input and determine which bits need checking. In the worst case, this devolves into processing each bit, but for a normal distribution, you'll save some processing.
That is, for input
01001
, iterate over the database and determine if number1[0] & input
is non-zero, and (number1[3] >> 8) & input
is non-zero, assuming 0 as the index of the LSB. How you get fast bit-shifting and anding with the large numbers is on you, however. If you detect 1s than 0s in the input, you could always invert the input and test for zero to detect coverage.
This will give you modest improvement, but it's at best a constant-time reduction of the problem. If most of your inputs are balanced between 0s and 1s, you'll halve the number of required operations. If it's more biased, you'll get better results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With