in order to solve one research problem we have to organise the bitmask search in python. As an input we have a raw data (we present it as a sequence of bits). Size is smth about 1,5Gb. as an output we have to get the number of occurrence of a specific bitmasks. Let me give an example to discribe the situation
input: sequence of bits, a bitmask to search(mask length: 12bits)
the first idea (not efficient one) is to use XOR like this:
1step: from input we take 12 first bits(position 0 to 11) and make XOR with mask
2step: from input we take bits from 1 to 12 position and XOR with mask ...
let us proceed 2 first steps:
input sequence 100100011110101010110110011010100101010110101010
mask to search: 100100011110
step 1: take first 12 bits from input: 100100011110 and XOR it with mask.
step 2: teke bits from 1 to 12position: 001000111101 and XOR it with mask.
...
the problem is: how to organise taking bits from an input ? we are able to take first 12 bits, but how can we take bits from 1 to 12 position those we need to proceed next iteration?
Before we use a python BitString package, but the time we spend to search all mask are to high. And one more. The size of a mask can be fron 12 bits up to 256. have any suggestions? task have to be implemented in python
Your algorithm is the naive way to search for "strings" in data, but luckily there are much better algorithms. One example is the KMP algorithm, but there are others that might fit your use case better.
With a better algorithm you can go from complexity of O(n*m)
to O(n+m)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With