Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python bitmask (variable length)

Tags:

python

bitmask

in order to solve one research problem we have to organise the bitmask search in python. As an input we have a raw data (we present it as a sequence of bits). Size is smth about 1,5Gb. as an output we have to get the number of occurrence of a specific bitmasks. Let me give an example to discribe the situation

input:    sequence of bits, a bitmask to search(mask length: 12bits)

the first idea (not efficient one) is to use XOR like this:

1step: from input we take 12 first bits(position 0 to 11) and make XOR with mask 
2step: from input we take bits from 1 to 12 position and XOR with mask ...

let us proceed 2 first steps:

input sequence 100100011110101010110110011010100101010110101010
mask to search: 100100011110
step 1: take first 12 bits from input: 100100011110 and XOR it with mask.
step 2: teke bits from 1 to 12position: 001000111101 and XOR it with mask.
...

the problem is: how to organise taking bits from an input ? we are able to take first 12 bits, but how can we take bits from 1 to 12 position those we need to proceed next iteration?

Before we use a python BitString package, but the time we spend to search all mask are to high. And one more. The size of a mask can be fron 12 bits up to 256. have any suggestions? task have to be implemented in python

like image 350
asssag Avatar asked Feb 25 '23 04:02

asssag


1 Answers

Your algorithm is the naive way to search for "strings" in data, but luckily there are much better algorithms. One example is the KMP algorithm, but there are others that might fit your use case better.

With a better algorithm you can go from complexity of O(n*m) to O(n+m).

like image 127
Jochen Ritzel Avatar answered Mar 03 '23 11:03

Jochen Ritzel