Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Manipulating very large binary strings in c#

Tags:

string

c#

binary

I'm working on a genetic algorithm project where I encode my chromosome in a binary string where each 32 bits represents a value. The problem is that the functions I'm optimizing has over 3000 parameters which implies that I have over 96000 bits in my bit string and the manipulations i do on this are simply to slow...

I have proceeded as following: I have a binary class where I'm creating a boolean array that represents my big binary string. Then I manipulate this binary string with various shifts and moves and such.

My question is, is there a better way to do this? The speed is just killing. I'm sure it would be fine if i only did this on one bit string but i have to do the manipulations on 25 bit strings for way over 1000 generations.

like image 243
Erik Avatar asked Dec 16 '22 15:12

Erik


2 Answers

What I would do is take a step back. Your analysis seems to be wedded to an implementation detail, namely that you have chosen bool[] as how you represent a string of bits.

Clear your mind of bools and arrays and make a complete list of the operations you actually need to perform, how frequently they happen, and how fast they have to be. Ideally consider whether your speed requirement is average speed or worst case speed. (There are many data structures that attain high average speed by having one expensive operation for every thousand cheap operations; if having any expensive operations is unacceptable then you need to know that up front.)

Once you have that list, you can then do research on what data structures work well.

For example, suppose your list of operations is:

  • construct bit sequences on the order of 32 bits
  • concatenate on the order of 3000 bit sequences together to form new bit sequences
  • insert new bit sequences into existing long bit sequences at specific locations, quickly

Given just that list of operations, I'd think that the data structure you want is a catenable deque. Catenable deques support fast insertion on either end, and can be broken up into two deques efficiently. Inserting stuff into the middle of a deque is then easily done: break the deque up, insert the item into the end of one half, and join them back up again.

However, if you then add another operation to the problem, say, "search for a particular bit string anywhere in the 90000-bit sequence, and find the result in sublinear time" then just a catenable deque isn't going to do it. Searching a deque is slow. There are other data structures that support that operation.

like image 95
Eric Lippert Avatar answered Dec 29 '22 01:12

Eric Lippert


If I understood correctly you are encoding the bit array in a bool[]. The first obvious optimisation would be to change this to int[] (or even long[]) and take advantage of bit operations on a whole machine word, where possible.

For example, this would make shifts more efficient by ~ a factor 4.

like image 43
Konrad Rudolph Avatar answered Dec 29 '22 01:12

Konrad Rudolph