Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python space+time efficient Data Structure to store 2D Bit Arrays

Tags:

python

I want to create a 2D Binary (Bit) Array in Python in a space and time efficient way as my 2D bitarray would be around 1 Million(rows) * 50000(columns of 0's or 1's) and also I would be performing bitwise operations over these huge elements. My array would look something like:

0 1 0 1
1 1 1 0
1 0 0 0 
...

In C++ most efficient way (space) for me would be to create a kind of array of integers where each element represents 32 bits and then I can use the shift operators coupled with bit-wise operators to carry operations.

Now I know that there is a bitarray module in python. But I am unable to create a 2D structure using list of bit-arrays. How can I do this?

Another way I know in C++ would be to create a map something like map<id, vector<int> > where then I can manipulate the vector as I have mentioned above. Should I use the dictionary equivalent in python?

Even if you suggest me some way out to use bit array for this task it will be great If I can get to know whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded. Thanks for the help!!

EDIT:

I can even go on creating my own data structure for this if the need be. However just wanted to check before reinventing the wheel.

like image 485
Yavar Avatar asked Feb 03 '23 05:02

Yavar


2 Answers

As per my comment, you may be able to use sets

0 1 0 1
1 1 1 0
1 0 0 0 

can be represented as

set([(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)])

or

{(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)}

an AND is equivalent to an intersection of 2 sets
OR is the union of the 2 sets

like image 170
John La Rooy Avatar answered Feb 05 '23 16:02

John La Rooy


How about the following:

In [11]: from bitarray import bitarray

In [12]: arr = [bitarray(50) for i in xrange(10)]

This creates a 10x50 bit array, which you can access as follows:

In [15]: arr[0][1] = True

In [16]: arr[0][1]
Out[16]: True

Bear in mind that a 1Mx50K array would require about 6GB of memory (and a 64-bit build of Python on a 64-bit OS).

whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded

That shouldn't be a problem, with the usual caveats. Bear in mind that due to the GIL you are unlikely to achieve performance improvements through multithreading.

like image 29
NPE Avatar answered Feb 05 '23 16:02

NPE