Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Anyone Know a Great Sparse One Dimensional Array Library in Python?

I am working on an algorithm in Python that uses arrays of int64s heavily. The arrays are typically sparse and are read from and written to constantly. I am currently using relatively large native arrays and the performance is good but the memory usage is high (as expected).

I would like to be able to have the array implementation not waste space for values that are not used and allow an index offset other than zero. As an example, if my numbers start at 1,000,000 I would like to be able to index my array starting at 1,000,000 and not be required to waste memory with a million unused values.

Array reads and writes needs to be fast. Expanding into new territory can be a small delay but reads and writes should be O(1) if possible.

Does anybody know of a library that can do it?

Thanks!

Updated to mention int64 as the data type.

like image 880
TheJacobTaylor Avatar asked Feb 26 '23 21:02

TheJacobTaylor


2 Answers

It sounds like the blist type (documentation, download) might be just what you're looking for (disclaimer: I'm the author). It has exactly the same interface as Python's list, so there's no learning curve, but it has different performance characteristics. In particular, it can efficiently handle sparse lists in many cases. Below is an example that creates a list with 2**29 elements. It's pretty much instantaneous. Sparse lists created in this manner use O(log n) space.

>>> from blist import *
>>> x = blist([0])             # x is a blist with one element
>>> x *= 2**29                 # x is a blist with > 500 million elements
>>> x.append(5)                # append to x
>>> y = x[4:-234234]           # Take a 500 million element slice from x
>>> del x[3:1024]              # Delete a few thousand elements from x

Operations that iterate over the whole list still take O(n) time (remove, reverse, count, etc.). The documentation spells out the time-complexity for each operation, so you should be able to assess if it will meet your needs.

like image 106
Daniel Stutzbach Avatar answered May 01 '23 08:05

Daniel Stutzbach


You could remap a numpy sparse matrix into a sparse array - or consider using a hash table(a python dict). As far as the offset is concerned, just wrap whatever storage class you are using and make you own insert/lookup/remove methods.

like image 45
zdav Avatar answered May 01 '23 07:05

zdav