
Python multi-dimensional sparse array

Tags: python, scipy

I am working on a project where I need to handle a large 3-dimensional array. I was using a NumPy 3D array, but most of my entries are going to be zero, so a lot of memory is wasted. scipy.sparse seems to allow only 2D matrices. Is there another way to store a 3D sparse array?

Naman asked Apr 25 '15 22:04



2 Answers

scipy.sparse has a number of formats, though only a couple have an efficient set of numeric operations. Unfortunately, those are the harder ones to extend.

dok uses a tuple of the indices as dictionary keys, so it would be easy to generalize from 2D to 3D or more. coo has row, col and data attribute arrays; conceptually, adding a third index array (a depth axis) is easy. lil would probably require lists within lists, which could get messy.
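To make the dok and coo ideas concrete, here is a minimal pure-Python sketch. The class name Dok3D and its to_coo method are hypothetical names chosen for illustration; nothing like them exists in scipy.sparse.

```python
class Dok3D:
    """Hypothetical dok-style 3D container: a dict keyed on (i, j, k) tuples."""

    def __init__(self, shape):
        self.shape = shape
        self.data = {}

    def __setitem__(self, idx, value):
        if value == 0:
            self.data.pop(idx, None)  # never store explicit zeros
        else:
            self.data[idx] = value

    def __getitem__(self, idx):
        return self.data.get(idx, 0)  # anything not stored reads as zero

    def to_coo(self):
        """coo-style view: one index list per axis plus a list of values."""
        keys = list(self.data)
        i = [key[0] for key in keys]
        j = [key[1] for key in keys]
        k = [key[2] for key in keys]
        vals = [self.data[key] for key in keys]
        return i, j, k, vals


a = Dok3D((1000, 1000, 1000))  # a dense float64 array of this shape would need 8 GB
a[3, 4, 5] = 2.5
a[999, 0, 7] = -1.0
i, j, k, vals = a.to_coo()
```

Only the two stored entries take up memory. This covers storage and element access only; it provides no arithmetic.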

But csr and csc store the array in indices, indptr and data arrays. This format was worked out years ago by mathematicians working on linear algebra problems, along with efficient math operations (especially matrix multiplication). The relevant paper is cited in the source code.
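For reference, this is what those three arrays look like for a small matrix. The example matrix is made up for illustration; the attributes data, indices and indptr are the ones named above.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 6]])
m = sparse.csr_matrix(dense)

print(m.data)     # nonzero values, stored row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row r occupies data[indptr[r]:indptr[r + 1]]

# Pull out the stored entries of the last row by hand
r = 2
row_vals = m.data[m.indptr[r]:m.indptr[r + 1]]
row_cols = m.indices[m.indptr[r]:m.indptr[r + 1]]
```

The indptr array compresses exactly one axis (rows for csr, columns for csc), which is part of why the layout does not extend to a third axis in an obvious way.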

So representing 3d sparse arrays is not a problem, but implementing efficient vector operations could require some fundamental mathematical research.

Do you really need the 3d layout to do the vector operations? Could you, for example, reshape 2 of the dimensions into 1, at least temporarily?
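One way to do that reshaping, sketched under the assumption that the nonzero entries are available as coordinate arrays (the names i, j, k, vals and the shape are invented for this example): merge the last two axes into a single column index and build an ordinary 2D scipy.sparse matrix.

```python
import numpy as np
from scipy import sparse

shape = (4, 5, 6)                 # logical (i, j, k) shape
i = np.array([0, 1, 3])
j = np.array([2, 4, 0])
k = np.array([5, 1, 3])
vals = np.array([1.0, 2.0, 3.0])

# Merge axes j and k into one column index: col = j * shape[2] + k
cols = np.ravel_multi_index((j, k), shape[1:])
m2d = sparse.coo_matrix(
    (vals, (i, cols)), shape=(shape[0], shape[1] * shape[2])
).tocsr()

# Recover the original (j, k) pair from a flattened column index when needed
j_back, k_back = np.unravel_index(cols, shape[1:])

# Sanity check against the equivalent dense array
dense = np.zeros(shape)
dense[i, j, k] = vals
same = np.array_equal(m2d.toarray(), dense.reshape(shape[0], -1))
```

The result is a regular csr matrix of shape (4, 30), so all the usual 2D sparse operations apply to it.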

Element-by-element operations (*, +, -) work just as well on the data of a flattened array as on the 2D or 3D version. np.tensordot handles nD matrix multiplication by reshaping the inputs into 2D arrays and applying np.dot. Even when np.einsum is used on 3D arrays, the product summation is normally over just one pair of dimensions (e.g. 'ijk,jl->ikl').
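As a rough illustration of that point, the 'ijk,jl->ikl' product can be reproduced with a transpose, a reshape to 2D and a plain matrix product, and the 2D operand can then be a scipy.sparse matrix. The array sizes and the 0.8 sparsity threshold are arbitrary choices for this sketch.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
A = rng.random((3, 4, 5))   # axes i, j, k
A[A < 0.8] = 0              # make A mostly zeros
B = rng.random((4, 6))      # axes j, l

expected = np.einsum('ijk,jl->ikl', A, B)
via_tensordot = np.tensordot(A, B, axes=([1], [0]))

# Move the summed axis j last, flatten (i, k) into rows, multiply in 2D
A2 = A.transpose(0, 2, 1).reshape(3 * 5, 4)   # rows = (i, k), cols = j
via_dot = (A2 @ B).reshape(3, 5, 6)

# Same product with the 2D operand stored as a sparse matrix
A2_sparse = sparse.csr_matrix(A2)
via_sparse = np.asarray(A2_sparse @ B).reshape(3, 5, 6)
```

All four results agree up to floating-point rounding, so the 3D structure here is only bookkeeping around a 2D product.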

A 3D representation can be conceptually convenient, but I can't think of a mathematical operation that requires it (instead of 2D or 1D).

Overall I think you'll get more speed from reshaping your arrays than from trying to find/implement genuine 3d sparse operations.

hpaulj answered Sep 30 '22 14:09



You're right; it doesn't look like there are established tools for working with n-dimensional sparse arrays. If you just need to access elements from the array there are options using a dictionary keyed on tuples. See:

sparse 3d matrix/array in Python?

If you need to do operations on the sparse 3D matrix, it gets harder; you may have to do some of the coding yourself.

chepyle answered Sep 30 '22 15:09
