 

Fast n-dimensional sparse array in Python / Cython


I have an application that involves large n-dimensional arrays which are very sparse. scipy.sparse has a useful 'vectorized getting and setting' feature, so that Cython can be used to populate a sparse matrix quickly.

Of course, scipy.sparse can't handle n dimensions. I have found two packages that provide n-dimensional sparse arrays in Python: sparray and ndsparse. However, it seems neither has the vectorized getting and setting feature.

So I need either:

  • a Python package for n-dimensional sparse arrays with vectorized get and set, or
  • a C library for sparse arrays which I can easily access from Cython, or
  • some 'roll your own' option, which I guess would require a C equivalent to a Python dict
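
To make the 'roll your own' option concrete, here is a minimal pure-Python sketch of a dictionary-of-keys sparse array with batch get and set. The class name `DictSparseND` and its methods `set_many` and `get_many` are hypothetical names chosen for illustration, not part of sparray, ndsparse, or scipy. It only shows the interface being asked for; the Python-level loops are exactly what a typed C or C++ container would replace.

```python
class DictSparseND:
    """Dictionary-of-keys sparse array: only explicitly set entries are stored."""

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        self.fill = fill          # value reported for cells that were never set
        self._data = {}           # coordinate tuple -> value

    def set_many(self, coords, values):
        """Assign values[i] to the cell at coords[i] for every i."""
        for coord, value in zip(coords, values):
            self._data[tuple(coord)] = value

    def get_many(self, coords):
        """Return the stored value (or the fill value) for each coordinate."""
        return [self._data.get(tuple(coord), self.fill) for coord in coords]


arr = DictSparseND(shape=(100, 100, 100))
arr.set_many([(1, 2, 3), (4, 5, 6)], [1.5, 2.5])
values = arr.get_many([(1, 2, 3), (0, 0, 0), (4, 5, 6)])
```

Here `values` holds the two stored entries plus the fill value for the cell that was never set.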

For my purposes, I think mapping the n-dimensional coordinates back to one or two dimensions could work. What would be better, though, is a dict equivalent that I can access quickly inside a Cython loop. I assume this rules out the Python dict.
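
The coordinate-mapping idea can be sketched in plain Python as follows. The helper names `flatten_index` and `unflatten_index` and the 4-d shape are assumptions made for illustration; the sketch uses row-major (C-order) flattening, which is the same convention NumPy's `ravel_multi_index` and `unravel_index` use by default.

```python
def flatten_index(coords, shape):
    """Map an n-dimensional coordinate to a single row-major (C-order) index."""
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same length")
    flat = 0
    for c, dim in zip(coords, shape):
        if not 0 <= c < dim:
            raise IndexError("coordinate out of bounds")
        flat = flat * dim + c
    return flat


def unflatten_index(flat, shape):
    """Inverse of flatten_index: recover the n-dimensional coordinate."""
    coords = []
    for dim in reversed(shape):
        flat, c = divmod(flat, dim)
        coords.append(c)
    return tuple(reversed(coords))


shape = (1000, 1000, 1000, 1000)  # hypothetical 4-d array, too large to store densely
sparse = {}                        # flat index -> value; only set entries are stored

sparse[flatten_index((3, 0, 999, 42), shape)] = 1.5
```

With integer keys like these, the container no longer needs to hash tuples, so any map from an integer to a float is enough to hold the array.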

Could someone give me an example of how to use the C++ map object from within Cython?

Neal Hughes asked Nov 21 '13 06:11




1 Answer

If you decide to go with the C dict option, you can use the C++ STL's std::map. It's unlikely that you'll find faster or more robust native code that implements a dictionary/map.

cppmap.pyx:

    # distutils: language = c++

    # Expose the std::map<int, float> specialization under the Cython name "mymap".
    cdef extern from "<map>" namespace "std":
        cdef cppclass mymap "std::map<int, float>":
            mymap()
            float& operator[] (const int& k)

    cdef mymap m = mymap()
    cdef int i
    cdef float value

    # Populate the map inside a typed loop.
    for i in range(100):
        value = 3.0 * i**2
        m[i] = value

    print(m[10])

setup.py:

    # distutils was removed in Python 3.12; on newer versions,
    # import setup from setuptools instead.
    from distutils.core import setup
    from Cython.Build import cythonize

    setup(name="cppmapapp",
          ext_modules=cythonize('*.pyx'))

Command line:

    $ python setup.py build
    $ cd build/lib.macosx-10.5-x86_64-2.7
    $ python -c 'import cppmap'
    300.0
Robert T. McGibbon answered Sep 28 '22 09:09