SciPy has many different types of sparse matrices available. What are the most important differences between these types, and how does their intended usage differ?
I'm developing code in Python based on a sample code[1] in Matlab. One section of the code uses sparse matrices, which seem to have a single (annoying) type in Matlab, and I'm trying to figure out which type I should use[2] in Python.
[1] This is for a class. Most people are doing the project in Matlab, but I like to create unnecessary work and confusion, apparently.
[2] This is an academic question: I have the code working properly with the CSR format, but I'm interested in knowing what the optimal usages are.
Using sparse matrices to store data that contains a large number of zero-valued elements can both save a significant amount of memory and speed up the processing of that data. In MATLAB, sparse is an attribute that you can assign to any two-dimensional MATLAB® matrix composed of double or logical elements.
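When porting, the closest SciPy counterpart to MATLAB's triplet constructor S = sparse(i, j, v, m, n) is a COO matrix built from the same triplets. The sketch below assumes the MATLAB indices are 1-based and shifts them to SciPy's 0-based indexing. Like MATLAB's sparse(), SciPy sums duplicate (row, column) entries, here when converting to CSR.

```python
import numpy as np
from scipy.sparse import coo_matrix

# MATLAB (1-based):  S = sparse([1 2 2], [1 3 3], [4 5 6], 3, 3);
# SciPy is 0-based, so subtract 1 from every MATLAB index.
rows = np.array([1, 2, 2]) - 1
cols = np.array([1, 3, 3]) - 1
vals = np.array([4.0, 5.0, 6.0])

# Build from (value, (row, col)) triplets, then convert to CSR.
# The two entries at (row 2, col 3) in MATLAB terms are summed during conversion.
S = coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()

dense = S.toarray()
print(dense)
```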
Python's SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse one. When printed, a sparse matrix lists the (row, column) coordinates of each non-zero entry together with its value.
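A minimal sketch of converting a small dense NumPy array to a sparse matrix and back. The exact print layout differs between SciPy versions, so the coordinates are also pulled out explicitly with nonzero().

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 0, 0]])

S = csr_matrix(dense)   # dense -> sparse (only non-zeros are stored)
print(S)                # one line per stored entry: (row, col) and value

rows, cols = S.nonzero()          # coordinates of the stored entries
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
back = S.toarray()                # sparse -> dense again
```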
Many matrices contain mostly zero elements; matrices with a high proportion of zero entries are known as sparse matrices, so called because of their low density of non-zero elements. Storing a sparse matrix as an ordinary two-dimensional array wastes a lot of space on explicitly stored zeros.
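To make the space argument concrete, the sketch below compares the bytes used by a dense 2000 x 2000 identity matrix with the three arrays a CSR matrix keeps (data, indices, indptr). The identity matrix is just a convenient example with about 0.05% non-zeros; exact byte counts depend on the index dtype SciPy picks.

```python
import numpy as np
from scipy.sparse import eye

n = 2000
S = eye(n, format="csr")   # sparse identity: n stored values
D = S.toarray()            # dense copy: n * n float64 values

dense_bytes = D.nbytes
# CSR stores the values, one column index per value, and n + 1 row pointers.
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes
print(dense_bytes, sparse_bytes)
```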
The concept of sparsity is useful in combinatorics and application areas such as network theory and numerical analysis, which typically have a low density of significant data or connections. Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations.
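As an illustration of the PDE case, a finite-difference discretization of -u''(x) = 1 on (0, 1) with u(0) = u(1) = 0 produces a tridiagonal matrix: every row has at most three non-zeros regardless of grid size. This is a minimal sketch; the grid size and right-hand side are arbitrary choices. Because the exact solution x(1 - x)/2 is quadratic, the second-order scheme reproduces it up to rounding error.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 99                  # number of interior grid points
h = 1.0 / (n + 1)       # grid spacing

# Tridiagonal second-difference matrix, stored in CSC format for spsolve.
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2
f = np.ones(n)

u = spsolve(A, f)       # sparse direct solve

x = np.linspace(h, 1 - h, n)
exact = x * (1 - x) / 2
err = np.max(np.abs(u - exact))
print(err)
```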
Sorry if I'm not answering this completely enough, but hopefully I can provide some insight.
CSC (Compressed Sparse Column) and CSR (Compressed Sparse Row) are more compact and efficient, but difficult to construct "from scratch". COO (Coordinate) and DOK (Dictionary of Keys) are easier to construct, and can then be converted to CSC or CSR via matrix.tocsc() or matrix.tocsr().
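The construct-then-convert workflow might look like the sketch below. COO suits the case where all (row, column, value) triplets are known up front. DOK suits filling in entries one at a time, like a dictionary. Both are then converted to a compressed format for arithmetic.

```python
import numpy as np
from scipy.sparse import coo_matrix, dok_matrix

# COO: build from parallel arrays of row indices, column indices and values.
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
vals = np.array([1.0, 2.0, 3.0])
A_coo = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# DOK: start empty and assign elements individually.
A_dok = dok_matrix((3, 3))
A_dok[0, 0] = 1.0
A_dok[1, 2] = 2.0
A_dok[2, 1] = 3.0

# Convert to the compressed formats once construction is finished.
A_csr = A_coo.tocsr()
A_csc = A_dok.tocsc()
same = np.array_equal(A_csr.toarray(), A_csc.toarray())
print(same)
```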
CSC is generally more efficient for accessing column vectors and for column operations, since it stores the matrix column by column: for each column, the row indices and values of its non-zero entries.
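A small sketch of the CSC layout. The entries of column j sit in the slice indptr[j]:indptr[j+1] of the indices (row numbers) and data (values) arrays, so extracting a column means reading one contiguous slice.

```python
import numpy as np
from scipy.sparse import csc_matrix

A = csc_matrix(np.array([[1, 0, 4],
                         [0, 3, 0],
                         [2, 0, 5]]))

print(A.indptr)   # column boundaries: [0 2 3 5]
print(A.indices)  # row index of each stored value: [0 2 1 0 2]
print(A.data)     # stored values, column by column: [1 2 3 4 5]

# Read column 2 directly from its contiguous slice.
j = 2
start, end = A.indptr[j], A.indptr[j + 1]
col_rows = A.indices[start:end]
col_vals = A.data[start:end]

# The same column via ordinary slicing, as a dense 1-D array.
col = A[:, j].toarray().ravel()
```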
CSR matrices are the opposite: they store the matrix row by row (for each row, the column indices and values of its non-zero entries), and are more efficient for accessing row vectors and for row operations.
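Row-wise storage is why CSR suits matrix-vector products: each output entry is a dot product over one contiguous row slice. The helper row_dot below is hypothetical and written only to expose the layout. In practice you would just write A @ x, which the sketch uses as a cross-check.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 4],
                         [0, 3, 0],
                         [2, 0, 5]]))

def row_dot(A, i, x):
    """Hypothetical helper: dot product of row i of CSR matrix A with dense x."""
    start, end = A.indptr[i], A.indptr[i + 1]     # slice holding row i
    return np.dot(A.data[start:end], x[A.indices[start:end]])

x = np.array([1.0, 2.0, 3.0])
manual = np.array([row_dot(A, i, x) for i in range(A.shape[0])])
builtin = A @ x   # SciPy's own sparse matrix-vector product
print(manual, builtin)
```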