I have a big CSV file that lists connections between nodes in a graph. Example:
0001,95784
0001,98743
0002,00082
0002,00091
This means that node 0001 is connected to nodes 95784 and 98743, and so on. I need to read this into a sparse matrix in NumPy. How can I do this? I am new to Python, so tutorials on this would also help.
scipy.sparse.save_npz saves a sparse matrix to a file using the .npz format. Its file argument is either the file name (string) or an open file (file-like object) where the data will be saved.
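As a rough illustration of the save step, here is a minimal sketch of a round trip through scipy.sparse.save_npz and scipy.sparse.load_npz. The matrix contents and the file name "graph.npz" are made up for the example, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# A small 3 x 4 matrix with three non-zero entries (illustrative data only).
m = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                [0, 0, 2, 0],
                                [3, 0, 0, 0]]))

# "graph.npz" is a hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "graph.npz")
sparse.save_npz(path, m)

# load_npz reads the matrix back in the format it was saved in.
loaded = sparse.load_npz(path)
same = np.array_equal(loaded.toarray(), m.toarray())
```

After the round trip, loaded should have the same shape, format, and values as m.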
In MATLAB, S = sparse(A) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting it to sparse storage saves memory. S = sparse(m,n) generates an m-by-n all-zero sparse matrix.
The function csr_matrix() is used to create a sparse matrix of compressed sparse row format whereas csc_matrix() is used to create a sparse matrix of compressed sparse column format.
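To make the difference concrete, the sketch below builds the same small matrix in both formats from (row, column) index pairs. The index arrays and the 3 x 3 shape are invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

# Three non-zero entries given as (row, column) index pairs (made-up values).
row = np.array([0, 1, 2])
col = np.array([2, 0, 1])
val = np.array([1, 1, 1])

a_csr = csr_matrix((val, (row, col)), shape=(3, 3))
a_csc = csc_matrix((val, (row, col)), shape=(3, 3))

# Both hold the same matrix; only the internal layout differs.
same = np.array_equal(a_csr.toarray(), a_csc.toarray())

# CSR is laid out for fast row slicing, CSC for fast column slicing.
first_row = a_csr[0].toarray()
first_col = a_csc[:, 0].toarray()
```

Which format to pick usually depends on whether later code works mostly with rows or mostly with columns.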
Example using lil_matrix (list of list matrix) of scipy.
Row-based linked list matrix.
This contains a list (self.rows) of rows, each of which is a sorted list of column indices of non-zero elements. It also contains a list (self.data) of lists of these elements.
$ cat 1938894-simplified.csv
0,32
1,21
1,23
1,32
2,23
2,53
2,82
3,82
4,46
5,75
7,86
8,28
Code:
#!/usr/bin/env python
import csv
from scipy import sparse

rows, columns = 10, 100
matrix = sparse.lil_matrix((rows, columns))

with open('1938894-simplified.csv', newline='') as f:
    for line in csv.reader(f):
        row, column = map(int, line)
        # Mark the connection; lil_matrix keeps matrix.rows and matrix.data in sync.
        matrix[row, column] = 1

# matrix.rows holds the column indices of the non-zero entries in each row.
print(matrix.rows.tolist())
Output:
[[32], [21, 23, 32], [23, 53, 82], [82], [46], [75], [], [86], [28], []]
If you want an adjacency matrix, you can do something like:
import csv
from scipy.sparse import dok_matrix

# 10000 x 10000 is a placeholder shape; it must be larger than the largest node id.
S = dok_matrix((10000, 10000), dtype=bool)
with open("your_file_name", newline="") as f:
    for line in csv.reader(f):
        S[int(line[0]), int(line[1])] = True
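For a big file, setting entries one at a time can be slow. Another option is to collect all the index pairs first and build the matrix in one step with coo_matrix. This is only a sketch: it reads the sample rows from the question out of an in-memory string instead of a real file, and it assumes node ids are non-negative integers so that the largest id plus one can serve as the matrix dimension.

```python
import csv
import io

import numpy as np
from scipy.sparse import coo_matrix

# Stand-in for the real file; the sample rows come from the question.
sample = io.StringIO("0001,95784\n0001,98743\n0002,00082\n0002,00091\n")

# int() drops the leading zeros, so "0001" becomes node 1.
edges = np.array([[int(a), int(b)] for a, b in csv.reader(sample)])
src, dst = edges[:, 0], edges[:, 1]

# Assumption: ids are non-negative, so the largest id + 1 is a big enough
# dimension for a square adjacency matrix.
n = int(edges.max()) + 1

# Build in COO form from the index arrays, then convert to CSR for lookups.
adj = coo_matrix((np.ones(len(src), dtype=bool), (src, dst)), shape=(n, n)).tocsr()
```

With a real file, replace the StringIO object with open("your_file_name", newline=""). Note that this matrix is directed as written; for an undirected graph you would also add the (dst, src) pairs.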