SciPy, NumPy and scikit-learn: create a sparse matrix

I'm currently trying to classify text. My dataset is too big and, as suggested here, I need to use a sparse matrix. My question is: what is the right way to add an element to a sparse matrix? Say, for example, I have a matrix X which is my input.

import numpy as np

X = np.random.randint(2, size=(6, 100))

Now this matrix X looks like an ndarray of an ndarray (or something like that).

If I do

from scipy.sparse import csr_matrix

X2 = csr_matrix(X)

I have the sparse matrix, but how can I add another element to it? For example, given this dense element: [1,0,0,0,1,1,1,0,...,0,1,0], how do I turn it into a sparse vector and add it to the sparse input matrix?

(By the way, I'm very new to Python, SciPy, NumPy, scikit-learn ... everything.)

asked Dec 06 '12 11:12

Olivier_s_j

1 Answer

Scikit-learn has great documentation, with great tutorials that you really should read before trying to invent this yourself. This one is the first one to read: it explains how to classify text, step by step. And this one is a detailed example of text classification using a sparse representation.

Pay extra attention to the parts where they talk about sparse representations, in this section. In general, if you want to use an SVM with a linear kernel and you have a large amount of data, LinearSVC (which is based on Liblinear) is a better choice than SVC with a linear kernel.
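As a rough illustration of that advice, here is a minimal sketch of text classification on a sparse representation. The toy documents and labels are invented for this example, and TfidfVectorizer is just one of the vectorizers the tutorials cover; it is not necessarily the setup the asker needs.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus and labels, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stocks fell sharply on monday",
    "the market rallied after earnings",
]
labels = ["pets", "pets", "finance", "finance"]

# fit_transform returns a SciPy sparse matrix with one row per document,
# so the dense document-term matrix is never built.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# LinearSVC accepts the sparse matrix directly.
clf = LinearSVC()
clf.fit(X, labels)

# New text must go through the same fitted vectorizer.
X_new = vectorizer.transform(["my cat chased the dog"])
pred = clf.predict(X_new)
print(X.shape, issparse(X), pred)
```

The point of the sketch is that the vectorizer produces the sparse matrix for you, so in many text pipelines there is no need to append rows by hand.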

Regarding your question: I'm sure there are many ways to concatenate two sparse matrices (by the way, "concatenate sparse matrices" is the term to search for if you want other ways of doing it). Here is one, though you'll have to convert from csr_matrix to coo_matrix, which is another type of sparse matrix: Is there an efficient way of concatenating scipy.sparse matrices?
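For the specific case in the question (appending one dense row to an existing CSR matrix), one option is scipy.sparse.vstack. This is a sketch of one possible approach rather than the method from the linked answer; the variable names new_row and X3 are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

rng = np.random.default_rng(0)

# Same shape as the matrix in the question: 6 rows, 100 columns of 0/1.
X = rng.integers(0, 2, size=(6, 100))
X2 = csr_matrix(X)

# A new dense row with the same number of columns (hypothetical data).
new_row = rng.integers(0, 2, size=100)

# Reshape to 1 x 100 so it is an explicit single-row matrix.
new_row_sparse = csr_matrix(new_row.reshape(1, -1))

# Stack the row below the existing matrix and keep the CSR format.
X3 = vstack([X2, new_row_sparse], format="csr")
print(X3.shape)
```

Note that every call to vstack builds a new matrix, so appending rows one at a time in a loop gets expensive. Collecting the rows in a list and stacking once at the end is usually cheaper.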

EDIT: When concatenating two matrices (or a matrix and an array, which is a 1-dimensional matrix), the general idea is to concatenate X1.data and X2.data and manipulate their indices and indptr arrays (or row and col in the case of coo_matrix) so that they point to the correct places. Some sparse representations are better for specific operations and more complex for others, so you should read about csr_matrix and see if it is the best representation for you. But I really urge you to start from the tutorials I posted above.
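The idea described in the edit can be sketched for two CSR matrices stacked row-wise. The helper name csr_vstack_manual is hypothetical, and the sketch assumes both inputs are CSR matrices with the same number of columns. In practice scipy.sparse.vstack does this for you.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_vstack_manual(a, b):
    """Stack CSR matrix b below CSR matrix a by joining their raw arrays."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices must have the same number of columns")

    # Stored values and their column indices are simply appended.
    data = np.concatenate([a.data, b.data])
    indices = np.concatenate([a.indices, b.indices])

    # b's row pointers must be shifted by the number of values stored in a.
    # b.indptr[0] is always 0, so it is dropped to avoid a duplicate entry.
    offset = a.indptr[-1]
    indptr = np.concatenate([a.indptr, b.indptr[1:] + offset])

    shape = (a.shape[0] + b.shape[0], a.shape[1])
    return csr_matrix((data, indices, indptr), shape=shape)


A = csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
B = csr_matrix(np.array([[0, 4, 0]]))
C = csr_vstack_manual(A, B)
print(C.toarray())
```

Row-wise stacking is cheap in CSR because the column indices stay valid and only indptr needs an offset. Stacking column-wise in CSR would require interleaving the rows, which is one reason the choice of representation matters.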

answered Nov 07 '22 12:11

zenpoy

Tags: python, matrix, numpy, scipy, scikit-learn
