Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas / Numpy: How to Turn Column Data Into Sparse Matrix

I'm working on an iPython project with Pandas and Numpy. I'm just learning too so this question is probably pretty basic. Lets say I have two columns of data

---------------
| col1 | col2 | 
---------------
| a    | b    |
| c    | d    |
| b    | e    |
---------------

I want to transform this data of the form.

---------------------
| a | b | c | d | e |
---------------------
| 1 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 |
---------------------

Then I want to take a three column version

---------------------
| col1 | col2 | val | 
---------------------
| a    | b    | .5  |
| c    | d    | .3  |
| b    | e    | .2  |
---------------------

and turn it into

---------------------------
| a | b | c | d | e | val |
---------------------------
| 1 | 1 | 0 | 0 | 0 | .5  |
| 0 | 0 | 1 | 1 | 0 | .3  |
| 0 | 1 | 0 | 0 | 1 | .2  |
---------------------------

I'm very new to Pandas and Numpy, how would I do this? What functions would I use?

like image 916
David Williams Avatar asked Jan 04 '14 22:01

David Williams


People also ask

How do you turn a matrix into a sparse?

S = sparse( A ) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting the matrix to sparse storage saves memory. S = sparse( m,n ) generates an m -by- n all zero sparse matrix.

How do you convert a DataFrame to a CSR matrix?

To convert a DataFrame to a CSR matrix, you first need to create indices for users and movies. Then, you can perform conversion with the sparse. csr_matrix function. It is a bit faster to convert via a coordinate (COO) matrix.

How do you convert dense to sparse matrix?

A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function.


1 Answers

I think you're looking for the pandas.get_dummies() function and pandas.DataFrame.combineAdd method.

In [7]: df = pd.DataFrame({'col1': list('acb'),
                           'col2': list('bde'),
                           'val': [.5, .3, .2]})

In [8]: df1 = pd.get_dummies(df.col1)

In [9]: df2 = pd.get_dummies(df.col2)

This produces the following two dataframes:

In [16]: df1
Out[16]: 
   a  b  c
0  1  0  0
1  0  0  1
2  0  1  0

[3 rows x 3 columns]

In [17]: df2
Out[17]: 
   b  d  e
0  1  0  0
1  0  1  0
2  0  0  1

[3 rows x 3 columns]

Which can be combined as follows:

In [10]: dummies = df1.combineAdd(df2)

In [18]: dummies
Out[18]: 
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  1  0
2  0  1  0  0  1

[3 rows x 5 columns]

The last step is to copy the val column into the new dataframe.

In [19]: dummies['val'] = df.val

In [20]: dummies
Out[20]: 
   a  b  c  d  e  val
0  1  1  0  0  0  0.5
1  0  0  1  1  0  0.3
2  0  1  0  0  1  0.2

[3 rows x 6 columns]
like image 96
Garrett Avatar answered Oct 02 '22 19:10

Garrett