I'm working on an iPython project with Pandas and Numpy. I'm just learning too so this question is probably pretty basic. Lets say I have two columns of data
---------------
| col1 | col2 | 
---------------
| a    | b    |
| c    | d    |
| b    | e    |
---------------
I want to transform this data of the form.
---------------------
| a | b | c | d | e |
---------------------
| 1 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 |
---------------------
Then I want to take a three column version
---------------------
| col1 | col2 | val | 
---------------------
| a    | b    | .5  |
| c    | d    | .3  |
| b    | e    | .2  |
---------------------
and turn it into
---------------------------
| a | b | c | d | e | val |
---------------------------
| 1 | 1 | 0 | 0 | 0 | .5  |
| 0 | 0 | 1 | 1 | 0 | .3  |
| 0 | 1 | 0 | 0 | 1 | .2  |
---------------------------
I'm very new to Pandas and Numpy, how would I do this? What functions would I use?
S = sparse( A ) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting the matrix to sparse storage saves memory. S = sparse( m,n ) generates an m -by- n all zero sparse matrix.
To convert a DataFrame to a CSR matrix, you first need to create indices for users and movies. Then, you can perform conversion with the sparse. csr_matrix function. It is a bit faster to convert via a coordinate (COO) matrix.
A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function.
I think you're looking for the pandas.get_dummies() function and pandas.DataFrame.combineAdd method.
In [7]: df = pd.DataFrame({'col1': list('acb'),
                           'col2': list('bde'),
                           'val': [.5, .3, .2]})
In [8]: df1 = pd.get_dummies(df.col1)
In [9]: df2 = pd.get_dummies(df.col2)
This produces the following two dataframes:
In [16]: df1
Out[16]: 
   a  b  c
0  1  0  0
1  0  0  1
2  0  1  0
[3 rows x 3 columns]
In [17]: df2
Out[17]: 
   b  d  e
0  1  0  0
1  0  1  0
2  0  0  1
[3 rows x 3 columns]
Which can be combined as follows:
In [10]: dummies = df1.combineAdd(df2)
In [18]: dummies
Out[18]: 
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  1  0
2  0  1  0  0  1
[3 rows x 5 columns]
The last step is to copy the val column into the new dataframe.
In [19]: dummies['val'] = df.val
In [20]: dummies
Out[20]: 
   a  b  c  d  e  val
0  1  1  0  0  0  0.5
1  0  0  1  1  0  0.3
2  0  1  0  0  1  0.2
[3 rows x 6 columns]
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With