I'm working on an IPython project with pandas and NumPy. I'm still learning, so this question is probably pretty basic. Let's say I have two columns of data:
---------------
| col1 | col2 |
---------------
| a | b |
| c | d |
| b | e |
---------------
I want to transform this data into the form:
---------------------
| a | b | c | d | e |
---------------------
| 1 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 |
---------------------
Then I want to take a three-column version
---------------------
| col1 | col2 | val |
---------------------
| a | b | .5 |
| c | d | .3 |
| b | e | .2 |
---------------------
and turn it into
---------------------------
| a | b | c | d | e | val |
---------------------------
| 1 | 1 | 0 | 0 | 0 | .5 |
| 0 | 0 | 1 | 1 | 0 | .3 |
| 0 | 1 | 0 | 0 | 1 | .2 |
---------------------------
I'm very new to pandas and NumPy. How would I do this? What functions would I use?
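I think you're looking for the pandas.get_dummies() function and the pandas.DataFrame.combineAdd method. Note that combineAdd was deprecated in pandas 0.17 and later removed; DataFrame.add(other, fill_value=0) is the documented replacement.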
In [7]: df = pd.DataFrame({'col1': list('acb'),
   ...:                    'col2': list('bde'),
   ...:                    'val': [.5, .3, .2]})
In [8]: df1 = pd.get_dummies(df.col1)
In [9]: df2 = pd.get_dummies(df.col2)
This produces the following two dataframes:
In [16]: df1
Out[16]:
   a  b  c
0  1  0  0
1  0  0  1
2  0  1  0
[3 rows x 3 columns]
In [17]: df2
Out[17]:
   b  d  e
0  1  0  0
1  0  1  0
2  0  0  1
[3 rows x 3 columns]
Which can be combined as follows:
In [10]: dummies = df1.combineAdd(df2)
In [18]: dummies
Out[18]:
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  1  0
2  0  1  0  0  1
[3 rows x 5 columns]
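On a recent pandas version, where combineAdd is no longer available, the same combination step can be sketched with DataFrame.add. This is a minimal sketch, not the original answer's code: the dtype=int argument and the final astype(int) cast are assumptions added here so the result stays as 0/1 integers, because newer pandas versions return booleans from get_dummies by default.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('acb'),
                   'col2': list('bde'),
                   'val': [.5, .3, .2]})

# dtype=int asks for 0/1 integers instead of the boolean default
# used by newer pandas versions.
df1 = pd.get_dummies(df['col1'], dtype=int)
df2 = pd.get_dummies(df['col2'], dtype=int)

# add() aligns both frames on the union of their columns; fill_value=0
# treats a column that is missing from one frame as all zeros.
# Alignment may produce floats, so cast back to int afterwards.
dummies = df1.add(df2, fill_value=0).astype(int)
print(dummies)
```

If the same label can appear in both col1 and col2 of one row, the sum would give 2 for that cell; appending .clip(upper=1) is one way to keep a pure indicator matrix.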
The last step is to copy the val column into the new dataframe.
In [19]: dummies['val'] = df.val
In [20]: dummies
Out[20]:
   a  b  c  d  e  val
0  1  1  0  0  0  0.5
1  0  0  1  1  0  0.3
2  0  1  0  0  1  0.2
[3 rows x 6 columns]
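As an alternative to building one dummy frame per column, the whole transformation can be sketched in a single pass. This is a different technique from the one in the answer above: it reshapes the label columns to long form with melt, encodes them once, and collapses back to one row per original row. The variable names long_labels, indicators and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('acb'),
                   'col2': list('bde'),
                   'val': [.5, .3, .2]})

# Reshape the two label columns into one long column, keeping the
# original row index so each label remembers which row it came from.
long_labels = df[['col1', 'col2']].melt(ignore_index=False)['value']

# One indicator row per label, then collapse to one row per original row.
# max() keeps cells at 0/1 even if a row repeats a label in both columns.
indicators = pd.get_dummies(long_labels, dtype=int).groupby(level=0).max()

# Attach the val column by index.
result = indicators.join(df['val'])
print(result)
```

This version scales to any number of label columns by changing the column list passed to melt, which may be convenient if the real data has more than two.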