I have a dataframe that looks like this:
a b c
0 1 10
1 2 10
2 2 20
3 3 30
4 1 40
4 3 10
The dataframe above as default (0,1,2,3,4...) indices. I would like to convert it into a dataframe that looks like this:
1 2 3
0 10 0 0
1 0 10 0
2 0 20 0
3 0 0 30
4 40 0 10
Where column 'a' in the first dataframe becomes the index in the second dataframe, the values of 'b' become the column names and the values of c are copied over, with 0 or NaN filling missing values. The original dataset is large and will result in a very sparse second dataframe. I then intend to add this dataframe to a much larger one, which is straightforward.
Can anyone advise the best way to achieve this please?
You can use the pivot
method for this.
See the docs: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-pivoting-dataframe-objects
An example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a':[0,1,2,3,4,4], 'b':[1,2,2,3,1,3], 'c':[10,10,20,3
0,40,10]})
In [3]: df
Out[3]:
a b c
0 0 1 10
1 1 2 10
2 2 2 20
3 3 3 30
4 4 1 40
5 4 3 10
In [4]: df.pivot(index='a', columns='b', values='c')
Out[4]:
b 1 2 3
a
0 10 NaN NaN
1 NaN 10 NaN
2 NaN 20 NaN
3 NaN NaN 30
4 40 NaN 10
If you want zeros instead of NaN's as in your example, you can use fillna
:
In [5]: df.pivot(index='a', columns='b', values='c').fillna(0)
Out[5]:
b 1 2 3
a
0 10 0 0
1 0 10 0
2 0 20 0
3 0 0 30
4 40 0 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With