I have a huge pandas dataframe, shaped like this example:
new_id  hour  names   values
0       0     mark    5
0       0     matt    4
0       0     alex    3
1       0     roger   2
1       0     arthur  7
1       1     alf     8
2       1     ale     6
3       1     peter   5
3       2     tom     2
4       2     andrew  7
I need to reshape it, so I use pivot_table():
dummy = dummy.pivot_table(index=['hour', 'new_id'], columns='names', values='values').fillna(0)
so it becomes
names        ale  alex  alf  andrew  arthur  mark  matt  peter  roger  tom
hour new_id
0    0       0.0   3.0  0.0     0.0     0.0   5.0   4.0    0.0    0.0  0.0
     1       0.0   0.0  0.0     0.0     7.0   0.0   0.0    0.0    2.0  0.0
1    1       0.0   0.0  8.0     0.0     0.0   0.0   0.0    0.0    0.0  0.0
     2       6.0   0.0  0.0     0.0     0.0   0.0   0.0    0.0    0.0  0.0
     3       0.0   0.0  0.0     0.0     0.0   0.0   0.0    5.0    0.0  0.0
2    3       0.0   0.0  0.0     0.0     0.0   0.0   0.0    0.0    0.0  2.0
     4       0.0   0.0  0.0     7.0     0.0   0.0   0.0    0.0    0.0  0.0
....
However, this small example does not reproduce my problem. In the real dataset, the pivoted table contains float values that shouldn't be there: the cells are aggregations (sums) of the values in the initial dataset, which are all integers. Not only are they floats, they are also quite far from the exact results.
Why do I get these float values? Is there a better way to get what I want? I would rather not write my own function to sum the values before pivoting the dataframe, since this should be exactly what pivot_table() does.
The problem is the NaNs, which convert all values to floats. If the input data are integers, a possible solution is to add the parameter fill_value=0:
dummy = dummy.pivot_table(index=['hour', 'new_id'], columns='names', values='values', fill_value=0)
print(dummy)
names        ale  alex  alf  andrew  arthur  mark  matt  peter  roger  tom
hour new_id
0    0         0     3    0       0       0     5     4      0      0    0
     1         0     0    0       0       7     0     0      0      2    0
1    1         0     0    8       0       0     0     0      0      0    0
     2         6     0    0       0       0     0     0      0      0    0
     3         0     0    0       0       0     0     0      5      0    0
2    3         0     0    0       0       0     0     0      0      0    2
     4         0     0    0       7       0     0     0      0      0    0
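To see why the NaNs matter, here is a minimal sketch on a tiny made-up frame (the names small, wide, filled and as_int are only for illustration, not from the question). It uses plain pivot() so no aggregation is involved: the empty cells alone are enough to turn integer columns into floats, and calling fillna(0) afterwards does not turn them back.

```python
import pandas as pd

# Hypothetical three-row frame; all values are integers.
small = pd.DataFrame({
    "hour": [0, 0, 1],
    "names": ["mark", "matt", "alf"],
    "values": [5, 4, 8],
})

# Every (hour, names) pair without a matching row becomes NaN,
# and a column holding NaN cannot keep a plain integer dtype.
wide = small.pivot(index="hour", columns="names", values="values")
print(wide.dtypes)    # float64 for every column

# Filling the gaps afterwards replaces NaN with 0.0 but keeps float64.
filled = wide.fillna(0)
print(filled.dtypes)  # still float64

# An explicit cast is needed to get integers back at this point.
as_int = filled.astype("int64")
print(as_int)
```

This is why filling the gaps during the pivot, via fill_value, tends to be more convenient than calling fillna() on the result.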
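The default aggregation function in pivot_table is mean, so at least one float value is to be expected in the output, and that converts all values to floats. It also explains why the results are far from the sums you expected: duplicate entries are averaged, not added.
If you change the aggregation function to sum, everything works as expected: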
dummy = dummy.pivot_table(index=['hour', 'new_id'],
                          columns='names',
                          values='values',
                          fill_value=0,
                          aggfunc='sum')
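For a self-contained comparison, the sketch below rebuilds the sample data from the question and appends one hypothetical duplicate row (a second 'mark' entry for hour 0, new_id 0), since the original sample has no duplicates and therefore cannot show the difference. The variable names df, by_mean and by_sum are my own.

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical duplicate
# (new_id=0, hour=0, names='mark', values=2) so that something
# actually gets aggregated.
df = pd.DataFrame({
    "new_id": [0, 0, 0, 1, 1, 1, 2, 3, 3, 4, 0],
    "hour":   [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0],
    "names":  ["mark", "matt", "alex", "roger", "arthur", "alf",
               "ale", "peter", "tom", "andrew", "mark"],
    "values": [5, 4, 3, 2, 7, 8, 6, 5, 2, 7, 2],
})

# Default aggfunc (mean): the two 'mark' rows are averaged.
by_mean = df.pivot_table(index=["hour", "new_id"], columns="names",
                         values="values", fill_value=0)

# aggfunc='sum': the two 'mark' rows are added.
by_sum = df.pivot_table(index=["hour", "new_id"], columns="names",
                        values="values", fill_value=0, aggfunc="sum")

print(by_mean.loc[(0, 0), "mark"])  # 3.5
print(by_sum.loc[(0, 0), "mark"])   # 7
print(by_sum.dtypes)                # integer columns
```

With the default mean the duplicated cell becomes 3.5, which is neither an integer nor the expected total, while aggfunc='sum' together with fill_value=0 gives 7 and keeps the columns as integers.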