I have a huge pandas dataframe, shaped like this example:
new_id  hour  names   values
0       0     mark    5
0       0     matt    4
0       0     alex    3
1       0     roger   2
1       0     arthur  7
1       1     alf     8
2       1     ale     6
3       1     peter   5
3       2     tom     2
4       2     andrew  7
I need to reshape it, so I use pivot_table():
dummy = dummy.pivot_table(index=['hour', 'new_id'], columns='names', values='values').fillna(0)
so it becomes
names        ale  alex  alf  andrew  arthur  mark  matt  peter  roger  tom
hour new_id
0    0       0.0   3.0  0.0     0.0     0.0   5.0   4.0    0.0    0.0  0.0
     1       0.0   0.0  0.0     0.0     7.0   0.0   0.0    0.0    2.0  0.0
1    1       0.0   0.0  8.0     0.0     0.0   0.0   0.0    0.0    0.0  0.0
     2       6.0   0.0  0.0     0.0     0.0   0.0   0.0    0.0    0.0  0.0
     3       0.0   0.0  0.0     0.0     0.0   0.0   0.0    5.0    0.0  0.0
2    3       0.0   0.0  0.0     0.0     0.0   0.0   0.0    0.0    0.0  2.0
     4       0.0   0.0  0.0     7.0     0.0   0.0   0.0    0.0    0.0  0.0
....
By the way, this small example can't reproduce my problem: in the real dataset, when I pivot it, I get some float values which shouldn't be there, since they are aggregations (sums) of the values of the initial dataset, which are all integers. Not only are they floats, but they are also quite far from the exact results.
Why do I get these float values? Is there a better way to get what I want? I don't really want to write a function myself to properly sum all the values before pivoting the dataframe, since this should be exactly what pivot_table() does.
The problem is the NaNs, which convert all values to floats, so a possible solution is to add the parameter fill_value=0 if the input data are integers:
dummy = dummy.pivot_table(index=['hour', 'new_id'], columns='names', values='values', fill_value=0)
print(dummy)
names        ale  alex  alf  andrew  arthur  mark  matt  peter  roger  tom
hour new_id
0    0         0     3    0       0       0     5     4      0      0    0
     1         0     0    0       0       7     0     0      0      2    0
1    1         0     0    8       0       0     0     0      0      0    0
     2         6     0    0       0       0     0     0      0      0    0
     3         0     0    0       0       0     0     0      5      0    0
2    3         0     0    0       0       0     0     0      0      0    2
     4         0     0    0       7       0     0     0      0      0    0
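The default aggregate function in pivot_table is mean, so at least one float value is expected in the output, and it converts all values to floats. It is also why the results differ from the sums you expect: duplicate entries are averaged, not added.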
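To illustrate the averaging, here is a minimal sketch. The data is hypothetical (it is not from the question): the key (hour=0, new_id=0, names='mark') appears twice, so the default aggregation has something to average.

```python
import pandas as pd

# Hypothetical data: the (hour=0, new_id=0, names='mark') key occurs twice.
df = pd.DataFrame({
    "new_id": [0, 0, 0, 1],
    "hour":   [0, 0, 0, 0],
    "names":  ["mark", "mark", "matt", "roger"],
    "values": [5, 2, 4, 2],
})

# No aggfunc given, so pivot_table falls back to the mean:
# the two 'mark' rows are averaged instead of summed.
by_mean = df.pivot_table(index=["hour", "new_id"], columns="names",
                         values="values", fill_value=0)

mark_mean = by_mean.loc[(0, 0), "mark"]
print(mark_mean)  # 3.5, the mean of 5 and 2, not their sum 7
```

If the real dataset has repeated (hour, new_id, names) combinations, this would explain both the floats and the values that are far from the expected totals.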
So if you change the aggregate function to sum, everything works nicely:
dummy = dummy.pivot_table(index=['hour', 'new_id'],
                          columns='names',
                          values='values',
                          fill_value=0,
                          aggfunc='sum')
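As a quick sanity check, the summed pivot can be compared against an explicit groupby. This is only a sketch on hypothetical data with one duplicated key; the names by_sum, expected and the other variables are illustrative, not part of the original code.

```python
import pandas as pd

# Hypothetical data with a duplicated (hour, new_id, names) key for 'mark'.
df = pd.DataFrame({
    "new_id": [0, 0, 0, 1],
    "hour":   [0, 0, 0, 0],
    "names":  ["mark", "mark", "matt", "roger"],
    "values": [5, 2, 4, 2],
})

# Pivot with an explicit sum and integer fill value.
by_sum = df.pivot_table(index=["hour", "new_id"], columns="names",
                        values="values", fill_value=0, aggfunc="sum")

# The same totals computed step by step, as a cross-check.
expected = (df.groupby(["hour", "new_id", "names"])["values"]
              .sum()
              .unstack("names", fill_value=0))

mark_total = by_sum.loc[(0, 0), "mark"]            # duplicates are added: 5 + 2
all_integer = all(dtype.kind == "i" for dtype in by_sum.dtypes)
matches_groupby = (by_sum.values == expected.values).all()

print(by_sum)
print(by_sum.dtypes)
```

Running the same comparison on the real dataframe should show whether the pivoted totals agree with a plain groupby sum.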