After creating a new DataFrame with pandas pivot_table, the dtype changes from int32 to float
Original dataframe
df = pd.DataFrame.from_dict(my_dict, orient='columns', dtype='i4')
print(df.head(11))
output:
         clock   eventid         ns  objectid  value
0   1505960158  62704261  327504323     32219      1
1   1505962773  62711138   22192905     32219      0
2   1505400465  61216428  123915259     32233      1
3   1504642494  59208977  369082011     32254      1
4   1504643325  59210478  576875730     32254      0
5   1504642494  59208978  369082011     32260      1
6   1504643325  59210479  576875730     32260      0
7   1504224224  58101461  445846619     13479      0
8   1504258784  58187457  204908064     13479      1
9   1504310624  58318750  443786274     13479      0
10  1504517992  58886060  746243067     13479      1
print(df.dtypes)
output:
clock       int32
eventid     int32
ns          int32
objectid    int32
value       int32
dtype: object
When I use pivot_table:
p = df.reset_index().pivot_table(index="objectid", columns="value", values="clock", fill_value=0).iloc[:, ::-1]
print(p)
output:
value              1             0
objectid
13479     1505534184  1.505467e+09
13485     1505676014  1.505677e+09
32219     1505960158  1.505963e+09
32233     1505400465  0.000000e+00
32254     1504642494  1.504643e+09
32260     1504642494  1.504643e+09
print(p.dtypes)
output:
value
1      int64
0    float64
dtype: object
Why does the 0 column become float? How can I avoid this?
Your sample data may not show it, but the result of your pivot operation can contain NaNs. NaN is a float value, so pandas automatically upcasts the whole column to float. The NaNs are then filled with zeros (fill_value=0), so you never get to see them.
For example, there is no row with objectid = 32233 and value = 0, so the corresponding entry in your pivot result starts out as NaN and is then filled with 0.
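To see the effect in isolation, here is a minimal sketch. It uses three rows taken from the sample above (the name small is just for illustration) and pivots without fill_value, so the NaN stays visible. The dtype of columns that contain no NaN may differ between pandas versions.

```python
import pandas as pd

# Three rows from the question's sample; objectid 32233 has no value == 0 row.
small = pd.DataFrame(
    {
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
        "clock": [1505960158, 1505962773, 1505400465],
    },
    dtype="int32",
)

# No fill_value here, so the missing (32233, 0) combination shows up as NaN.
raw = small.pivot_table(index="objectid", columns="value", values="clock")

print(raw)
print(raw.dtypes)  # column 0 holds a NaN, so it has to be a float column
```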
Now that it's clear why the columns are upcast, you can reset the datatype using astype:
p = p.astype(int)
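A self-contained sketch of that fix, assuming the same three sample rows as in the question (the name small is hypothetical). It casts to "int64" explicitly rather than int, since the width of plain int can depend on the platform and NumPy version.

```python
import pandas as pd

# Three rows from the question's sample; objectid 32233 has no value == 0 row.
small = pd.DataFrame(
    {
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
        "clock": [1505960158, 1505962773, 1505400465],
    },
    dtype="int32",
)

# fill_value=0 replaces the NaN for the missing (32233, 0) combination.
p = small.pivot_table(
    index="objectid", columns="value", values="clock", fill_value=0
)

# With no NaN left, every column can be cast back to an integer dtype.
p = p.astype("int64")

print(p)
print(p.dtypes)
```

One caveat: pivot_table aggregates with the mean by default, so if several rows share the same objectid and value, the result can be fractional and astype will truncate it. If that matters, passing a different aggfunc (for example "first" or "max") may be a better fit.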