pandas pivot changes dtype




After creating a new DataFrame with pandas pivot_table, the dtype changes from int32 to float.

Original dataframe

df = pd.DataFrame.from_dict(my_dict, orient='columns', dtype='i4')


         clock   eventid         ns  objectid  value
0   1505960158  62704261  327504323     32219      1
1   1505962773  62711138   22192905     32219      0
2   1505400465  61216428  123915259     32233      1
3   1504642494  59208977  369082011     32254      1
4   1504643325  59210478  576875730     32254      0
5   1504642494  59208978  369082011     32260      1
6   1504643325  59210479  576875730     32260      0
7   1504224224  58101461  445846619     13479      0
8   1504258784  58187457  204908064     13479      1
9   1504310624  58318750  443786274     13479      0
10  1504517992  58886060  746243067     13479      1



clock       int32
eventid     int32
ns          int32
objectid    int32
value       int32
dtype: object

When I use pivot_table

p = df.reset_index().pivot_table(index="objectid", columns="value", values="clock", fill_value=0).iloc[:, ::-1]


value              1             0
13479     1505534184  1.505467e+09
13485     1505676014  1.505677e+09
32219     1505960158  1.505963e+09
32233     1505400465  0.000000e+00
32254     1504642494  1.504643e+09
32260     1504642494  1.504643e+09


1      int64
0    float64
dtype: object

Why does the 0 column become float? How can I avoid this?

1 Answer

Your sample data may not show it, but the result of your pivot operation can contain NaNs. NaN is a float value and NumPy integer dtypes cannot represent it, so pandas automatically upcasts any column that contains a NaN to float. The NaNs are then replaced by zeros (fill_value=0), which is why you don't see them in the output.

For example, there is no row with objectid = 32233 and value = 0, so the corresponding entry in your pivot result is NaN, which then gets filled with 0.
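
To see the upcast in isolation, here is a minimal sketch. The Series and its labels are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Illustrative int32 Series; labels 0 and 1 exist, label 2 does not.
s = pd.Series([10, 20], index=[0, 1], dtype="int32")

# Reindexing introduces a missing entry, which pandas stores as NaN.
# Integer NumPy dtypes cannot hold NaN, so the result is upcast to float64.
widened = s.reindex([0, 1, 2])

print(s.dtype)        # int32
print(widened.dtype)  # float64
```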

Now that it's clear why the columns are upcast, you can reset the datatype using astype:

p = p.astype(int)
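
The whole round trip can be sketched as follows. This assumes a three-row stand-in for the question's DataFrame, and it casts to int32 rather than int on the assumption that you want the original dtype back.

```python
import pandas as pd

# Illustrative stand-in for the question's DataFrame (three rows only).
df = pd.DataFrame(
    {
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
        "clock": [1505960158, 1505962773, 1505400465],
    },
    dtype="int32",
)

# No fill_value here, so the missing (32233, 0) cell stays visible as NaN.
raw = df.pivot_table(index="objectid", columns="value", values="clock")
print(raw.isna().any().any())  # True

# Fill the gap, then cast every column back to the original int32 dtype.
p = raw.fillna(0).astype("int32").iloc[:, ::-1]
print(p.dtypes)
```

Because pivot_table aggregates with the mean by default, a cell built from several rows can hold a fractional value, and the integer cast would truncate it. If that matters, passing an aggfunc such as "max" or "first" keeps the values whole before the cast.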
