
pandas pivot changes dtype

Tags: python, pandas

After I create a new DataFrame with pandas pivot_table, the dtype of one column changes from int32 to float64.

Original dataframe

df = pd.DataFrame.from_dict(my_dict, orient='columns', dtype='i4')
print(df.head(11))

output:

         clock   eventid         ns  objectid  value
0   1505960158  62704261  327504323     32219      1
1   1505962773  62711138   22192905     32219      0
2   1505400465  61216428  123915259     32233      1
3   1504642494  59208977  369082011     32254      1
4   1504643325  59210478  576875730     32254      0
5   1504642494  59208978  369082011     32260      1
6   1504643325  59210479  576875730     32260      0
7   1504224224  58101461  445846619     13479      0
8   1504258784  58187457  204908064     13479      1
9   1504310624  58318750  443786274     13479      0
10  1504517992  58886060  746243067     13479      1

print(df.dtypes)

output:

clock       int32
eventid     int32
ns          int32
objectid    int32
value       int32
dtype: object

When I use pivot_table:

p = df.reset_index().pivot_table(index="objectid", columns="value", values="clock", fill_value=0).iloc[:, ::-1]
print(p)

output:

value              1             0
objectid                          
13479     1505534184  1.505467e+09
13485     1505676014  1.505677e+09
32219     1505960158  1.505963e+09
32233     1505400465  0.000000e+00
32254     1504642494  1.504643e+09
32260     1504642494  1.504643e+09

print(p.dtypes)

output:

value
1      int64
0    float64
dtype: object

Why does the 0 column become float? How can I avoid this?

Joao Vitorino asked Oct 21 '17 02:10



1 Answer

Your sample data may not show it, but the result of your pivot operation can contain NaNs. NaN is a float value and an integer column cannot hold it, so pandas automatically upcasts the whole column to float. The NaNs are then replaced by zeros (fill_value=0), so you never get to see them.

For example, there is no row with objectid = 32233 and value = 0, so the corresponding entry in your pivot result is NaN, which then gets filled with 0.
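The missing-cell effect can be reproduced on a small frame. This is only a sketch: the three rows are a hypothetical subset of the question's data, and aggfunc="max" is an assumption made here so that the aggregation itself returns integers (the question's call uses the default aggregation).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data:
# objectid 32233 has no row with value == 0.
df = pd.DataFrame(
    {
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
        "clock": [1505960158, 1505962773, 1505400465],
    },
    dtype="int32",
)

# No fill_value here, so the missing (32233, 0) cell stays visible as NaN.
raw = df.pivot_table(index="objectid", columns="value", values="clock", aggfunc="max")
print(raw)
print(raw.dtypes)

# Replacing the NaN with 0 hides it, but the column is still float.
filled = raw.fillna(0)
print(filled.dtypes)
```

Printing raw before filling is a quick way to check whether NaNs are present in a real pivot result.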

Now that it's clear why the columns are upcast, you can reset the dtype using astype:

p = p.astype(int)
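A minimal sketch of the cast on a frame shaped like the pivot result. The frame p below is built by hand for illustration, and "int64" is spelled out as an assumption so that the result does not depend on the platform's default integer size. Note that pivot_table aggregates with the mean by default, so cells can hold fractional values, and astype drops the fractional part when casting to an integer type.

```python
import pandas as pd

# Hand-built stand-in for the pivot result: column 0 is float because of the filled NaN.
p = pd.DataFrame(
    {1: [1505960158, 1505400465], 0: [1505962773.0, 0.0]},
    index=pd.Index([32219, 32233], name="objectid"),
)
p.columns.name = "value"

# Cast every column to a fixed-width integer type.
p_int = p.astype("int64")
print(p_int.dtypes)

# Fractional values are truncated, not rounded, by an integer cast.
truncated = pd.Series([1.9, 2.5]).astype("int64")
print(truncated.tolist())
```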
cs95 answered Oct 19 '22 14:10