pandas pivot changes dtype

Tags:

python

pandas

After creating a new DataFrame with pandas pivot_table, the dtype changes from int32 to float.

Original DataFrame:

df = pd.DataFrame.from_dict(my_dict, orient='columns', dtype='i4')
print(df.head(11))

output:

         clock   eventid         ns  objectid  value
0   1505960158  62704261  327504323     32219      1
1   1505962773  62711138   22192905     32219      0
2   1505400465  61216428  123915259     32233      1
3   1504642494  59208977  369082011     32254      1
4   1504643325  59210478  576875730     32254      0
5   1504642494  59208978  369082011     32260      1
6   1504643325  59210479  576875730     32260      0
7   1504224224  58101461  445846619     13479      0
8   1504258784  58187457  204908064     13479      1
9   1504310624  58318750  443786274     13479      0
10  1504517992  58886060  746243067     13479      1

print(df.dtypes)

output:

clock       int32
eventid     int32
ns          int32
objectid    int32
value       int32
dtype: object

When I use pivot_table:

p = df.reset_index().pivot_table(index="objectid", columns="value", values="clock", fill_value=0).iloc[:, ::-1]
print(p)

output:

value              1             0
objectid                          
13479     1505534184  1.505467e+09
13485     1505676014  1.505677e+09
32219     1505960158  1.505963e+09
32233     1505400465  0.000000e+00
32254     1504642494  1.504643e+09
32260     1504642494  1.504643e+09

print(p.dtypes)

output:

value
1      int64
0    float64
dtype: object

Why does column 0 become float? How can I avoid this?

asked Oct 21 '17 02:10

1 Answer

Your sample data may not show it, but the result of your pivot operation probably contains NaNs. NaN is a floating-point value and cannot be stored in an integer column, so pandas automatically upcasts the whole column to float. Because you pass fill_value=0, those NaNs are replaced with zeros afterwards, so you never get to see them.

For example, there is no row with objectid = 32233 and value = 0, so the corresponding entry in your pivot result starts out as NaN and is then filled with 0.
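A minimal reproduction, assuming a reasonably recent pandas and using three rows copied from the question's output: objectid 32219 has rows for both value 1 and value 0, while objectid 32233 only has value 1. Dropping fill_value makes the hidden NaN visible.

```python
import pandas as pd

# Three rows copied from the question's printed DataFrame, kept as int32.
df = pd.DataFrame(
    {
        "clock": [1505960158, 1505962773, 1505400465],
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
    },
    dtype="int32",
)

# Same pivot as in the question, but without fill_value, so the missing
# (objectid=32233, value=0) cell stays NaN instead of being replaced by 0.
p = df.pivot_table(index="objectid", columns="value", values="clock")
print(p)
print(p.dtypes)
```

The (32233, 0) cell is NaN and column 0 is float64. The exact dtypes reported once fill_value=0 is added back can differ between pandas versions, so it is worth checking p.dtypes on your own installation.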

Now that it's clear why the column is upcast, you can restore the integer dtype with astype:

p = p.astype(int)
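A sketch of the fix on the same three rows from the question. One caution: pivot_table aggregates with aggfunc='mean' by default, so if several rows share an (objectid, value) pair the average can be fractional, and astype(int) truncates such values toward zero instead of raising. The whole-number check before the cast is an added safeguard, not something pandas requires.

```python
import pandas as pd

# Same three rows as above, copied from the question's output (int32).
df = pd.DataFrame(
    {
        "clock": [1505960158, 1505962773, 1505400465],
        "objectid": [32219, 32219, 32233],
        "value": [1, 0, 1],
    },
    dtype="int32",
)

# The question's pivot: the missing (32233, 0) cell is filled with 0, but
# column 0 may still come back as float64 depending on the pandas version.
p = df.pivot_table(
    index="objectid", columns="value", values="clock", fill_value=0
).iloc[:, ::-1]

# Default aggfunc is 'mean', so a cell could hold a fractional average;
# astype(int) would silently truncate it, so confirm every value is whole.
assert (p % 1 == 0).all().all(), "fractional means present; astype(int) would truncate"

p = p.astype(int)
print(p.dtypes)
```

If fractional averages are expected, rounding first with p.round().astype(int), or choosing a different aggfunc, may fit better; which is right depends on what the pivot is meant to show.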

answered Oct 19 '22 14:10

cs95

Joao Vitorino
