
Using fillna, downcast and pandas

I've searched for something to help me understand the downcast keyword argument of the DataFrame.fillna method. Please provide an example to help my learning and everyone else's: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

Also, could you say a word or two about setting types column by column when a column contains NaN or even None values, how to handle such common cases, and what the difference between the two is?

Thank you very much!

user3659451 asked Nov 21 '14 16:11



1 Answer

Despite what the documentation says:

downcast : dict, default is None

a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

if you supply a dict as downcast, you get AssertionError("dtypes as dict is not supported yet").

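So the only usable value is downcast='infer', which makes pandas try to downcast, for example, floats to integers. But this seems to be buggy: in the session below, the columns whose values are all above 10000 and lie close to whole numbers are converted to int64, silently dropping the fractional part, while the columns with smaller values stay float64.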
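One plausible explanation for the threshold near 10000, offered as an assumption about pandas internals rather than something confirmed here, is that the "can this be downcast?" check compares the rounded values to the originals with a relative tolerance such as the default rtol=1e-5 of np.allclose. The NumPy-only sketch below illustrates that idea; the helper name passes_default_allclose is hypothetical and this is not the actual pandas code.

```python
import numpy as np

# Values taken from columns C and B of the example in this answer.
original_c = np.array([10000.1, 10000.1])
original_b = np.array([9999.9, 9999.9])


def passes_default_allclose(original):
    """Hypothetical check: are the rounded values 'close' to the originals?

    np.allclose(a, b) tests |a - b| <= atol + rtol * |b| with the defaults
    atol=1e-8 and rtol=1e-5, so the allowed error grows with the magnitude.
    """
    rounded = original.round()
    return bool(np.allclose(rounded, original))


close_c = passes_default_allclose(original_c)  # error 0.1 vs. allowance of about 0.100001
close_b = passes_default_allclose(original_b)  # error 0.1 vs. allowance of about 0.099999

# With the relative tolerance switched off, the same check rejects column C.
strict_c = bool(np.allclose(original_c.round(), original_c, rtol=0))

print(close_c, close_b, strict_c)
```

Under this assumption, a rounding error of 0.1 counts as "equal" once the values exceed roughly 10000, which would match the dtypes reported in this answer.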

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame([[3.14,9999.9,10000.1,200000.2],[2.72,9999.9,10000.1,300000.3]], columns=list("ABCD"))
   ...: df.dtypes
   ...: 
Out[1]: 
A    float64
B    float64
C    float64
D    float64
dtype: object

In [2]: df
Out[2]: 
      A       B        C         D
0  3.14  9999.9  10000.1  200000.2
1  2.72  9999.9  10000.1  300000.3

In [3]: dff=df.fillna(0, downcast='infer')
   ...: dff.dtypes
   ...: 
Out[3]: 
A    float64
B    float64
C      int64
D      int64
dtype: object

In [4]: dff
Out[4]: 
      A       B      C       D
0  3.14  9999.9  10000  200000
1  2.72  9999.9  10000  300000
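If the goal is integer columns without the silent loss of precision shown above, one workaround is to call fillna without downcast and then cast explicitly, but only where the cast is exact. The following is a minimal sketch; the helper name fill_and_cast_exact and the sample frame are hypothetical, and it assumes the values fit into int64.

```python
import numpy as np
import pandas as pd


def fill_and_cast_exact(df, fill_value=0):
    """Hypothetical helper: fill NaN, then convert a float64 column to int64
    only if every value in it is finite and already a whole number."""
    filled = df.fillna(fill_value)
    for col in filled.select_dtypes(include=["float64"]).columns:
        values = filled[col]
        is_exact = np.isfinite(values).all() and (values == np.floor(values)).all()
        if is_exact:
            filled[col] = values.astype("int64")
    return filled


df = pd.DataFrame(
    {
        "A": [1.0, np.nan],         # whole numbers plus NaN: becomes int64
        "C": [10000.1, 10000.1],    # has a fractional part: stays float64
        "D": [200000.0, np.nan],    # whole numbers plus NaN: becomes int64
    }
)
out = fill_and_cast_exact(df)
print(out.dtypes)
print(out)
```

Unlike downcast='infer' in the session above, this leaves column C untouched because 10000.1 is not a whole number.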
tworec answered Nov 15 '22 10:11