
python pandas min() not picking up minimum

Tags: python, pandas

I've encountered a strange problem. I'm sure there is a logical reason behind this.

I have a DataFrame called alloptions with four columns, minage1, minage2, minage3, and minage4, all of dtype float64. The number of missing values increases from minage1 to minage4.

I create a fifth column that takes the minimum of these four columns:

alloptions['minage']=alloptions.apply(lambda x: min([x['minage1'],x['minage2'],x['minage3'],x['minage4']]),axis=1)

This looked like it worked until I discovered that in row 47 the result is NaN even though minage2 holds a value:

     minage1    minage2 minage3 minage4 minage      
47     NaN      56.0    NaN      NaN     NaN

Using .loc, I isolate that row:

In [10]:

 print(alloptions.loc[47,:])
 print(alloptions.loc[47,:].dtypes)

I get

minage1   NaN
minage2    56
minage3   NaN
minage4   NaN
minage    NaN
Name: 47, dtype: float64
float64

So I'm confused as to why the function didn't pick up 56.

Thank you in advance for your help.

chungkim271 asked May 05 '15 19:05


1 Answer

You are using the builtin Python min function, which has no special handling for nan. Every ordering comparison with nan is False, so the result depends on the order of the arguments:

>>> import numpy as np
>>> min(1, np.nan)
1
>>> min(np.nan, 1)
nan
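
The order dependence can be traced to how nan compares. Here is a minimal sketch in plain Python, using the values of row 47 from the question; it assumes the builtin min keeps its first candidate unless a later item compares as strictly smaller, which is how CPython behaves:

```python
import math

nan = float("nan")

# Every ordering comparison involving NaN evaluates to False.
print(56.0 < nan)   # False
print(nan < 56.0)   # False

# min() keeps its first candidate unless a later item is strictly smaller,
# so a leading NaN is never replaced.
row_47 = [nan, 56.0, nan, nan]   # minage1..minage4 from the question
builtin_result = min(row_47)
print(builtin_result)            # nan

# With the number first, no later NaN ever compares as smaller.
reordered_result = min([56.0, nan, nan, nan])
print(reordered_result)          # 56.0
```

This is why the lambda in the question returns NaN for row 47: minage1 is NaN and comes first in the list.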

Instead, use the min method from pandas, which ignores nan values by default when computing the minimum. This method takes an axis argument, so if your four minageX columns are the only columns in your DataFrame, you can just do:

df['minage'] = df.min(axis=1)
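
If the DataFrame holds other columns as well, one option is to select the four minage columns by name before calling min. The following is only a sketch: the small frame stands in for alloptions, and the `other` column and every value except those of row 47 are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for alloptions; index 47 mirrors the row in the question.
alloptions = pd.DataFrame(
    {
        "minage1": [30.0, np.nan],
        "minage2": [25.0, 56.0],
        "minage3": [np.nan, np.nan],
        "minage4": [np.nan, np.nan],
        "other": [1.0, 2.0],  # made-up unrelated column, excluded from the minimum
    },
    index=[46, 47],
)

age_cols = ["minage1", "minage2", "minage3", "minage4"]

# DataFrame.min skips NaN by default (skipna=True); axis=1 works row by row.
alloptions["minage"] = alloptions[age_cols].min(axis=1)

print(alloptions.loc[47, "minage"])  # 56.0
```

A row in which all four columns are NaN still yields NaN, since there is nothing left to take the minimum of.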

In general, when working with pandas data structures, you should avoid builtin Python functions like max, min, sum, and so on, and use the pandas versions instead. The builtin functions know nothing about pandas, missing values, or vectorized operations, and may give unexpected results.
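
As a rough illustration of that difference on a single Series (the values are made up), the builtins treat it as a plain iterable, while the pandas methods skip NaN by default:

```python
import math

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3.0, 1.0])  # made-up values for illustration

# Builtins iterate element by element and have no NaN handling.
builtin_min = min(s)   # NaN comes first, so it is never replaced
builtin_sum = sum(s)   # 0 + NaN + 3.0 + 1.0 is NaN

# The pandas methods skip NaN by default (skipna=True).
pandas_min = s.min()
pandas_sum = s.sum()

print(builtin_min, builtin_sum)  # nan nan
print(pandas_min, pandas_sum)    # 1.0 4.0
```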

BrenBarn answered Nov 14 '22 22:11