python pandas min() not picking up minimum

Question

I've encountered a strange problem. I'm sure there is a logical reason behind this.

I have a DataFrame called alloptions with four columns, minage1, minage2, minage3 and minage4, all of dtype float64. The number of missing values increases from minage1 to minage4.

I create a fifth column that takes the minimum of these four columns:

alloptions['minage']=alloptions.apply(lambda x: min([x['minage1'],x['minage2'],x['minage3'],x['minage4']]),axis=1)

which looked like it worked until I noticed row 47:

    minage1  minage2  minage3  minage4  minage
47      NaN     56.0      NaN      NaN     NaN

Using .loc, I isolate that row:

print(alloptions.loc[47, :])
print(alloptions.loc[47, :].dtypes)

I get:

minage1   NaN
minage2    56
minage3   NaN
minage4   NaN
minage    NaN
Name: 47, dtype: float64
float64

So I'm confused as to why the function didn't pick up 56.

Thank you in advance for your help.

BrenBarn · Accepted Answer

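You are using the built-in Python min function, which doesn't know about NaN and treats it inconsistently. Any ordered comparison involving NaN evaluates to False, and min only replaces its running minimum when a later value compares less than it, so the result depends on where the NaN appears. In row 47 the first value, minage1, is NaN, so it is never replaced and is returned as the minimum. Compare: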

>>> import numpy as np
>>> min(1, np.nan)
1
>>> min(np.nan, 1)
nan
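The sketch below is a simplified model of how the built-in min scans its input, not CPython's actual source; builtin_style_min is a hypothetical helper, and the list reuses the values shown for row 47 in the question.

```python
import numpy as np

# Every ordered comparison involving NaN evaluates to False.
print(np.nan < 1, 1 < np.nan)  # False False

def builtin_style_min(values):
    """Hypothetical helper: a simplified model of how built-in min() scans values."""
    it = iter(values)
    best = next(it)          # start with the first element
    for v in it:
        if v < best:         # always False when v or best is NaN
            best = v
    return best

row_47 = [np.nan, 56.0, np.nan, np.nan]  # minage1..minage4 from the question
print(builtin_style_min(row_47))  # nan: the leading NaN is never replaced
print(min(row_47))                # nan, same as the real built-in
```

Because 56.0 < nan is False, the leading NaN survives the whole scan, which is exactly what happened in row 47.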

Instead, use the pandas min method, DataFrame.min, which skips NaN values by default (skipna=True). It takes an axis argument, so if your four minageX columns are the only columns in your DataFrame, you can simply do

df['minage'] = df.min(axis=1)
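If the DataFrame has other columns as well, select the minage columns first. A minimal sketch, assuming a small made-up alloptions frame (the values and the extra other_col column are invented for illustration); the second row mirrors row 47:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the minage column names come from the question,
# "other_col" is an invented extra column to show why selecting matters.
alloptions = pd.DataFrame({
    "minage1": [30.0, np.nan, np.nan],
    "minage2": [40.0, 56.0, np.nan],
    "minage3": [np.nan, np.nan, np.nan],
    "minage4": [25.0, np.nan, np.nan],
    "other_col": [1.0, 2.0, 3.0],
})

# Restrict the row-wise minimum to the minage columns only.
minage_cols = ["minage1", "minage2", "minage3", "minage4"]

# DataFrame.min skips NaN by default (skipna=True); a row whose
# selected values are all NaN still comes out as NaN.
alloptions["minage"] = alloptions[minage_cols].min(axis=1)
print(alloptions["minage"].tolist())  # [25.0, 56.0, nan]
```

Listing the columns explicitly keeps unrelated numeric columns such as other_col from leaking into the minimum.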

In general, when working with pandas data structures, avoid built-in Python functions such as max, min and sum, and use the pandas methods instead; the built-ins know nothing about pandas, its NaN handling or vectorized operations, and may give unexpected results.
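To illustrate the general point, here is a sketch comparing the built-ins with the pandas reductions on a Series holding the row 47 values from the question:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 56.0, np.nan, np.nan])  # the values from row 47

# Built-ins iterate element by element and let NaN leak through.
print(min(s), max(s), sum(s))     # nan nan nan

# pandas reductions skip NaN by default (skipna=True).
print(s.min(), s.max(), s.sum())  # 56.0 56.0 56.0
```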


Tags: python, pandas

chungkim271

