
Python max() function fails when used on a pandas column and an integer

I'm trying to create a new pandas dataframe column by subtracting one existing dataframe column from another. However, if the result is a negative number, the new column value should be set to zero.

import pandas as pd
data = {'A': [1,2,3], 'B': [3,2,1]}
df = pd.DataFrame(data)

In [4]: df
Out[4]: 
   A  B
0  1  3
1  2  2
2  3  1

If I create a new dataframe column 'C' by subtracting 'B' from 'A', I get the right result.

df['C'] = df['A'] - df['B']

In [8]: df
Out[8]: 
   A  B  C
0  1  3 -2
1  2  2  0
2  3  1  2

However, if I use the built-in max() function to replace negative results with zero, I get "ValueError: The truth value of a Series is ambiguous."

>>> df['C'] = max(df['A'] - df['B'], 0)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The expected output is:

   A  B  C
0  1  3  0
1  2  2  0
2  3  1  2

What am I doing wrong?

vlmercado asked Dec 18 '22 21:12


1 Answer

You need to use NumPy's np.maximum to do an element-wise maximum comparison:

>>> import numpy as np
>>> np.maximum(df['A'] - df['B'], 0)
0    0
1    0
2    2
dtype: int64
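
To get the column the question asks for, the result can be assigned straight back to df['C']. The following is a minimal sketch that rebuilds the df from the question. The Series.clip(lower=0) variant is an assumption on my part (it is not part of this answer), but it should produce the same values.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 2, 1]})

# Element-wise maximum of each difference and 0
df['C'] = np.maximum(df['A'] - df['B'], 0)

# Alternative (assumption, not from the answer): clip negative values to 0
clipped = (df['A'] - df['B']).clip(lower=0)
```

Both approaches work row by row, so no single true/false decision about the whole column is ever needed.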

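The problem with max is that it essentially evaluates a comparison such as (df['A'] - df['B']) > 0 and then needs a single True or False from it. On a Series, that comparison returns a Series of boolean values (not a single boolean), hence the error.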
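
The error can be reproduced without max at all, which may make the cause easier to see. This is a small sketch, assuming the same values as the question. The names diff, mask and error_message are only illustrative.

```python
import pandas as pd

# Same differences as df['A'] - df['B'] in the question: -2, 0, 2
diff = pd.Series([1, 2, 3]) - pd.Series([3, 2, 1])

# The comparison itself is fine: it yields one boolean per row
mask = diff > 0

# Forcing that Series into a single True/False is what raises
try:
    bool(mask)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The built-in max runs into the same situation internally when it compares the Series with 0.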

Alex Riley answered Dec 21 '22 09:12