
if else function in pandas dataframe [duplicate]

I'm trying to apply an if condition over a dataframe, but I'm missing something. I get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

raw_data = {'age1': [23,45,21],'age2': [10,20,50]}
df = pd.DataFrame(raw_data, columns = ['age1','age2'])

def my_fun (var1,var2,var3):
    if (df[var1]-df[var2])>0 :
        df[var3]=df[var1]-df[var2]
    else:
        df[var3]=0
    print(df[var3])

my_fun('age1','age2','diff')
progster asked Apr 13 '17 11:04





2 Answers

You can use numpy.where:

import numpy as np

def my_fun (var1,var2,var3):
    df[var3]= np.where((df[var1]-df[var2])>0, df[var1]-df[var2], 0)
    return df

df1 = my_fun('age1','age2','diff')
print (df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

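The error is better explained in the linked answer. In short, if needs a single True or False, but (df[var1]-df[var2])>0 is a boolean Series with one value per row.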
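To make the cause of the error concrete, here is a minimal sketch that reproduces it on the question's data. The names mask, raised, any_positive and all_positive are only for illustration and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# The comparison is evaluated element-wise and yields one boolean per row.
mask = (df['age1'] - df['age2']) > 0

# Using the whole Series as an `if` condition raises ValueError,
# because several booleans cannot be reduced to one automatically.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Explicit reductions return a single boolean, if that is what is wanted.
any_positive = mask.any()   # True if at least one row is positive
all_positive = mask.all()   # True only if every row is positive
```

For a per-row decision like the one in the question, neither reduction is what you want, which is why the answers here use element-wise tools instead.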

A slower solution uses apply, which needs axis=1 to process the data row by row:

def my_fun(x, var1, var2, var3):
    # x is a single row (a Series), so the comparison gives one boolean
    if (x[var1]-x[var2])>0 :
        x[var3]=x[var1]-x[var2]
    else:
        x[var3]=0
    return x

print (df.apply(lambda x: my_fun(x, 'age1', 'age2','diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
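A variation on the row-wise approach, as a sketch only: the function can return just the new value for each row instead of modifying the row, and the result can be attached with assign. The helper name positive_diff is hypothetical; args is the standard DataFrame.apply parameter for passing extra positional arguments.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

def positive_diff(row, var1, var2):
    # row is one row of the frame; this comparison is a plain scalar test
    diff = row[var1] - row[var2]
    return diff if diff > 0 else 0

# apply with axis=1 calls positive_diff once per row and collects a Series;
# assign returns a new frame and leaves df itself unchanged
out = df.assign(diff=df.apply(positive_diff, axis=1, args=('age1', 'age2')))
print(out)
```

This still loops over rows in Python, so on large frames it should be expected to be slower than the numpy.where version.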

It is also possible to use loc, but be aware that this assigns into the DataFrame in place, so existing data can be overwritten:

def my_fun(x, var1, var2, var3):
    # x is the whole DataFrame here, so mask is a boolean Series
    mask = (x[var1]-x[var2])>0
    x.loc[mask, var3] = x[var1]-x[var2]
    x.loc[~mask, var3] = 0
    return x

print (my_fun(df, 'age1', 'age2','diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0
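If overwriting the original frame is a concern, one option is to work on a copy. The following is a sketch under that assumption; the function name add_diff and the variable result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

def add_diff(frame, var1, var2, var3):
    # Work on a copy so the caller's DataFrame is left untouched
    out = frame.copy()
    diff = out[var1] - out[var2]
    mask = diff > 0
    # Start with 0 everywhere, then fill in only the rows where diff > 0
    out[var3] = 0
    out.loc[mask, var3] = diff[mask]
    return out

result = add_diff(df, 'age1', 'age2', 'diff')
print(result)
```

After this call df still has only its two original columns, and the new column exists only in result.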
jezrael answered Sep 21 '22 15:09



You can use pandas.Series.where:

df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0

You can wrap this in a function:

def my_fun(v1, v2):
    return v1.sub(v2).where(v1 > v2, 0)

df.assign(age3=my_fun(df.age1, df.age2))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
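For this particular rule (keep the difference when it is positive, otherwise 0), a different technique also works: clipping the difference at a lower bound of 0 with Series.clip. This is not what the answer above uses. It is a sketch that gives the same result on this data, though missing values may be handled differently from where.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# Negative differences are raised to 0; positive ones pass through unchanged
out = df.assign(age3=(df['age1'] - df['age2']).clip(lower=0))
print(out)
```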
piRSquared answered Sep 19 '22 15:09
