I have a calculated column in a Pandas DataFrame which needs to be assigned based upon a condition. For example:
if(data['column_a'] == 0):
    data['column_c'] = 0
else:
    data['column_c'] = data['column_b']
However, that returns an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have a feeling this has something to do with the fact that it must be done in a matrix style. Changing the code to a ternary statement doesn't work either:
data['column_c'] = 0 if data['column_a'] == 0 else data['column_b']
Does anyone know the proper way to achieve this? Using apply with a lambda? I could iterate via a loop, but I'd rather do this the preferred Pandas way.
You can create a conditional column in a pandas DataFrame by using np.where(), np.select(), DataFrame.map(), DataFrame.
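As a rough illustration of the first two functions named above, here is a minimal sketch applied to the column names from the question. The sample values, the second condition, and the extra column_d are made up for the example and are not part of the original problem.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
data = pd.DataFrame({"column_a": [0, 1, 0, 2], "column_b": [10, 20, 30, 40]})

# np.where(condition, value_if_true, value_if_false) works element-wise:
# 0 where column_a is 0, otherwise the value from column_b.
data["column_c"] = np.where(data["column_a"] == 0, 0, data["column_b"])

# np.select takes a list of conditions and a matching list of choices;
# for each row the first condition that is True decides the value,
# and rows matching no condition get the default.
conditions = [data["column_a"] == 0, data["column_a"] == 1]
choices = [0, data["column_b"]]
data["column_d"] = np.select(conditions, choices, default=-1)

print(data)
```

np.where is usually enough for a single if/else; np.select tends to read better once there are several branches.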
You can extract a column of a pandas DataFrame based on another value by using the DataFrame.query() method. The query() method selects rows of a DataFrame using a boolean expression over its columns.
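A small sketch of what that might look like with the question's column names. The sample values and the variable name nonzero_b are assumptions for illustration only. Note that query() filters rows rather than building a new conditional column.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
data = pd.DataFrame({"column_a": [0, 1, 0, 2], "column_b": [10, 20, 30, 40]})

# Keep only the rows where column_a is non-zero, then take column_b from them.
nonzero_b = data.query("column_a != 0")["column_b"]

print(nonzero_b)
```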
You can do:
data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])
This is vectorised. Your attempts failed because an if statement needs a single True or False, and the comparison returns a whole Series of boolean values that it
doesn't know how to reduce to one, hence the error.
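To make the failure concrete, the sketch below reproduces the error on made-up data and then applies an equivalent vectorised fix. The sample values and the names mask and raised are assumptions for the example. It uses Series.mask, the inverse of Series.where, which some readers find closer to the original if/else.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
data = pd.DataFrame({"column_a": [0, 1, 0, 2], "column_b": [10, 20, 30, 40]})

# The comparison yields one boolean per row, not a single True/False.
mask = data["column_a"] == 0

# An if statement calls bool() on its condition, which is ambiguous
# for a multi-element Series and raises ValueError.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# Series.mask replaces values where the condition is True:
# take column_b, but put 0 wherever column_a is 0.
data["column_c"] = data["column_b"].mask(mask, 0)

print(raised)
print(data)
```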
Example:
In [81]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[81]:
a b c
0 -1.065074 -1.294718 0.165750
1 -0.041167 0.962203 0.741852
2 0.714889 0.056171 1.197534
3 0.741988 0.836636 -0.660314
4 0.074554 -1.246847 0.183654
In [82]:
df['d'] = df['b'].where(df['b'] < 0, df['c'])
df
Out[82]:
a b c d
0 -1.065074 -1.294718 0.165750 -1.294718
1 -0.041167 0.962203 0.741852 0.741852
2 0.714889 0.056171 1.197534 1.197534
3 0.741988 0.836636 -0.660314 -0.660314
4 0.074554 -1.246847 0.183654 -1.246847