l want to compare the values of two columns where I create a new column bin_crnn
. I want 1 if they are equals or 0 if not.
# coding: utf-8
import pandas as pd
df = pd.read_csv('file.csv',sep=',')
if df['crnn_pred']==df['manual_raw_value']:
df['bin_crnn']=1
else:
df['bin_crnn']=0
l got the following error
if df['crnn_pred']==df['manual_raw_value']:
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Using apply() method If you need to apply a method over an existing column in order to compute some values that will eventually be added as a new column in the existing DataFrame, then pandas. DataFrame. apply() method should do the trick.
Create a new column by assigning the output to the DataFrame with a new column name in between the [] . Operations are element-wise, no need to loop over rows. Use rename with a dictionary or function to rename row labels or column names.
Update column based on another column using CASE statement We use a CASE statement to specify new value of first_name column for each value of id column. This is a much better approach than using WHERE clause because with WHERE clause we can only change a column value to one new value.
You need cast boolean mask to int
with astype
:
df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
Sample:
df = pd.DataFrame({'crnn_pred':[1,2,5], 'manual_raw_value':[1,8,5]})
print (df)
crnn_pred manual_raw_value
0 1 1
1 2 8
2 5 5
print (df['crnn_pred']==df['manual_raw_value'])
0 True
1 False
2 True
dtype: bool
df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
print (df)
crnn_pred manual_raw_value bin_crnn
0 1 1 1
1 2 8 0
2 5 5 1
You get error, because if compare columns output is not scalar, but Series
(array
) of True
and False
values.
So need all
or
any
for return scalar True
or False
.
I think better it explain this answer.
One fast approach is to use np.where.
import numpy as np
df['test'] = np.where(df['crnn_pred']==df['manual_raw_value'], 1, 0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With