Creating a new column depending on the equality of two other columns

I want to compare the values of two columns and create a new column, bin_crnn, that is 1 if they are equal and 0 if not.

# coding: utf-8
import pandas as pd

df = pd.read_csv('file.csv',sep=',')

if df['crnn_pred']==df['manual_raw_value']:
    df['bin_crnn']=1
else:
    df['bin_crnn']=0

I got the following error:

    if df['crnn_pred']==df['manual_raw_value']:
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
vincent75 asked May 19 '17 10:05



2 Answers

You need to cast the boolean mask to int with astype:

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)

Sample:

df = pd.DataFrame({'crnn_pred':[1,2,5], 'manual_raw_value':[1,8,5]})
print (df)
   crnn_pred  manual_raw_value
0          1                 1
1          2                 8
2          5                 5

print (df['crnn_pred']==df['manual_raw_value'])
0     True
1    False
2     True
dtype: bool

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
print (df)
   crnn_pred  manual_raw_value  bin_crnn
0          1                 1         1
1          2                 8         0
2          5                 5         1

You get the error because comparing two columns does not return a scalar, but a Series (array) of True and False values, while an if statement needs a single True or False.

So you need all or any to reduce the Series to a scalar True or False.

I think this answer explains it better.
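To illustrate why the if fails, here is a minimal sketch that reuses the sample frame from above. The names mask, all_equal and any_equal are only for illustration and are not part of the question's code.

```python
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# Element-wise comparison gives a boolean Series, one value per row
mask = df['crnn_pred'] == df['manual_raw_value']

# Using the Series directly in `if` raises the ValueError from the question
try:
    if mask:
        pass
except ValueError as err:
    print('ValueError:', err)

# all() / any() reduce the Series to a single boolean
all_equal = mask.all()   # True only if every row matches
any_equal = mask.any()   # True if at least one row matches
print(all_equal, any_equal)
```

Note that all and any answer a question about the whole column, so they are not what you want for a per-row 0/1 flag. For that, the astype(int) approach above is the one to use.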

jezrael answered Sep 28 '22 08:09


One fast approach is to use np.where.

import numpy as np
df['test'] = np.where(df['crnn_pred']==df['manual_raw_value'], 1, 0)
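As a rough sketch on the same sample data as the other answer, np.where should give the same 0/1 column as astype(int), and it can also fill in other labels. The column names bin_where, bin_astype and match_label are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# Boolean mask: True where the two columns hold the same value
mask = df['crnn_pred'] == df['manual_raw_value']

# np.where picks the second argument where mask is True, the third otherwise
df['bin_where'] = np.where(mask, 1, 0)

# Same result via casting the mask to int
df['bin_astype'] = mask.astype(int)

# np.where is not limited to 0/1, any pair of values works
df['match_label'] = np.where(mask, 'same', 'different')

print(df)
```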
Allen answered Sep 28 '22 08:09