
New column in Pandas dataframe based on boolean conditions

I'd like to add a new column to a Pandas dataframe, populated with True or False based on the other values in each row. My approach was to apply a function that checks boolean conditions across each row of the dataframe and fills the new column with either True or False.

This is the dataframe:

import pandas as pd

l = {'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
     'Pressure': [9, 10.5, 10.5], 'Feed': [9, 10.5, 11], 'Temp': [9, 10.5, 11]}

df1 = pd.DataFrame(l)

This is the function I wrote:

def ops_on(row):
   return row[('Feed' > 10)
              & ('Pressure' > 10)
              & ('Temp' > 10)
             ]

The function ops_on is used to create the new column ['ops_on']:

df1['ops_on'] = df1.apply(ops_on, axis='columns')

Unfortunately, I get this error message:

TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 0')

Thankful for help.

TvdM asked Mar 22 '18 15:03


2 Answers

You should work column-wise (vectorised, efficient) rather than row-wise (inefficient, Python loop):

df1['ops_on'] = (df1['Feed'] > 10) & (df1['Pressure'] > 10) & (df1['Temp'] > 10)

The & ("and") operator is applied to Boolean series element-wise. An arbitrary number of such conditions can be chained.
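
As a minimal sketch of the element-wise behaviour, assuming the sample data from the question, each comparison can be stored as its own Boolean Series before combining. The names feed_ok, pressure_ok and temp_ok are illustrative only and do not appear in the original post:

```python
import pandas as pd

df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

# Each comparison yields a Boolean Series with one entry per row.
# Naming each mask avoids the parentheses that the one-line form needs,
# since & binds more tightly than > in Python.
feed_ok = df1['Feed'] > 10          # illustrative name
pressure_ok = df1['Pressure'] > 10  # illustrative name
temp_ok = df1['Temp'] > 10          # illustrative name

# & combines the masks row by row.
df1['ops_on'] = feed_ok & pressure_ok & temp_ok
print(df1)
```

With this data the first row fails all three conditions and the other two rows pass them, so ops_on comes out as False, True, True.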


Alternatively, for the special case where you are performing the same comparison multiple times:

df1['ops_on'] = df1[['Feed', 'Pressure', 'Temp']].gt(10).all(axis=1)
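
To see what the chained call does at each step, here is a short sketch on the question's sample data. The variable names cols and mask are assumptions made for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

cols = ['Feed', 'Pressure', 'Temp']  # illustrative name

# gt(10) compares every selected cell with 10 and returns a Boolean DataFrame
# of the same shape as the selection.
mask = df1[cols].gt(10)

# all(axis=1) reduces across the columns, giving one Boolean per row that is
# True only when every column in that row passed the comparison.
df1['ops_on'] = mask.all(axis=1)
print(df1)
```

This form keeps the column list in one place, which may be convenient if more columns need the same threshold later.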
jpp answered Oct 20 '22 15:10


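In your current setup, the expression 'Feed' > 10 compares the string literal 'Feed' itself with 10 before the row is ever indexed, which is what raises the TypeError. Look up each value in the row first, then compare: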

def ops_on(row):
    return (row['Feed'] > 10) & (row['Pressure'] > 10) & (row['Temp'] > 10)
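
The following sketch, assuming the sample data from the question, first reproduces the message from the traceback in isolation and then runs the row-wise function through apply. The variable error_message is introduced here for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

# Comparing a column name (a plain string) with a number fails on its own,
# independently of pandas.
error_message = None  # illustrative name
try:
    'Feed' > 10
except TypeError as exc:
    error_message = str(exc)
print(error_message)

def ops_on(row):
    # row is a Series holding one DataFrame row; fetch each value by label,
    # then compare the value (not the label) with the threshold.
    return (row['Feed'] > 10) & (row['Pressure'] > 10) & (row['Temp'] > 10)

df1['ops_on'] = df1.apply(ops_on, axis='columns')
print(df1)
```

Because each lookup here returns a single value rather than a Series, the plain and keyword would work equally well inside this function; the row-wise version still runs a Python-level call per row, so it is likely to be slower than a column-wise expression on large frames.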
YOLO answered Oct 20 '22 15:10