Conditional If Statement: If value in row contains string ... set another column equal to string

EDIT MADE:

I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.

So Activity_2 shows the desired result. Essentially I want to call out what type of activity is occurring.

I tried to do this with the code below, but it won't run (the error is shown after the code). Any help is greatly appreciated!


    for i in df2['Activity']:
        if i contains 'email':
            df2['Activity_2'] = 'email'
        elif i contains 'conference'
            df2['Activity_2'] = 'conference'
        elif i contains 'call'
            df2['Activity_2'] = 'call'
        else:
            df2['Activity_2'] = 'task'


Error: if i contains 'email':
                ^
SyntaxError: invalid syntax
PineNuts0 asked May 11 '17 03:05



3 Answers

Assuming you are using pandas, you can use numpy.where, a vectorized version of if/else, with each condition built by str.contains:

import numpy as np

# conditions are checked in order, so the first matching keyword wins
df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))

df

#   Activity            Activity_2
#0  email personA       email
#1  attend conference   conference
#2  send email          email
#3  call Sam            call
#4  random text         task
#5  random text         task
#6  lwantto call        call
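
For comparison, the loop in the question fails because Python has no `contains` keyword; substring tests use the `in` operator. The loop also assigns to the whole `Activity_2` column on every pass instead of to one row. Below is a minimal sketch of the same row-by-row logic, assuming the sample data shown above. `classify` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data copied from the output above
df = pd.DataFrame({"Activity": ["email personA", "attend conference", "send email",
                                "call Sam", "random text", "random text",
                                "lwantto call"]})

def classify(text):
    """Return the first matching activity type, or 'task' if none match."""
    # Hypothetical helper; the keyword order mirrors the if/elif chain in the question
    for keyword in ("email", "conference", "call"):
        if keyword in text:  # `in` is Python's substring test
            return keyword
    return "task"

# apply() calls classify once per row and collects the results into a new column
df["Activity_2"] = df["Activity"].apply(classify)
print(df)
```

This version assumes every value in `Activity` is a string, and it is likely to be slower than the vectorized form on large frames.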
Psidom answered Oct 25 '22 05:10


This also works:

df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
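
Two things to watch with this approach: rows that match none of the keywords are left as NaN, and a row that matches several keywords keeps the last assignment rather than the first. Here is a sketch of one way to reproduce the question's if/elif priority, assuming the sample data from the first answer plus one made-up row that contains two keywords: set the fallback first, then assign from lowest to highest priority.

```python
import pandas as pd

# Sample data from the first answer; the last row is a hypothetical addition
# that contains two keywords, to show which one wins
df = pd.DataFrame({"Activity": ["email personA", "attend conference", "send email",
                                "call Sam", "random text", "email about call"]})

# Start every row at the fallback label
df["Activity_2"] = "task"

# Assign lowest priority first so higher-priority keywords overwrite it
for keyword in ("call", "conference", "email"):
    # na=False treats missing values as "no match"
    mask = df["Activity"].str.contains(keyword, na=False)
    df.loc[mask, "Activity_2"] = keyword

print(df)
```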
moshfiqur answered Oct 25 '22 04:10


The nested `where` solution above behaves wrongly if your df contains NaN values, because `str.contains` can return NaN for those rows instead of True or False. In that case I recommend the following code, which worked for me:

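import numpy as np

# fill NaN with an empty string so str.contains returns only True/False
temp = df.Activity.fillna("")
# label missing values explicitly before the keyword checks
df['Activity_2'] = np.where(df.Activity.isna(), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))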
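
As the number of conditions grows, the nesting gets hard to read. One possible alternative is `numpy.select`, which takes a list of conditions and a parallel list of labels and picks the first condition that is true. The sketch below is an assumption about how the same logic could be written, not the approach used in the answers; the sample rows are made up to include a missing value and a string containing "0".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value and a string containing "0"
df = pd.DataFrame({"Activity": ["email personA", None, "attend conference",
                                "call at 10:00", "random text"]})

activity = df["Activity"]

# Conditions are evaluated in order; the first True one decides the label.
# na=False makes str.contains return False for missing values.
conditions = [
    activity.isna(),
    activity.str.contains("email", na=False),
    activity.str.contains("conference", na=False),
    activity.str.contains("call", na=False),
]
choices = ["None", "email", "conference", "call"]

# Rows that satisfy none of the conditions get the default label
df["Activity_2"] = np.select(conditions, choices, default="task")
print(df)
```

Checking `isna()` directly avoids relying on a placeholder string, so an activity such as "call at 10:00" is still labeled "call".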
DovaX answered Oct 25 '22 06:10