Suppose I have a DataFrame called data:
id  URL
1   www.pandora.com
2   m.jcpenney.com
3   www.youtube.com
4   www.facebook.com
I want to create a new column based on whether the URL contains a particular word. For example, if it contains 'youtube', I want the column value to be 'Youtube'. So I tried the following:
data['test'] = 'other'
Once we do that, we have:
data['test']
other
other
other
other
Then I tried this:
data[data['URL'].str.contains("youtub") == True]['test'] = 'Youtube'
data[data['URL'].str.contains("face") == True]['test'] = 'Facebook'
Although this runs without any error, the test column doesn't change: it still contains 'other' for every row. Ideally, only the 3rd row should change to 'Youtube' and the 4th to 'Facebook'. What mistake am I making here?
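For reference, here is a minimal, self-contained reproduction of the problem (the frame is rebuilt from the table above). Boolean indexing such as data[mask] returns a new DataFrame, so the subsequent ['test'] = ... writes into that temporary object and data itself is never modified. Depending on the pandas version, this triggers a SettingWithCopyWarning or, with copy-on-write, a ChainedAssignmentError warning. The sketch silences the warning only so the output stays clean.

```python
import warnings

import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "URL": ["www.pandora.com", "m.jcpenney.com", "www.youtube.com", "www.facebook.com"],
})
data["test"] = "other"

mask = data["URL"].str.contains("youtub")

# Chained indexing: data[mask] is a filtered copy, so this assignment
# only changes that temporary copy, not `data`.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    data[mask]["test"] = "Youtube"

print(data["test"].tolist())  # ['other', 'other', 'other', 'other']
```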
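I think you can use loc with a boolean mask created by contains. Your version uses chained indexing: data[mask] returns a new DataFrame, so ['test'] = ... modifies that temporary copy instead of data (pandas usually emits a SettingWithCopyWarning for this). loc selects the rows and the column in a single step and assigns into data directly: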
print(data['URL'].str.contains("youtub"))
0    False
1    False
2     True
3    False
Name: URL, dtype: bool
data.loc[data['URL'].str.contains("youtub"),'test'] = 'Youtube'
data.loc[data['URL'].str.contains("face"),'test'] = 'Facebook'
print(data)
   id               URL      test
0   1   www.pandora.com       NaN
1   2    m.jcpenney.com       NaN
2   3   www.youtube.com   Youtube
3   4  www.facebook.com  Facebook
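The NaN values in the output above appear because the test column was created by the loc assignments themselves: rows that match neither mask get no value. If you keep the question's data['test'] = 'other' line before the loc assignments, unmatched rows keep 'other' instead. A self-contained sketch of that combination, with the frame rebuilt from the question:

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "URL": ["www.pandora.com", "m.jcpenney.com", "www.youtube.com", "www.facebook.com"],
})

# Default label for every row.
data["test"] = "other"

# .loc[row_mask, column] writes into `data` itself, so the change sticks.
data.loc[data["URL"].str.contains("youtub"), "test"] = "Youtube"
data.loc[data["URL"].str.contains("face"), "test"] = "Facebook"

print(data)
```

Note that str.contains is case-sensitive by default; pass case=False if the URLs may contain mixed case.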
I would do it in one shot (here df is the question's data with id as the index):
replacements = {
  r'.*youtube.*': 'Youtube',
  r'.*face.*': 'Facebook',
  r'.*pandora.*': 'Pandora'
}
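# Each matching URL is replaced by its label; unmatched URLs are left unchanged
df['text'] = df.URL.replace(replacements, regex=True)
# Anything that still contains a dot matched no pattern, so label it 'other'
df.loc[df.text.str.contains(r'\.'), 'text'] = 'other'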
print(df)
Output:
                 URL      text
id
1    www.pandora.com   Pandora
2     m.jcpenney.com     other
3    www.youtube.com   Youtube
4   www.facebook.com  Facebook
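Another one-shot option, not taken from the answers above, is numpy.select: it takes a list of boolean conditions and a parallel list of labels, picks the first matching label for each row, and falls back to a default. Unlike the replace approach, it does not rely on unmatched URLs still containing a dot. A sketch, assuming the same substrings as above and the question's column name test:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "URL": ["www.pandora.com", "m.jcpenney.com", "www.youtube.com", "www.facebook.com"],
})

urls = data["URL"]
# One boolean mask per label; earlier entries win if several match.
conditions = [
    urls.str.contains("youtub"),
    urls.str.contains("face"),
    urls.str.contains("pandora"),
]
choices = ["Youtube", "Facebook", "Pandora"]

# Rows where no condition is True get the default label.
data["test"] = np.select(conditions, choices, default="other")

print(data)
```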