
How do I combine the AND and OR operator in a pandas data frame?

My goal is to find out whether certain combinations of keywords are present in a column of text strings (titles of news articles). I then want to plot the frequency in a bar chart.

I have done the following, using a pandas data frame:

pvv_news = df[df['desc'].str.contains("pvv", case=True)]
pvv_month = pvv_news.groupby(pvv_news.index.month).size()
pvv_month.index = ['January', 'February', 'March', 'April', 'May', 'June']
pvv_month.plot(kind='bar')

Which gives:

[Image: bar chart of the number of matching articles per month]

Now, what I can't figure out is how to combine AND and OR to get more specific results. Here is an example of what I have in mind, but it doesn't work:

pvv_news = df[df['desc'].str.contains("(pvv)&(nederland|overheid)", case=True)]

I've looked at the following functions but I can't figure it out:

  • pandas.Series.str.extract
  • pandas.Series.str.match
  • pandas.Series.str.contains
  • Regular expressions in combination with the above functions.
Lam asked Aug 18 '15 11:08





1 Answer

If I'm following what you want to do, this should work:

pvv_news = df[(df['desc'].str.contains("pvv", case=True)) &
              ((df['desc'].str.contains("nederland", case=True)) |
               (df['desc'].str.contains("overheid", case=True)))]
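
The pattern in the question fails because & is not a regular-expression operator: it simply matches a literal ampersand. AND therefore has to be expressed either by combining boolean masks with & (each mask wrapped in parentheses) or, inside a single regex, with lookaheads. Below is a minimal, self-contained sketch of both. The sample titles are made up for illustration, and na=False is an added assumption so that missing titles count as non-matches.

```python
import pandas as pd

# Hypothetical sample data standing in for the news titles in df['desc']
df = pd.DataFrame({
    "desc": [
        "pvv wint in nederland",
        "pvv verliest zetels",
        "overheid reageert op pvv",
        "nederland en de overheid",
    ]
})

# One boolean mask per condition; na=False treats missing titles as non-matches
has_pvv = df["desc"].str.contains("pvv", case=True, regex=False, na=False)
# OR can stay inside the regex: | is regex alternation
has_other = df["desc"].str.contains("nederland|overheid", case=True, na=False)

# AND is done on the masks, not inside the pattern
pvv_news = df[has_pvv & has_other]

# Alternative: a single regex where each lookahead must succeed (AND).
# Note that .* does not cross newlines, so this assumes single-line titles.
pattern = r"^(?=.*pvv)(?=.*(?:nederland|overheid))"
pvv_news_regex = df[df["desc"].str.contains(pattern, case=True, na=False)]

print(pvv_news)
print(pvv_news_regex)
```

Both filters keep the first and third titles here. The mask version is usually easier to read and extend; the lookahead version is handy when the whole condition has to be passed around as one pattern string.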
iayork answered Sep 18 '22 12:09
