How can I filter a substring from a pandas dataframe based on a list? [duplicate]

I know this question has been asked time and again, but I'm not very good with list comprehensions and there's a small twist to my code.

I have a dataframe containing keywords, and I'd like to keep only the rows whose keyword contains one or more substrings from a dedicated list.

Please note that I'm not looking for an exact match, just the occurrence of a substring in the dataframe column.

Basically, I think it should look something like this:

substring_list = ['abc', 'def']
df[df['tag'].str.contains(substring) for substring in substring_list]

I keep getting syntax errors.

Any ideas?

Thanks for the support!

sovnheim asked Jan 28 '23 03:01


2 Answers

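Join the substrings into a single regex with | and use the resulting boolean mask to index the dataframe:

df[df['tag'].str.contains('|'.join(substring_list))]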
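
To make the two steps explicit: str.contains on its own gives back a boolean Series, and the rows are only filtered once that Series is used to index the dataframe. A minimal sketch, assuming a toy dataframe with a 'tag' column (the sample data and the names pattern, mask and filtered are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with a 'tag' column, as in the question.
df = pd.DataFrame({"tag": ["abc", "edf", "abc", "def", "efg"]})
substring_list = ["abc", "def"]

# 'abc|def' is a regex alternation: match 'abc' OR 'def' anywhere in the string.
pattern = "|".join(substring_list)

# Step 1: str.contains returns one True/False value per row.
mask = df["tag"].str.contains(pattern)

# Step 2: indexing the dataframe with the mask keeps only the matching rows.
filtered = df[mask]
print(filtered)
```

This is also why the list-comprehension attempt in the question fails: a bare `for` clause is not valid inside the indexing brackets, whereas a single combined pattern needs no loop at all.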

Franco Piccolo answered Feb 17 '23 08:02


Use a pattern-based search: build the regex by joining the substrings with | as follows:

df[df.tag.str.contains('|'.join(substring_list))]

If you only have a few strings to search for, you can simply write the pattern inline:

df[df.tag.str.contains("abc|def")]

Example illustration:

>>> df
   tag
0  abc
1  edf
2  abc
3  def
4  efg

>>> df[df.tag.str.contains("abc|def")]
   tag
0  abc
2  abc
3  def

>>> substring_list = ['abc', 'def']


>>> df[df.tag.str.contains('|'.join(substring_list))]
   tag
0  abc
2  abc
3  def
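
One caveat worth sketching: because the joined string is treated as a regex, substrings containing characters such as . or + are not matched literally, and missing values in the column produce NaN in the mask. The example below is only an illustration under those assumptions; the sample data and the names naive and literal are hypothetical. It uses re.escape from the standard library and the na argument of str.contains:

```python
import re
import pandas as pd

# Hypothetical data: some tags contain regex metacharacters, one is missing.
df = pd.DataFrame({"tag": ["a.c", "abc", None, "x+y", "xxy"]})
substring_list = ["a.c", "x+y"]

# Unescaped: '.' matches any character and '+' means "one or more",
# so the pattern does not search for the literal substrings.
naive = df[df["tag"].str.contains("|".join(substring_list), na=False)]

# Escaped: re.escape makes every substring match literally.
# na=False treats missing values as "no match" instead of NaN.
pattern = "|".join(re.escape(s) for s in substring_list)
literal = df[df["tag"].str.contains(pattern, na=False)]

print(naive)
print(literal)
```

If the substrings are plain letters and digits, as in the question, escaping changes nothing and the shorter form above is enough.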

Karn Kumar answered Feb 17 '23 08:02