
Filter dataframe by a list of possible prefixes for specific column

What I'm trying to do is:

options = ['abc', 'def']
df[any(df['a'].str.startswith(start) for start in options)]

I want to apply a filter so I only have entries that have values in the column 'a' starting with one of the given options.

The following code works, but I need it to handle several possible prefixes:

start = 'abc'
df[df['a'].str.startswith(start)]

The first snippet raises this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have read "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()", but I still don't understand how to apply it here.

Tatiana Didik asked Mar 06 '23 13:03


1 Answer

You can pass a tuple of options to str.startswith; a row is kept if its value starts with any of the prefixes in the tuple.

import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']
# startswith accepts a tuple of prefixes, so convert the list first
df[df.a.str.startswith(tuple(options))]

You get

         a
0     abcd
1     def5
5  defabcb
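
As for why the original attempt fails: the built-in any() calls bool() on each Series the generator yields, and pandas refuses to reduce a whole Series to a single True/False. If you would rather keep one mask per prefix, a minimal sketch is to combine the masks element-wise with |. The names masks, mask and filtered below are only illustrative, and the data is the same sample frame as above:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']

# Build one boolean Series per prefix.
masks = [df['a'].str.startswith(prefix) for prefix in options]

# Combine them element-wise with | (logical OR) instead of the built-in any(),
# which would try to evaluate each Series as a single boolean and raise ValueError.
mask = reduce(operator.or_, masks)

filtered = df[mask]
print(filtered)
```

This should select the same rows as the tuple version (index 0, 1 and 5). The tuple form is shorter, so the mask approach is mainly useful when the individual conditions are not all startswith checks.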
Vaishali answered Apr 08 '23 06:04