
Python pandas.Series.isin with case-insensitive matching

I want to keep only the rows of a DataFrame whose value in one column appears in a list.

df[df['column'].isin(mylist)]

But I found that this match is case sensitive. Is there a way to use ".isin()" case-insensitively?

haoping asked Aug 14 '17 17:08


People also ask

How do I ignore case in str.contains?

str.contains has a case parameter that is True by default. Set it to False to do a case-insensitive match.

Is Python Pandas case sensitive?

pandas.DataFrame.merge (similar to a SQL join) is case sensitive, as are most Python functions.

How do you check if a series contains a string?

Series.str.contains() tests whether a pattern or regex is contained within each string of a Series or Index. It returns a boolean Series or Index with one result per element.

How do I change to lower case in Pandas?

Use Series.str.lower(), which converts all characters to lowercase.
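
The two string methods mentioned above can be sketched as follows. This is a minimal illustration, not part of the original answers; the Series s and its values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series(["Green", "RED", "blue"])

# case=False makes the substring/regex match ignore case
mask = s.str.contains("re", case=False)
print(mask.tolist())

# str.lower() returns a new lowercased Series; s itself is unchanged
lowered = s.str.lower()
print(lowered.tolist())
```

Note that str.contains does substring (or regex) matching, so it is not a drop-in replacement for isin(), which tests whole values.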


2 Answers

One way is to lowercase (or uppercase) both the Series and the list before comparing them:

df[df['column'].str.lower().isin([x.lower() for x in mylist])]

The advantage here is that neither the original df nor the list is modified: the lowercased copies are temporary and exist only for the comparison.

Consider this dummy df:

    Color   Val
0   Green   1
1   Green   1
2   Red     2
3   Red     2
4   Blue    3
5   Blue    3

For the list l:

l = ['green', 'BLUE']

You can use isin()

df[df['Color'].str.lower().isin([x.lower() for x in l])]

You get

    Color   Val
0   Green   1
1   Green   1
4   Blue    3
5   Blue    3
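
For anyone who wants to run the example end to end, here is a self-contained sketch that builds the dummy df shown above. The DataFrame construction and the intermediate name wanted are additions for illustration; the filter line itself is the one from the answer.

```python
import pandas as pd

# Rebuild the dummy df from the answer
df = pd.DataFrame({
    "Color": ["Green", "Green", "Red", "Red", "Blue", "Blue"],
    "Val": [1, 1, 2, 2, 3, 3],
})
l = ["green", "BLUE"]

# Lowercase the list once, then compare against the lowercased column
wanted = [x.lower() for x in l]
result = df[df["Color"].str.lower().isin(wanted)]
print(result)
```

The rows in result keep their original capitalization, because only the temporary copy used for the comparison is lowercased.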

Vaishali answered Sep 21 '22 14:09


I prefer to use the general .apply

myset = {s.lower() for s in mylist}
df[df['column'].apply(lambda v: v.lower() in myset)]

A membership test against a set takes constant time on average, while the same test against a list scans the elements one by one.
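
One caveat the answers do not cover: if the column may contain missing values (an assumption here, the question does not say), calling v.lower() on NaN raises an AttributeError because NaN is a float. A possible sketch of a guarded version, next to the vectorized form for comparison, with made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, for illustration only
df = pd.DataFrame({"column": ["Green", np.nan, "BLUE", "red"]})
mylist = ["green", "Blue"]
myset = {s.lower() for s in mylist}

# Guard with isinstance so NaN never reaches .lower()
mask_apply = df["column"].apply(lambda v: isinstance(v, str) and v.lower() in myset)

# Vectorized form: .str.lower() leaves missing values missing,
# and isin() reports False for them
mask_isin = df["column"].str.lower().isin(myset)

print(df[mask_apply])
print(df[mask_isin])
```

Both masks select the same rows here; the vectorized form simply avoids the explicit type check.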

Uri Goren answered Sep 19 '22 14:09