
Pandas dataframe column value case insensitive replace where <condition>

Is there a case-insensitive version of pandas.DataFrame.replace? https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.replace.html

I need to replace string values in a column subject to a case-insensitive condition of the form "where label == a or label == b or label == c".

Prateek Dewan asked Dec 07 '17 09:12


People also ask

How do you replace values in a DataFrame column based on condition?

You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of a DataFrame.

How do I replace a specific value in a column in Pandas?

The replace() method replaces one value with another, either across all columns or in a selected column. It takes to_replace, value, inplace, limit, regex and method as parameters and returns a new DataFrame. With inplace=True it modifies the existing DataFrame object and returns None.

Is Pandas column case sensitive?

pandas.DataFrame.merge (similar to a SQL join) is case sensitive, as are most Python functions.

How do I replace missing values in a column in Pandas?

The method argument of fillna() can be used to replace missing values with previous/next valid values. If method is set to 'ffill' or 'pad' , missing values are replaced with previous valid values (= forward fill), and if 'bfill' or 'backfill' , replaced with the next valid values (= backward fill).


1 Answer

The issue with some of the other answers is that they only work on a Series (or on a DataFrame that can be implicitly converted to one), not on an arbitrary DataFrame. As I understand it, this is because the .str accessor is defined on the Series class but not on the DataFrame class.
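For comparison, here is a minimal sketch of the Series-only approach mentioned above. It assumes a pandas version in which Series.str.replace accepts the case and regex keyword arguments; the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample Series for illustration.
s = pd.Series(["test", "Test", "cat", "CAT"])

# case=False makes the match case-insensitive. regex=True is passed
# explicitly because the default for `regex` has changed between
# pandas versions.
result = s.str.replace("cat", "MONKEY", case=False, regex=True)

print(result.tolist())  # ['test', 'Test', 'MONKEY', 'MONKEY']
```

This works on one column at a time (for example df['a'].str.replace(...)), which is why it does not apply to a whole DataFrame directly.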

To work with DataFrames, you can make your regular expression case-insensitive with the (?i) inline flag. Not every regex flavor supports it, but Python's re module does, and pandas relies on re for regex replacement.

import pandas as pd

d = {'a':['test', 'Test', 'cat'], 'b':['CAT', 'dog', 'Cat']}
df = pd.DataFrame(data=d)

    a       b
0   test    CAT
1   Test    dog
2   cat     Cat

Then use replace as you normally would, but prefix the pattern with (?i):

df.replace('(?i)cat', 'MONKEY', regex=True)

    a       b
0   test    MONKEY
1   Test    dog
2   MONKEY  MONKEY
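The question asks for a replacement restricted to one column and to whole values ("where label == a or label == b or label == c"), whereas the pattern above also matches substrings. Below is a rough sketch of two ways this might be done. The column names, the target values and the replacement string are hypothetical, chosen only for illustration.

```python
import re

import pandas as pd

# Hypothetical data: 'label' is the column to test and rewrite.
df = pd.DataFrame({"label": ["A", "b", "C", "d", "abc"],
                   "value": [1, 2, 3, 4, 5]})

# Hypothetical target values, compared case-insensitively.
targets = ["a", "b", "c"]

# Option 1: build a boolean mask on the lowercased column and assign via .loc.
mask = df["label"].str.lower().isin(targets)
masked = df.copy()
masked.loc[mask, "label"] = "MONKEY"

# Option 2: an anchored, case-insensitive regex applied to that column only.
# ^ and $ restrict the match to whole values, so 'abc' is left untouched.
pattern = "(?i)^(?:" + "|".join(re.escape(t) for t in targets) + ")$"
replaced = df.copy()
replaced["label"] = replaced["label"].replace(pattern, "MONKEY", regex=True)

print(masked["label"].tolist())    # ['MONKEY', 'MONKEY', 'MONKEY', 'd', 'abc']
print(replaced["label"].tolist())  # ['MONKEY', 'MONKEY', 'MONKEY', 'd', 'abc']
```

The mask version is easier to extend to conditions on other columns. The regex version stays closest to the replace call shown in this answer.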
geekly answered Nov 15 '22 11:11