
Replace whole string if it contains substring in pandas

Tags: python, pandas

I want to replace every string that contains a specific substring. For example, given this dataframe:

import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
                   'sport': ['tennis', 'football', 'basketball']})

I could replace football with the string 'ball sport' like this:

df.replace({'sport': {'football': 'ball sport'}}) 

What I want though is to replace everything that contains ball (in this case football and basketball) with 'ball sport'. Something like this:

df.replace({'sport': {'[strings that contain ball]': 'ball sport'}}) 
nicofilliol asked Sep 29 '16 11:09




2 Answers

You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:

In [71]: df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
         df

Out[71]:
    name       sport
0    Bob      tennis
1   Jane  ball sport
2  Alice  ball sport

To make it case-insensitive, pass `case=False`:

df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport' 
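As a self-contained sketch of the mask approach, the snippet below rebuilds the question's DataFrame with two changes that are assumptions for illustration, not part of the original data: 'Football' is capitalised, and a hypothetical fourth row has a missing value. It passes na=False on the assumption that missing values should be left alone; without it, str.contains does not return a clean True/False for that row and the boolean indexing can fail.

```python
import pandas as pd

# Question's data plus one hypothetical row ('Dan') with a missing sport
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice', 'Dan'],
                   'sport': ['tennis', 'Football', 'basketball', None]})

# case=False also matches 'Football'; na=False treats missing values as "no match"
mask = df['sport'].str.contains('ball', case=False, na=False)

# Overwrite only the rows where the mask is True
df.loc[mask, 'sport'] = 'ball sport'
print(df)
```

Keeping the mask in its own variable makes it easy to inspect which rows will be overwritten before assigning.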
EdChum answered Oct 06 '22 13:10



You can use apply with a lambda. The x parameter of the lambda function will be each value in the 'sport' column:

df.sport = df.sport.apply(lambda x: 'ball sport' if 'ball' in x else x) 
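Here is a runnable sketch of the apply approach on the question's data. One caveat, which is an assumption about your data rather than something stated in the question: `'ball' in x` raises a TypeError when a cell is not a string (for example NaN), so the hypothetical helper `to_ball_sport` below checks the type first.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
                   'sport': ['tennis', 'football', 'basketball']})

def to_ball_sport(value):
    # Hypothetical helper: non-string cells (e.g. NaN or None) pass through unchanged
    if isinstance(value, str) and 'ball' in value:
        return 'ball sport'
    return value

# apply calls the helper once per value in the 'sport' column
df['sport'] = df['sport'].apply(to_ball_sport)
print(df)
```

On a clean all-string column the one-line lambda does the same thing; the named function only adds the type guard.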
DeepSpace answered Oct 06 '22 13:10

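For completeness, the dictionary form from the question can probably be kept as well by turning on regular expressions. This is a sketch and not something either answer proposes: with regex=True, DataFrame.replace treats the inner key as a pattern, and the pattern `.*ball.*` is an assumption chosen here so that the whole string, not just the 'ball' part, is replaced.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
                   'sport': ['tennis', 'football', 'basketball']})

# '.*ball.*' matches the entire value, so the whole string is swapped out
result = df.replace({'sport': {r'.*ball.*': 'ball sport'}}, regex=True)
print(result)
```

Note that replace returns a new DataFrame here, so df itself is unchanged unless you assign the result back.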