Replacing more than one substring value with pandas str.replace

I'm looking for a way to simplify my code:

import pandas as pd

# Dataset
categorical_data = pd.Series(["dog", "lion", "cat", "crustacean", "dog", "insect", "insect", "cat", "crustacean"])

What I want to do is replace "dog", "lion" and "cat" with "animal". I can do it by writing this:

categorical_data = categorical_data.str.replace("dog", "animal")
categorical_data = categorical_data.str.replace("cat", "animal")
categorical_data = categorical_data.str.replace("lion", "animal")

Is there a way for the str.replace() function to accept a list of strings instead of just one?

Example:

categorical_data = categorical_data.str.replace(["dog", "lion", "cat"], "animal")
Maku asked Sep 06 '19 10:09


2 Answers

To replace by a list of values, you can use Series.replace:

categorical_data = categorical_data.replace(['dog', 'lion', 'cat'], "animal")
print(categorical_data)
0        animal
1        animal
2        animal
3    crustacean
4        animal
5        insect
6        insect
7        animal
8    crustacean
dtype: object

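The difference between the two answers is in how substrings are handled. Series.replace only replaces values that match as a whole, while str.replace with a regex also replaces matches inside longer strings: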

categorical_data = pd.Series(["dog gorilla", "lion", "cat", "crustacean"])

print(categorical_data.replace(['dog', 'lion', 'cat'], "animal"))
0    dog gorilla
1         animal
2         animal
3     crustacean
dtype: object

print(categorical_data.str.replace(r'(dog|cat|lion)', 'animal', regex=True))
0    animal gorilla
1            animal
2            animal
3        crustacean
dtype: object
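
One consequence of substring matching is that the pattern can also hit inside unrelated words. The sketch below is not part of the original answers; it assumes a made-up value, "caterpillar", and shows how adding \b word boundaries to the pattern might limit the replacement to whole words while still handling "dog gorilla":

```python
import pandas as pd

# Illustrative data; "caterpillar" is a made-up value that contains "cat".
animals = pd.Series(["dog gorilla", "caterpillar", "cat", "crustacean"])

# Plain alternation matches anywhere, including inside longer words.
loose = animals.str.replace(r"dog|cat|lion", "animal", regex=True)

# \b anchors the alternation at word boundaries, so only whole words match.
strict = animals.str.replace(r"\b(?:dog|cat|lion)\b", "animal", regex=True)

print(loose.tolist())   # "caterpillar" is altered here
print(strict.tolist())  # "caterpillar" is left untouched here
```

Whether the loose or the strict behaviour is wanted depends on the data, so treat this as one option rather than the recommended form.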
jezrael answered Oct 23 '22 23:10


Alternatively, use a regex with str.replace, separating the strings to match with |, so that any occurrence of one of the specified strings is replaced:

categorical_data.str.replace(r'(dog|cat|lion)', 'animal', regex=True)

0        animal
1        animal
2        animal
3    crustacean
4        animal
5        insect
6        insect
7        animal
8    crustacean
dtype: object
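
Since the question starts from a list of strings, the pattern can also be built from that list instead of being typed by hand. This is a minimal sketch, not part of the original answer: the name to_replace is hypothetical, and re.escape is used on the assumption that the list entries should be matched literally even if they contain regex metacharacters.

```python
import re

import pandas as pd

categorical_data = pd.Series(
    ["dog", "lion", "cat", "crustacean", "dog", "insect", "insect", "cat", "crustacean"]
)

# Hypothetical list of substrings to replace.
to_replace = ["dog", "lion", "cat"]

# Escape each entry so it is matched literally, then join with "|" (regex OR).
pattern = "|".join(re.escape(word) for word in to_replace)

result = categorical_data.str.replace(pattern, "animal", regex=True)
print(result.tolist())
```

If one entry is a prefix of another (for example "cat" and "cats"), sorting the list by length in descending order before joining may be needed so the longer string is tried first.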
yatu answered Oct 24 '22 01:10