I'm looking for a way to simplify my code:
import pandas as pd
# Dataset
categorical_data = pd.Series(["dog", "lion", "cat", "crustacean", "dog", "insect", "insect", "cat", "crustacean"])
What I want to do is replace dogs, lions, and cats with "animal". I can do that by writing this:
categorical_data = categorical_data.str.replace("dog", "animal")
categorical_data = categorical_data.str.replace("cat", "animal")
categorical_data = categorical_data.str.replace("lion", "animal")
Is there a way for the str.replace() function to accept a list of strings instead of just one?
Example:
categorical_data = categorical_data.str.replace(["dog", "lion", "cat"], "animal")
Use the translate() method to replace multiple different characters. You can create the translation table passed to translate() with str.maketrans(): give it a dictionary whose keys are the old characters and whose values are the new strings.
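A minimal sketch of the translate() approach on a plain Python string (not a pandas Series). The sample string and the mapping are made up for illustration:

```python
# Build a translation table: each key is a single character,
# each value is the string that replaces it.
table = str.maketrans({"a": "1", "b": "22"})

# Characters without an entry in the table ("c" here) are left unchanged.
result = "abc".translate(table)
print(result)
```

Note that this works character by character, so it does not cover replacing whole words such as "dog" or "lion".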
To replace multiple values in a pandas column, use the DataFrame.replace() method, which can replace multiple values with multiple new strings for an individual DataFrame column. When called on the whole DataFrame, it searches every column and replaces every specified value.
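As a rough sketch, DataFrame.replace() accepts a nested dictionary of the form {column: {old: new}} to restrict the replacement to one column. The DataFrame and its column names ("kind", "count") are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "kind": ["dog", "lion", "insect", "cat"],
    "count": [1, 2, 3, 4],
})

# Only look in the "kind" column; map each old value to its replacement.
replaced = df.replace({"kind": {"dog": "animal", "lion": "animal", "cat": "animal"}})
print(replaced)
```

Like Series.replace, this matches whole cell values, not substrings.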
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of a DataFrame.
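One possible way to apply the loc[] approach to this kind of problem is to build a boolean mask with isin() and assign through it. The DataFrame and the column name "kind" are assumptions made for this sketch:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"kind": ["dog", "lion", "cat", "crustacean", "insect"]})

# True for rows whose value is one of the listed strings.
mask = df["kind"].isin(["dog", "lion", "cat"])

# Overwrite only the matching rows of the "kind" column.
df.loc[mask, "kind"] = "animal"
print(df)
```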
To replace by list, you can use Series.replace:
categorical_data = categorical_data.replace(['dog', 'lion', 'cat'], "animal")
print (categorical_data)
0 animal
1 animal
2 animal
3 crustacean
4 animal
5 insect
6 insect
7 animal
8 crustacean
dtype: object
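The difference between the answers is in substring replacement: Series.replace only matches whole values, while str.replace with a regex also replaces matches inside longer strings: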
categorical_data = pd.Series(["dog gorilla", "lion", "cat", "crustacean"])
print (categorical_data.replace(['dog', 'lion', 'cat'], "animal"))
0 dog gorilla
1 animal
2 animal
3 crustacean
dtype: object
print (categorical_data.str.replace(r'(dog|cat|lion)', 'animal', regex=True))
0 animal gorilla
1 animal
2 animal
3 crustacean
dtype: object
You could instead use a regex with str.replace, separating the strings to match with |, which replaces any match among the specified strings:
categorical_data.str.replace(r'(dog|cat|lion)', 'animal', regex=True)
0 animal
1 animal
2 animal
3 crustacean
4 animal
5 insect
6 insect
7 animal
8 crustacean
dtype: object
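If the strings to match already live in a list, the pattern can be built from it rather than typed by hand. The following is a sketch under that assumption; the variable names and the word-boundary variant are additions for illustration, not part of the answers above:

```python
import re
import pandas as pd

words = ["dog", "lion", "cat"]

# Escape each word so regex metacharacters are treated literally, then join with |.
pattern = "|".join(re.escape(w) for w in words)

s = pd.Series(["dog gorilla", "lion", "cat", "crustacean"])
result = s.str.replace(pattern, "animal", regex=True)
print(result)

# Optional: wrap the alternatives in \b...\b so only whole words match.
bounded = r"\b(?:" + pattern + r")\b"
whole_words = pd.Series(["catalog", "cat"]).str.replace(bounded, "animal", regex=True)
print(whole_words)
```

Without the word boundaries, a value such as "catalog" would become "animalalog", so which variant fits depends on whether substring matches are wanted.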