Pandas replace multiple values at once

Tags: python, pandas

I'm trying to clean up some data that I loaded from an Excel file. The file contains 7400 rows and 18 columns, including a list of customers with their addresses and other data. The problem is that some of the cities are misspelled, which distorts the information and makes further processing difficult.

  SURNAME   | ADDRESS          | CITY
0 Jenson    | 252 Des Chênes   | D.DO
1 Jean      | 236 Gouin        | DOLLARD
2 Denis     | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3 Bradford  | 1690 Dollard #7  | DDO
4 Alisson   | 115 Du Buisson   | IL PERROT
5 Abdul     | 9877 Boul. Gouin | Pierrefonds
6 O'Neil    | 5 Du College     | Ile Bizard
7 Bundy     | 7345 Sherbrooke  | ILLE Perot
8 Darcy     | 8671 Anthony #2  | ILE Perrot
9 Adams     | 845 Georges      | Pierrefonds

In the above example, D.DO, DOLLARD and DDO should be spelled DOLLARD-DES-ORMEAUX, and IL PERROT, ILLE PEROT and ILE PERROT should be spelled ILE-PERROT.

I've been able to replace the values using:

df["CITY"].replace(to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", regex=True) 
df["CITY"].replace(to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT", regex=True)

Is there some way of combining the above operations into one? I've tried:

df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True)

but I've had no luck

506

asked Mar 17 '16 22:03

Lukasz

2 Answers

Try the .replace({}, regex=True) method:

replacements = {
   'CITY': {
      # D.DO, DDO, DOLLARD (and the already-correct spelling) -> canonical name
      r'(D.*DO|DOLLARD.*)': 'DOLLARD-DES-ORMEAUX',
      # IL/ILE/ILLE + PEROT/PERROT in any case; anchored so "Ile Bizard" is left alone
      r'(?i)^IL+E?\s+PER+OT$': 'ILE-PERROT'}
}

df.replace(replacements, regex=True, inplace=True)

print(df)

Output:

    SURNAME           ADDRESS                 CITY
0    Jenson    252 Des Chênes  DOLLARD-DES-ORMEAUX
1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
4   Alisson    115 Du Buisson           ILE-PERROT
5     Abdul  9877 Boul. Gouin          Pierrefonds
6    O'Neil      5 Du College           Ile Bizard
7     Bundy   7345 Sherbrooke           ILE-PERROT
8     Darcy   8671 Anthony #2           ILE-PERROT
9     Adams       845 Georges          Pierrefonds
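
If hand-written patterns feel fragile, one possible variation is to generate the regex dictionary from a plain list of known misspellings. The following is only a sketch: the `variants` mapping and the helper name `build_city_patterns` are made up for illustration, and the small DataFrame just mirrors the CITY column from the question.

```python
import re

import pandas as pd

# Hypothetical mapping: canonical spelling -> misspellings listed in the question.
variants = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}


def build_city_patterns(variants):
    """Return {regex: canonical}; each regex is anchored and case-insensitive."""
    patterns = {}
    for canonical, misspellings in variants.items():
        # re.escape keeps characters such as "." literal instead of wildcards.
        alternatives = "|".join(re.escape(m) for m in misspellings)
        patterns[rf"(?i)^(?:{alternatives})$"] = canonical
    return patterns


# Sample data copied from the question's CITY column.
df = pd.DataFrame({"CITY": [
    "D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
    "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds",
]})

# A single replace call handles every canonical name at once.
df["CITY"] = df["CITY"].replace(build_city_patterns(variants), regex=True)
print(df)
```

Because each pattern is anchored with ^ and $, only whole-cell matches are rewritten, so a value such as "Ile Bizard" or an already-correct "DOLLARD-DES-ORMEAUX" should pass through untouched.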

172

answered Oct 24 '22 10:10

MaxU - stop WAR against UA

You can create a dictionary of replacements and then iterate through it, using .loc to do the replacement.

target_for_values = {
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'], 
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}

# On Python 3 use .items(); dict.iteritems() exists only on Python 2.
for k, v in target_for_values.items():
    df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k

>>> df[['CITY']]
                  CITY
0  DOLLARD-DES-ORMEAUX
1  DOLLARD-DES-ORMEAUX
2  DOLLARD-DES-ORMEAUX
3  DOLLARD-DES-ORMEAUX
4           ILE-PERROT
5          Pierrefonds
6           Ile Bizard
7           ILE-PERROT
8           ILE-PERROT
9          Pierrefonds
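
The same dictionary can also be applied without an explicit loop. This is a sketch of one alternative, not the approach used above: it inverts the mapping into a flat lookup (the name `lookup` is an assumption) and uses Series.map, falling back to the original value where no match is found. The DataFrame again just mirrors the question's CITY column.

```python
import pandas as pd

target_for_values = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Invert to a flat {misspelling: canonical} lookup (hypothetical helper name).
lookup = {
    variant: canonical
    for canonical, variant_list in target_for_values.items()
    for variant in variant_list
}

# Sample data copied from the question's CITY column.
df = pd.DataFrame({"CITY": [
    "D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
    "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds",
]})

# map() gives NaN for cities missing from the lookup;
# fillna() then restores the original spelling for those rows.
df["CITY"] = df["CITY"].str.upper().map(lookup).fillna(df["CITY"])
print(df)
```

The comparison is done on the upper-cased value, so mixed-case entries such as "ILLE Perot" are caught, while unmatched cities keep their original casing.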

answered Oct 24 '22 09:10

Alexander
