 

Using regex matched groups in pandas dataframe replace function

Tags: python, pandas

I'm just learning Python/pandas and like how powerful and concise they are.

During data cleaning I want to use replace with a regex on a column in a dataframe, but I want to reinsert parts of the match (groups) into the replacement.

Simple Example: lastname, firstname -> firstname lastname

I tried something like the following (actual case is more complex so excuse the simple regex):

df['Col1'].replace({'([A-Za-z])+, ([A-Za-z]+)' : '\2 \1'}, inplace=True, regex=True) 

However, this results in empty values. The match part works as expected, but the value part doesn't. I guess this could be achieved by some splitting and merging, but I am looking for a general answer as to whether the regex group can be used in replace.

Peter D asked Jan 04 '17 20:01





1 Answer

I think there are a few issues with your regexes.

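As @Abdou pointed out, use either '\\2 \\1' or, better, the raw string r'\2 \1'. In a regular (non-raw) string literal, '\1' is an escape sequence for the single character with ASCII code 1, not a backreference to group 1.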
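The escaping point can be checked in plain Python without pandas. The following is a minimal sketch using the standard `re` module; the sample string 'Doe, John' and the variable names are made up for illustration.

```python
import re

# In a normal string literal, '\2' and '\1' are octal escapes,
# so each one is a single control character, not a backreference.
non_raw = '\2 \1'
print(len(non_raw))   # 3
print(repr(non_raw))  # '\x02 \x01'

# A raw string (or doubled backslashes) keeps the backslash,
# which is what the regex engine needs to see.
raw = r'\2 \1'
escaped = '\\2 \\1'
print(raw == escaped)  # True

# Illustrative pattern and input (not taken from the question's data).
pattern = r'([A-Za-z]+), ([A-Za-z]+)'

swapped = re.sub(pattern, raw, 'Doe, John')
print(swapped)  # John Doe

# With the non-raw replacement, the match is replaced by control characters,
# which typically display as empty-looking values.
broken = re.sub(pattern, non_raw, 'Doe, John')
print(repr(broken))  # '\x02 \x01'
```

This probably explains the "empty values" in the question: the cells were replaced, but with non-printable characters.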

Your solution should work if you use correct regexes:

In [193]: df
Out[193]:
              name
0        John, Doe
1  Max, Mustermann

In [194]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1'}, regex=True)
Out[194]:
0          Doe John
1    Mustermann Max
Name: name, dtype: object

In [195]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1', 'Max':'Fritz'}, regex=True)
Out[195]:
0            Doe John
1    Mustermann Fritz
Name: name, dtype: object
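For anyone who wants to reproduce the session above as a script, here is a self-contained sketch. It rebuilds the same two-row sample frame and also runs the pattern from the question with a raw replacement string. That shows what looks like a second issue, assuming the intent was to capture the whole name: in ([A-Za-z])+ the + sits outside the parentheses, so group 1 keeps only the last letter matched. The `Series.str.replace` call at the end is an alternative that the answer does not mention.

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({'name': ['John, Doe', 'Max, Mustermann']})

# Raw strings for both pattern and replacement, so \1 and \2 reach
# the regex engine as backreferences.
swapped = df['name'].replace({r'(\w+),\s+(\w+)': r'\2 \1'}, regex=True)
print(swapped.tolist())  # ['Doe John', 'Mustermann Max']

# The question's pattern: the + is outside the first group, so the group
# is repeated and only its last iteration (one letter) is kept.
partial = df['name'].replace({r'([A-Za-z])+, ([A-Za-z]+)': r'\2 \1'}, regex=True)
print(partial.tolist())  # ['Doe n', 'Mustermann x']

# Alternative (not part of the answer): the string accessor,
# which also accepts backreferences in the replacement.
alt = df['name'].str.replace(r'(\w+),\s+(\w+)', r'\2 \1', regex=True)
print(alt.tolist())  # ['Doe John', 'Mustermann Max']
```

Assigning the result back (for example `df['name'] = swapped`) avoids relying on `inplace=True`.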
MaxU - stop WAR against UA answered Sep 24 '22 02:09
