Using regex matched groups in pandas dataframe replace function

Tags: python, pandas

I'm just learning Python/pandas and like how powerful and concise it is.

During data cleaning I want to use replace with a regex on a column in a DataFrame, and reinsert parts of the match (capture groups) into the replacement value.

Simple example: lastname, firstname -> firstname lastname

I tried something like the following (actual case is more complex so excuse the simple regex):

df['Col1'].replace({'([A-Za-z])+, ([A-Za-z]+)' : '\2 \1'}, inplace=True, regex=True)

However, this results in empty values. The match part works as expected, but the value part doesn't. I guess this could be achieved by some splitting and merging, but I am looking for a general answer as to whether the regex group can be used in replace.

717

asked Jan 04 '17 20:01

Peter D

1 Answer

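I think there are a few issues with your regexes.

As @Abdou just said, use either '\\2 \\1' or, better, the raw string r'\2 \1'. In a normal string literal, '\1' is not a backreference but the single character with ASCII code 1 (and '\2' is the character with code 2), so the regex engine never sees the group references.

Also note that in ([A-Za-z])+ the + sits outside the parentheses, so group 1 captures only the last matched letter; ([A-Za-z]+) captures the whole word.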
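To see the difference between the two literals without pandas, here is a small sketch using only the standard re module. The variable names (plain, raw, pattern, bad, good) are just for illustration and are not from the question.

```python
import re

# Normal literal: '\2' and '\1' are octal escapes, i.e. chr(2) and chr(1).
plain = '\2 \1'
# Raw literal: the backslashes survive, so the regex engine sees \2 and \1.
raw = r'\2 \1'

pattern = r'(\w+),\s+(\w+)'

# The replacement is three control/space characters, not the groups.
bad = re.sub(pattern, plain, 'John, Doe')
# The replacement refers to group 2 followed by group 1.
good = re.sub(pattern, raw, 'John, Doe')

print(repr(bad))   # control characters instead of the names
print(repr(good))  # the two words swapped
```

The control characters are typically invisible when printed, which is probably why the values looked empty.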

Your approach should work once the regexes are correct:

In [193]: df
Out[193]:
              name
0        John, Doe
1  Max, Mustermann

In [194]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1'}, regex=True)
Out[194]:
0          Doe John
1    Mustermann Max
Name: name, dtype: object

In [195]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1', 'Max':'Fritz'}, regex=True)
Out[195]:
0            Doe John
1    Mustermann Fritz
Name: name, dtype: object
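For anyone who wants to run this outside IPython, here is a minimal self-contained sketch of the same replacement. The frame contents are copied from the session above; assigning the result back to the column (instead of using inplace=True on a column selection, as in the question) is my own choice here, since that form does not depend on whether the selection is a view or a copy.

```python
import pandas as pd

# Sample data copied from the session above.
df = pd.DataFrame({"name": ["John, Doe", "Max, Mustermann"]})

# Raw strings keep \2 and \1 as backreferences to the captured groups.
swap_names = {r"(\w+),\s+(\w+)": r"\2 \1"}

# Assign the result back rather than relying on inplace=True.
df["name"] = df["name"].replace(swap_names, regex=True)

print(df)
```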

130

answered Sep 24 '22 02:09

MaxU - stop WAR against UA
