Match same number of repetitions of character as repetitions of captured group

Tags: python, regex, backreference

I would like to clean up some input that was logged from my keyboard, using Python and regex, especially where Backspace was used to fix a mistake.

Example 1:

[in]:  'Helloo<BckSp> world'
[out]: 'Hello world'

This can be done with:

import re
re.sub(r'.<BckSp>', '', 'Helloo<BckSp> world')  # returns 'Hello world'

Example 2:
However, when there are several consecutive backspaces, I don't know how to delete exactly the same number of characters before them:

[in]:  'Helllo<BckSp><BckSp>o world'
[out]: 'Hello world'

(Here I want to remove the 'l' and the 'o' before the two backspaces.)

I could simply apply re.sub(r'[^>]<BckSp>', '', line) repeatedly until no <BckSp> is left, but I would like to find a more elegant or faster solution.
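A minimal sketch of that repeat-until-stable approach, using only the built-in re module. The helper name strip_backspaces_loop is hypothetical, and it assumes the logger writes the literal token <BckSp>. re.subn returns the number of substitutions it made, which gives a natural stopping condition:

```python
import re

# Assumption: the keylogger writes the literal token "<BckSp>" for Backspace.
BCKSP = re.compile(r'[^>]<BckSp>')

def strip_backspaces_loop(line):
    """Hypothetical helper: delete one character + following <BckSp> per pass."""
    while True:
        # subn returns (new_string, number_of_substitutions)
        line, count = BCKSP.subn('', line)
        if count == 0:  # no "character + <BckSp>" pair left to remove
            return line

print(strip_backspaces_loop('Helllo<BckSp><BckSp>o world'))  # Hello world
```

Each pass only removes the innermost pair of each backspace run, so a run of n backspaces needs n passes. Because [^>] excludes '>', a '>' that was typed and then erased is never removed by this pattern.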

Does anyone know how to do this?


asked Dec 27 '16 10:12

Louis M

2 Answers

It looks like Python's built-in re module does not support recursive regex. If you can use another language, you could try this:

.(?R)?<BckSp>

See: https://regex101.com/r/OirPNn/1


answered Oct 07 '22 21:10

Fallenhero

It isn't very efficient, but you can do this with the re module:

(?:[^<](?=[^<]*((?=(\1?))\2<BckSp>)))+\1


This way you don't have to count anything: the pattern relies only on the repetition of the group.
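For contrast, the counting that this pattern avoids is easy to do in a replacement function. The sketch below (hypothetical names, built-in re only) matches a run of ordinary characters followed by a run of <BckSp> tokens and trims as many characters as there are tokens:

```python
import re

# A run of characters other than '<', then one or more <BckSp> tokens.
RUN = re.compile(r'([^<]*)((?:<BckSp>)+)')

def _drop(match):
    kept, backspaces = match.group(1), match.group(2)
    n = backspaces.count('<BckSp>')  # how many characters to erase (n >= 1)
    return kept[:-n]                 # slicing past the start just yields ''

def strip_backspaces_count(line):
    """Hypothetical helper: one re.sub call, counting tokens in a callback."""
    return RUN.sub(_drop, line)

print(strip_backspaces_count('Helllo<BckSp><BckSp>o world'))  # Hello world
```

It only erases characters from the run immediately before each group of <BckSp> tokens, so an input such as 'ab<BckSp>c<BckSp><BckSp>' comes out as 'a' rather than being fully resolved.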

(?: 
    [^<] # a character to remove
    (?=  # lookahead to reach the corresponding <BckSp>
        [^<]* # skip characters until the first <BckSp>
        (  # capture group 1: contains the <BckSp>s
            (?=(\1?))\2 # emulate an atomic group in place of \1?+
                        # The idea is to include the <BckSp>s already matched in
                        # previous repetitions (if any), so that the following
                        # <BckSp> is not one already associated with a character
            <BckSp> # corresponding <BckSp>
        )
    )
)+ # each time the group is repeated, capture group 1 grows by one <BckSp>

\1 # matches all the consecutive <BckSp> and ensures that there's no more character
   # between the last character to remove and the first <BckSp>

You can do the same with the third-party regex module, but this time you don't need to emulate the possessive quantifier:

(?:[^<](?=[^<]*(\1?+<BckSp>)))+\1


With the regex module, you can also use recursion (as @Fallenhero noted):

[^<](?R)?<BckSp>


answered Oct 07 '22 21:10

Casimir et Hippolyte
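For comparison, here is a non-regex-matching approach that neither answer uses: a single left-to-right pass that treats the output as a stack, pushing ordinary characters and popping one per <BckSp>. This is a sketch under the assumption that the log contains the literal token <BckSp> and no other angle-bracket tokens; the helper name is hypothetical:

```python
import re

# Split the log into tokens: either the literal "<BckSp>" or any single character.
TOKEN = re.compile(r'<BckSp>|.', re.DOTALL)

def strip_backspaces_stack(line):
    """Hypothetical helper: push characters, pop one per <BckSp>."""
    out = []
    for tok in TOKEN.findall(line):
        if tok == '<BckSp>':
            if out:  # a backspace with nothing before it erases nothing
                out.pop()
        else:
            out.append(tok)
    return ''.join(out)

print(strip_backspaces_stack('Helllo<BckSp><BckSp>o world'))  # Hello world
```

Unlike the pattern-based versions, this lets a backspace run reach back across earlier edits (for example 'ab<BckSp>c<BckSp><BckSp>' becomes an empty string), and it runs in time linear in the length of the line.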
