Replace pattern with a sequential number string in Python

Tags: python, replace, sequential

I'm trying to do the following replacement in Python: replace every HTML tag with a numbered placeholder {n} and build a mapping of [tag, {n}] pairs.
Original string -> "<h> This is a string. </H><p> This is another part. </P>"
Replaced text -> "{0} This is a string. {1}{2} This is another part. {3}"

Here's my code. I've started on the replacement, but I'm stuck on the logic: I can't figure out the best way to replace each occurrence with the next number in sequence, i.e. {0}, {1} and so on:

import re
text = "<h> This is a string. </H><p> This is another part. </P>"

num_mat = re.findall(r"(?:<(\/*)[a-zA-Z0-9]+>)",text)
print(str(len(num_mat)))

reg = re.compile(r"(?:<(\/*)[a-zA-Z0-9]+>)",re.VERBOSE)

phctr = 0
#for phctr in num_mat:
#    phtxt = "{" + str(phctr) + "}"
phtxt = "{" + str(phctr) + "}"
newtext = re.sub(reg,phtxt,text)

print(newtext)

Can someone help with a better way of achieving this? Thank you!

asked Nov 29 '12 09:11

1 Answer

import re
import itertools as it

text = "<h> This is a string. </H><p> This is another part. </P>"

cnt = it.count()
print(re.sub(r"</?\w+>", lambda x: '{{{}}}'.format(next(cnt)), text))

prints:

{0} This is a string. {1}{2} This is another part. {3}

This works for simple tags only (no attributes or spaces inside the tag). For tags with attributes, you have to adapt the regexp.
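As a rough illustration of such an adaptation, here is a minimal sketch of a pattern that also accepts attributes and self-closing tags. The pattern and the name TAG_RE are assumptions for this example, not part of the original answer, and it will still misfire on attribute values that contain ">"; for real-world HTML a proper parser is the safer choice.

```python
import itertools as it
import re

# Assumed pattern (hypothetical, not from the original answer):
# an optional "/", a tag name, optionally whitespace followed by anything
# up to the closing bracket, and an optional "/" for self-closing tags.
TAG_RE = re.compile(r"</?\w+(?:\s[^<>]*)?/?>")

text = '<a href="x.html" class="ext">link</a><br />'

cnt = it.count()
newtext = TAG_RE.sub(lambda m: "{{{}}}".format(next(cnt)), text)
print(newtext)  # {0}link{1}{2}
```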

Also, if you don't reinitialize cnt = it.count(), the numbering continues from where it left off on the next call to re.sub.
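One way to make the counter's lifetime explicit is to create it inside a helper, so each call starts from a known number. This is only a sketch; the function name number_tags and its start parameter are hypothetical.

```python
import itertools as it
import re

def number_tags(text, start=0):
    """Replace each simple tag with {n}, counting up from `start`."""
    cnt = it.count(start)  # a fresh counter for every call
    return re.sub(r"</?\w+>", lambda m: "{{{}}}".format(next(cnt)), text)

first = number_tags("<b>bold</b>")
second = number_tags("<i>italic</i>")
continued = number_tags("<i>italic</i>", start=2)

print(first)      # {0}bold{1}
print(second)     # {0}italic{1}  (numbering restarts)
print(continued)  # {2}italic{3}  (numbering carries on from 2)
```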

UPDATE to get a mapping dict:

import re
import itertools as it

text = "<h> This is a string. </H><p> This is another part. </P>"

cnt = it.count()
d = {}
def replace(tag, d, cnt):
    if tag not in d:
        d[tag] = '{{{}}}'.format(next(cnt))
    return d[tag]
print(re.sub(r"(</?\w+>)", lambda x: replace(x.group(1), d, cnt), text))
print(d)

prints (on Python 3.7+, where dicts keep insertion order):

{0} This is a string. {1}{2} This is another part. {3}
{'<h>': '{0}', '</H>': '{1}', '<p>': '{2}', '</P>': '{3}'}
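Note that replace() reuses the placeholder of a tag it has already seen, so a text with two identical <p> tags gets the same number for both. If every occurrence should get its own number instead, one possible variant is to collect the tags in a list whose index is the placeholder number. The sketch below uses hypothetical names (tags, _store) and assumes the text contains no literal {n} sequences of its own, since those would be mistaken for placeholders when restoring.

```python
import re

text = "<p>one</p><p>two</p>"

tags = []  # tags[n] is the tag that was replaced by {n}

def _store(match):
    # Remember the tag and return a placeholder with its list index.
    tags.append(match.group(0))
    return "{{{}}}".format(len(tags) - 1)

newtext = re.sub(r"</?\w+>", _store, text)
print(newtext)  # {0}one{1}{2}two{3}
print(tags)     # ['<p>', '</p>', '<p>', '</p>']

# Put the tags back by looking up each placeholder number.
restored = re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1))], newtext)
print(restored == text)  # True
```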

answered Sep 18 '22 02:09

eumiro
