I have a file which contains in it several lines of problematic syntax, I'd like to do find all occurrences of it and replace it with acceptable syntax.
Example:
<field id="someId" type="xs:decimal" bind="someId">
<description/>
<region id="Calc.R_315.`0" page="1"/>
<region id="Calc.R_315.`1" page="1"/>
</field>
I'd like to a string replacement of all occurrences of
<dot><tick><number> i.e. .`0 or .`1 or .`2 et cetera
with
<dash><number> i.e. -1 or -2 or -3
Notice it begins at 1 instead of 0.
I have the following python code which performs an inline replacement of however it starts at 0, I'd like it to start at 1.
with fileinput.input(files="file.xml", inplace=True, backup='.original.bak', mode='r') as f:
for line in f:
pattern = "\.`(\d+)"
result = re.sub(pattern, lambda exp: "-{}".format(exp.groups()[0]), line)
print(result, end='')
How to accomplish my goal?
Regex can be used to perform various tasks in Python. It is used to do a search and replace operations, replace patterns in text, check if a string contains the specific pattern.
To increment a character in a Python, we have to convert it into an integer and add 1 to it and then cast the resultant integer to char. We can achieve this using the builtin methods ord and chr.
For each step add the regex (without delimiters), the modifiers and the substitution string. For the above example this would be (6 + 1 + 3) + (3 + 0 + 2) + (2 + 1 + 0) = 18 .
Two-regex is now 15.9 times slower than string replacement, and regex/lambda 38.8 times slower.
You are almost at the solution yourself!
The only thing remaining is to convert the captured number into an int
, and add 1 to it. Simple!
So the relevant line of code becomes:
result = re.sub(pattern, lambda exp: "-{}".format(int(exp.groups()[0]) + 1), line)
Another slight modification that can be made is to change .groups()[0]
to .group(1)
. You can learn more about group
and its usage in the documentation.
One last thing: It is always better to define your regex pattern as a raw string so as to avoid any future headaches.
You can try this:
import re
s = """
<field id="someId" type="xs:decimal" bind="someId">
<description/>
<region id="Calc.R_315.`0" page="1"/>
<region id="Calc.R_315.`1" page="1"/>
</field>
"""
new_s = re.sub('\.`\d+', '{}', s).format(*map(lambda x:'-{}'.format(int(x)+1), re.findall('(?<=\.`)\d+(?=")', s)))
print(new_s)
Output:
<field id="someId" type="xs:decimal" bind="someId">
<description/>
<region id="Calc.R_315-1" page="1"/>
<region id="Calc.R_315-2" page="1"/>
</field>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With