Replacing all regex matches in a single line

I have a dynamic regexp, and I don't know in advance how many groups it has. I would like to wrap every matched group in XML tags.

Example:

re.sub("(this).*(string)", r'<markup>\anygroup</markup>', "this is my string")
>> "<markup>this</markup> is my <markup>string</markup>"

Is that even possible in a single line?

How do you replace all occurrences of a regex pattern in a string?

The re.sub() function replaces all occurrences of the pattern in the target string. Passing count=1 to re.sub() replaces only the first occurrence of the pattern. Set count to the maximum number of replacements you want to perform.
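As a small illustration of the count argument described above, here is a minimal sketch; the sample text and variable names are made up for the example.

```python
import re

text = "this is this and this"

# count=1 stops after the first replacement.
first_only = re.sub(r"this", "that", text, count=1)
# -> "that is this and this"

# Without count (or with count=0), every occurrence is replaced.
all_of_them = re.sub(r"this", "that", text)
# -> "that is that and that"
```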

For a constant regexp like in your example, do

import re
text = "this is my string"
re.sub(r"(this)(.*)(string)",
       r'<markup>\1</markup>\2<markup>\3</markup>',
       text)

Note that you need to enclose .* in parentheses as well if you don't want to lose it.

Now if you don't know what the regexp looks like, it's more difficult, but should be doable.

pattern = "(this)(.*)(string)"
re.sub(pattern,
       lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 0
                         else s for n, s in enumerate(m.groups())),
       text)

If the first thing matched by your pattern doesn't necessarily have to be marked up, use this instead, with the first group optionally matching some prefix text that should be left alone:

pattern = "()(this)(.*)(string)"
re.sub(pattern,
       lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 1
                         else s for n, s in enumerate(m.groups())),
       text)

You get the idea.
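To make the two variants above easy to try, here is a sketch that folds them into one helper. The name mark_alternating, its marked_parity parameter, the sample text, and the (\W*) filler group are all invented for illustration. It assumes every group takes part in the match, since a group that did not participate would show up as None.

```python
import re

def mark_alternating(m, marked_parity=0):
    """Wrap every second group; marked_parity picks which
    0-based group indexes (even or odd) get the tags."""
    return "".join(
        "<markup>%s</markup>" % s if n % 2 == marked_parity else s
        for n, s in enumerate(m.groups())
    )

text = "so this is my string!"

# The first group is marked up: indexes 0, 2, ... are wrapped.
even_result = re.sub(r"(this)(.*)(string)", mark_alternating, text)

# A leading filler group shifts the marked groups to indexes 1, 3, ...
odd_result = re.sub(
    r"(\W*)(this)(.*)(string)",
    lambda m: mark_alternating(m, marked_parity=1),
    text,
)

# Both give: "so <markup>this</markup> is my <markup>string</markup>!"
```

Text outside the match ("so " and "!") is left alone because re.sub() only replaces the matched portion.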

If your regexps are complicated and you're not sure you can make everything part of a group, where only every second group needs to be marked up, you might do something smarter with a more complicated function:

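import re

pattern = r"(this).*(string)"
def replacement(m):
    s = m.group()
    base = m.start()  # m.span(i) is relative to the whole string, not to the match
    n_groups = len(m.groups())
    # assume groups do not overlap and are listed left-to-right
    for i in range(n_groups, 0, -1):
        lo, hi = m.span(i)
        if lo == -1:  # this group did not take part in the match
            continue
        lo, hi = lo - base, hi - base
        s = s[:lo] + '<markup>' + s[lo:hi] + '</markup>' + s[hi:]
    return s
re.sub(pattern, replacement, text)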

If you need to handle overlapping groups, you're on your own, but it should be doable.
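For what it's worth, here is one possible sketch for nested groups. It is not the only way to do it, and the function name mark_up_nested is made up. It relies on all tags being identical, so it only needs to count how many tags open and close at each position. It assumes groups lie inside the overall match; groups that did not participate, matched the empty string, or fall outside the match (possible with lookarounds) are skipped.

```python
import re
from collections import Counter

def mark_up_nested(m, tag="markup"):
    """Wrap each non-empty participating group of m in <tag>...</tag>."""
    base, end = m.span()
    opens, closes = Counter(), Counter()
    for i in range(1, m.re.groups + 1):
        lo, hi = m.span(i)
        # Skip groups that did not match, are empty, or lie outside the match.
        if lo == -1 or lo == hi or lo < base or hi > end:
            continue
        opens[lo - base] += 1
        closes[hi - base] += 1

    open_tag, close_tag = "<%s>" % tag, "</%s>" % tag
    s = m.group()
    parts = []
    for pos in range(len(s) + 1):
        # Close tags first so adjacent groups do not end up nested.
        parts.append(close_tag * closes[pos])
        parts.append(open_tag * opens[pos])
        parts.append(s[pos:pos + 1])
    return "".join(parts)

nested_result = re.sub(r"((this) is) my (string)", mark_up_nested,
                       "so this is my string!")
# -> "so <markup><markup>this</markup> is</markup> my <markup>string</markup>!"
```

Because every tag has the same name, the output stays well-formed even when spans cross, although the nesting it shows then no longer mirrors the group structure exactly.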

re.sub() will replace everything it can. If you pass it a function for repl then you can do even more.
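A minimal sketch of passing a function as repl: re.sub() calls it once per match with the match object and uses the returned string as the replacement. The function name wrap and the sample pattern are only illustrative.

```python
import re

def wrap(m):
    # m.group(0) is the full text of the current match.
    return "<markup>%s</markup>" % m.group(0)

wrapped = re.sub(r"this|string", wrap, "this is my string")
# -> "<markup>this</markup> is my <markup>string</markup>"
```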

Yes, this can be done in a single line.

>>> import re
>>> re.sub(r"\b(this|string)\b", r"<markup>\1</markup>", "this is my string")
'<markup>this</markup> is my <markup>string</markup>'

\b ensures that only complete words are matched.
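A quick sketch of the difference \b makes, using a made-up sample string in which the words appear only as parts of longer words:

```python
import re

repl = r"<markup>\1</markup>"
sample = "thistle strings"

# Without \b, the words are matched inside longer words.
without_boundaries = re.sub(r"(this|string)", repl, sample)
# -> "<markup>this</markup>tle <markup>string</markup>s"

# With \b, nothing matches, so the text comes back unchanged.
with_boundaries = re.sub(r"\b(this|string)\b", repl, sample)
# -> "thistle strings"
```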

So if you have a list of words that you need to mark up, you could do the following:

>>> mywords = ["this", "string", "words"]
>>> myre = r"\b(" + "|".join(mywords) + r")\b"
>>> re.sub(myre, r"<markup>\1</markup>", "this is my string with many words!")
'<markup>this</markup> is my <markup>string</markup> with many <markup>words</markup>!'
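If the words might contain regex metacharacters such as a dot, it is probably safer to run them through re.escape() before joining. The helper below is a hypothetical sketch (mark_words and its tag parameter are not from the answer above) and assumes a non-empty word list whose entries start and end with word characters, so that \b behaves as expected.

```python
import re

def mark_words(words, text, tag="markup"):
    """Wrap each whole-word occurrence of any entry in words with <tag>."""
    # re.escape makes characters like "." match literally.
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = r"\b(" + alternatives + r")\b"
    return re.sub(pattern, r"<%s>\1</%s>" % (tag, tag), text)

escaped_result = mark_words(["this", "e.g"], "this is eXg, e.g this")
# -> "<markup>this</markup> is eXg, <markup>e.g</markup> <markup>this</markup>"
```

Without the escaping, the "." in "e.g" would match any character, so "eXg" would be marked up as well.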

Tags:

python

regex

damir

3 Answers

Marius Gedminas

Ignacio Vazquez-Abrams

Tim Pietzcker
