Python Regular expression only matches once

Question

I'm trying to create a simple Markdown-to-LaTeX converter, just to learn Python and basic regex, but I'm stuck trying to figure out why the code below doesn't work:

re.sub(r'\[\*\](.*?)\[\*\]: ?(.*?)$', r'\\footnote{\2}\1', s, flags=re.MULTILINE|re.DOTALL)

I want to convert something like:

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

to:

This is a note\footnote{some text} and this is another\footnote{other text}

This is what I got (using my regex above):

This is a note\footnote{some text} and this is another[*]

[*]: other text

Why is the pattern only matched once?

EDIT:

I tried the following lookahead assertion:

re.sub(r'\[\*\](?!:)(?=.+?\[\*\]: ?(.+?)$)', r'\\footnote{\1}', s, flags=re.DOTALL|re.MULTILINE)
# (?!:) prevents [*]: from being matched

Now it matches all the footnotes; however, they're not matched correctly.

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

gives me

This is a note\footnote{some text} and this is another\footnote{some text}
[*]: some text
[*]: other text

Any thoughts about it?

Casimir et Hippolyte · Accepted Answer

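The reason is that the same characters can't be matched more than once. Once a character has been matched, it is consumed by the regex engine and can't be reused for another match. Here the first match runs from the first [*] to the end of the first note line, so it swallows the second placeholder as well.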
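To see the consumption described above, here is a minimal sketch that runs the pattern from the question through `re.finditer` and prints each match. The variable names `pattern` and `matches` are only for illustration.

```python
import re

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

# The pattern from the question, compiled with the same flags.
pattern = re.compile(r'\[\*\](.*?)\[\*\]: ?(.*?)$',
                     flags=re.MULTILINE | re.DOTALL)

matches = list(pattern.finditer(s))
for m in matches:
    # group(1) is everything between the first placeholder and the first
    # "[*]:" definition, which includes the second placeholder.
    print(m.span(), repr(m.group(1)), repr(m.group(2)))

print(len(matches))
```

Only one match is reported, and its first group already contains the second `[*]`, so nothing is left for a second match to pair with the remaining note.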

A general workaround is to capture the overlapping parts inside a lookahead assertion with capture groups. But that can't be done in your case, because there is no way to tell which note belongs to which placeholder.
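As a rough illustration of both points, the sketch below first shows the lookahead-with-capture trick on a toy string, then applies the lookahead pattern from the question's edit (with its closing parenthesis and the `s` argument added, and the backslash in the replacement escaped). The toy string `'abcd'` is just an assumed example.

```python
import re

# Generic case: a capture group inside a lookahead can yield overlapping
# results, because the lookahead itself consumes no characters.
plain = re.findall(r'\w\w', 'abcd')              # consuming match
overlapping = re.findall(r'(?=(\w\w))', 'abcd')  # zero-width match
print(plain)        # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

# Footnote case: each placeholder looks ahead independently, and the lazy
# .+? always stops at the first "[*]:" definition it can reach.
res = re.sub(r'\[\*\](?!:)(?=.+?\[\*\]: ?(.+?)$)', r'\\footnote{\1}', s,
             flags=re.DOTALL | re.MULTILINE)
print(res)
```

Both placeholders end up with the text of the first note, since the pattern has no notion of "the n-th definition for the n-th placeholder".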

A simpler way is to first extract all the notes into a list and then replace each placeholder using a callback. Example:

import re

s = '''This is a note[*] and this is another[*]
[*]: note 1
[*]: note 2'''

# separate the text from the trailing block of notes
text, notes = re.split(r'((?:\r?\n\[\*\]:[^\r\n]*)+$)', s)[:-1]

# this generator yields the next replacement string
def getnote(notes):
    for note in re.split(r'\r?\n\[\*\]: ', notes)[1:]:
        yield r'\footnote{{{}}}'.format(note)

note = getnote(notes)

# the callback pulls one note per placeholder, in order
res = re.sub(r'\[\*\]', lambda m: next(note), text)
print(res)
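The same idea can be wrapped in a reusable function. The following is only a sketch of a variant, not the code above: the names `inline_footnotes`, `NOTE_DEF` and `PLACEHOLDER` are hypothetical, it assumes `\n` line endings and that each definition sits on its own line starting with `[*]:`, and it leaves a placeholder untouched when no note is left for it.

```python
import re

# A definition line such as "[*]: some text" (assumes "\n" line endings).
NOTE_DEF = re.compile(r'^\[\*\]: ?(.*)$\n?', flags=re.MULTILINE)
PLACEHOLDER = re.compile(r'\[\*\]')


def inline_footnotes(s):
    """Replace each [*] placeholder with the next '[*]: ...' note, in order."""
    # Collect the note texts, then strip the definition lines from the body.
    notes = iter(NOTE_DEF.findall(s))
    body = NOTE_DEF.sub('', s).rstrip('\n')

    def repl(m):
        note = next(notes, None)
        # With no note left, keep the placeholder as it is.
        return m.group(0) if note is None else r'\footnote{%s}' % note

    return PLACEHOLDER.sub(repl, body)


s = '''This is a note[*] and this is another[*]
[*]: note 1
[*]: note 2'''
print(inline_footnotes(s))
```

Because the replacement comes from a function rather than a template string, the backslash in `\footnote` is inserted literally and needs no extra escaping.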

Tags:

python

regex

Afonso Silva
