We've run into a problem with some Markdown content. A few jQuery editors we used did not write proper Markdown syntax. Embedded links used the 'label' format, which puts the link definitions at the bottom of the document (just like the StackOverflow editor). The problem we encountered is that the definitions were sometimes formatted in a non-standard way. They are allowed to be indented by 0 to 3 spaces, but some came in at 4 spaces (you might notice that StackOverflow forces 2 spaces in JavaScript), which makes Markdown parsers treat them as preformatted text.
As a quick example:
This is a sample document that would have inline links.
[Example 0][0], [Example 1][1], [Example 2][2] , [Example 3][3] , [Example 4][4]
[0]: http://example.com
[1]: http://example.com/1
[2] : http://example.com/2
[3]: http://example.com/3
[4] : http://example.com/4
I want to reformat this last section into proper Markdown:
[0]: http://example.com
[1]: http://example.com/1
[2]: http://example.com/2
[3]: http://example.com/3
[4]: http://example.com/4
I'm running into a wall trying to come up with the right regex to catch the 'labels' section. I can grab the labels within the section fine -- but the section is eluding me.
Here's what I have so far:
RE_footnote = re.compile("""
(?P<labels_section>
^[\t\ ]*$ ## we must start with an empty line
\s+
(?P<labels>
(?P<a_label>
^
[\ \t]* ## we could have 0-n spaces or tabs
\[ ## BRACKET - open
(?P<id>
[^^\]]+
)
\] ## BRACKET - close
\s*
: ## COLON
\s*
(?P<link> ## WE want anything here
[^$]+
)
$
)+ ## multiple labels
)
)
""",re.VERBOSE|re.I|re.M)
The specific problems I have:
I can't figure out how to allow for 1 or more "blank lines". This triggers an invalid regex with nothing to repeat:
(?: ## wrap it in a non-capturing group, require 1+ occurrences
^[\t\ ]*$
)+
The match won't work without a whitespace match, \s+, before the group. I can't figure out why.
I want this to match at the END of the document only, to ensure we're only fixing these JavaScript errors (and not something in the core of the document). All my attempts to work a \z into this have failed miserably.
Can anyone offer some advice?
Updated
This works:
import re

RE_MARKDOWN_footnote = re.compile(r"""
(?P<labels_section>
(?: ## we must start with an empty / whitespace-only line
^\s*$
)
\s* ## there can be more whitespace lines
(?P<labels>
(?P<a_label>
^
[\ \t]* ## we could have 0-n spaces or tabs
\[ ## BRACKET - open
(?P<id>
[^^\]]+
)
\] ## BRACKET - close
\s*
: ## COLON
\s*
(?P<link> ## WE want anything here
[^$]+
)
$
)+ ## multiple labels
)
\s* ## we might have some empty lines
\Z ## ensure the end of document
)
""",re.VERBOSE|re.I|re.M)
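For comparison, here is a minimal sketch of the same idea split into two steps: first locate the block of blank lines and label lines that runs up to the end of the document, then rewrite the label lines inside that block only. The names LABEL_RE, TAIL_RE and fix_trailing_labels are made up for this illustration and are not part of the code above. It uses [^\n] for the link because, inside a character class, $ is a literal dollar sign, so [^$]+ can run across line breaks. Python's re module spells the end-of-string anchor \Z, and it matches only at the very end even with re.M.

```python
import re

# One label line: optional indentation, [id], optional spaces, a colon, the link.
LABEL_RE = re.compile(
    r"^[ \t]*\[(?P<id>[^\]\n]+)\][ \t]*:[ \t]*(?P<link>\S[^\n]*)$",
    re.MULTILINE,
)

# A run of blank lines and label lines that reaches the end of the text.
# \Z matches only at the very end of the string, even with re.MULTILINE.
TAIL_RE = re.compile(
    r"(?:^[ \t]*\n|^[ \t]*\[[^\]\n]+\][ \t]*:[^\n]*\n?)+\Z",
    re.MULTILINE,
)


def fix_trailing_labels(text):
    """Normalize label lines, but only in the block that ends the document."""
    tail = TAIL_RE.search(text)
    if tail is None:
        return text  # no trailing label block; leave the text untouched
    fixed = LABEL_RE.sub(r"[\g<id>]: \g<link>", tail.group(0))
    return text[:tail.start()] + fixed
```

Each alternative in TAIL_RE consumes at least one character, so the repeated group never matches an empty string. Label-like lines earlier in the document are left alone because the search only succeeds for a run that extends to \Z.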
I just started from scratch; is there a reason something simpler like this couldn't work?
^\s* # beginning of the line; may include whitespace
\[ # opening bracket
(?P<id>\d+) # our ID
\] # closing bracket
\s* # optional whitespace
: # colon
\s* # optional whitespace
(?P<link>[^\n]+) # our link is everything up to a new line
$ # end of the line
This was done using the global and multi-line modifiers, gm. Replace matches with: [\id]: \link. Here is a working example: http://regex101.com/r/mM8dI2
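In Python, the same substitution might look roughly like the sketch below. It is an adaptation rather than a literal copy: re.sub already replaces every match, so only re.MULTILINE is needed; the replacement uses Python's \g<name> syntax; and [ \t]* stands in for \s* so that a match cannot start on a preceding blank line. The names SIMPLE_LABEL_RE and normalize_labels are made up for this example.

```python
import re

# Adapted from the pattern above; [ \t]* keeps each match on a single line.
SIMPLE_LABEL_RE = re.compile(
    r"""
    ^[ \t]*            # beginning of the line; may include indentation
    \[(?P<id>\d+)\]    # numeric ID in brackets
    [ \t]*:[ \t]*      # colon with optional whitespace on either side
    (?P<link>[^\n]+)   # the link is everything up to the newline
    $                  # end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)


def normalize_labels(text):
    """Rewrite every matching label line as '[id]: link' with no indentation."""
    return SIMPLE_LABEL_RE.sub(r"[\g<id>]: \g<link>", text)
```

Note that this rewrites matching lines anywhere in the text, not only at the end of the document, so a label-like line inside an intentionally indented code block would be changed too.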