Python regular expression with wiki text

Tags:

python

regex

wiki

I'm trying to convert wikitext into plain text using Python regular expression substitution. There are two formatting rules for wiki links:

  • [[Name of page]]
  • [[Name of page | Text to display]]

    (http://en.wikipedia.org/wiki/Wikipedia:Cheatsheet)

Here is some text that gives me a headache.

The CD is composed almost entirely of [[cover version]]s of [[The Beatles]] songs which George Martin [[record producer|produced]] originally.

The text above should be changed into:

The CD is composed almost entirely of cover versions of The Beatles songs which George Martin produced originally.

My main problem is telling the [[ ]] form apart from the [[ | ]] form. I don't need a single complex regular expression; applying several (perhaps two) regular expression substitutions in sequence is fine.

Please enlighten me on this problem.

asked Mar 05 '26 15:03 by redism


1 Answer

import re
wikilink_rx = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')  # optional "target|" prefix, then the display text
def strip_wikilinks(the_string):
    return wikilink_rx.sub(r'\1', the_string)

Example: http://ideone.com/7oxuz
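
For a quick local check, here is a minimal sketch that runs the same pattern against the sentence from the question. The function name strip_wikilinks is only a placeholder for this sketch, and the .strip() call is my own assumption to handle spaces around the pipe, as in [[Name of page | Text to display]]; drop it if you want the display text untouched.

```python
import re

# Same pattern as in the answer above: an optional "target|" prefix,
# followed by the captured display text.
wikilink_rx = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')

def strip_wikilinks(text):
    # Hypothetical helper name, used only for this sketch.
    # A callable replacement lets us trim spaces around the display text.
    return wikilink_rx.sub(lambda m: m.group(1).strip(), text)

sample = ("The CD is composed almost entirely of [[cover version]]s of "
          "[[The Beatles]] songs which George Martin "
          "[[record producer|produced]] originally.")

print(strip_wikilinks(sample))
```

This should print the target sentence from the question, with the trailing "s" of "[[cover version]]s" left attached to the link text.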

Note: you can also find MediaWiki parsers listed at http://www.mediawiki.org/wiki/Alternative_parsers.
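
Since the question allows two substitutions in sequence, the same result can probably be had with two simpler patterns instead of one. This is only a sketch of that alternative, not a replacement for the single regex; the names piped_rx, plain_rx and strip_wikilinks_two_pass are made up for illustration.

```python
import re

# Pass 1: piped links, [[target|label]] -> label.
piped_rx = re.compile(r'\[\[[^|\]]*\|([^\]]*)\]\]')
# Pass 2: plain links, [[target]] -> target.
plain_rx = re.compile(r'\[\[([^|\]]*)\]\]')

def strip_wikilinks_two_pass(text):
    # Hypothetical helper name, used only for this sketch.
    text = piped_rx.sub(r'\1', text)
    return plain_rx.sub(r'\1', text)

sample = ("The CD is composed almost entirely of [[cover version]]s of "
          "[[The Beatles]] songs which George Martin "
          "[[record producer|produced]] originally.")

print(strip_wikilinks_two_pass(sample))
```

Neither pattern can cross a closing bracket, so a plain link is never mistaken for the start of a piped one. Like the single-regex version, this does not handle nested constructs such as image links with captions.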

answered Mar 08 '26 04:03 by kennytm


