
RegEx needed for Wikipedia infobox

OK, so here's what I need:

  • We have the full XML of a Wikipedia article
  • We need just the Infobox section

I have tried various things, but my main issue seems to be that I can't match the "internal" (nested) curly brackets. Any ideas, or a regex you have managed to get this done with?

For those of you who do not know what I'm talking about, here's a (somewhat abridged) example of what I'm trying to parse: http://regexr.com?38299

(What is needed is the part from {{Infobox ******* up to its corresponding closing brackets, }}.)

Dr.Kameleon asked Jan 12 '23 10:01



1 Answer

Ok, I got it!

Try this (it uses the recursive subpattern call (?1), so it needs a PCRE-style engine):

(?=\{Infobox)(\{([^{}]|(?1))*\})

Here's the working example:

http://regex101.com/r/kT1jF4
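
If you are not on a PCRE-style engine, note that Python's built-in re module does not support the recursive (?1) call used above. As an alternative, here is a minimal sketch that swaps the regex for plain brace counting. The function name extract_infobox and the sample text are made up for illustration, and it assumes templates are delimited only by {{ and }} (it does not treat {{{parameter}}} triple braces or nowiki sections specially).

```python
def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template, nested templates included, or None."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None

    depth = 0  # number of currently open {{ pairs
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Back at the outer level: this closes the infobox itself.
                return wikitext[start:i]
        else:
            i += 1

    # The opening {{Infobox was never closed.
    return None


if __name__ == "__main__":
    sample = "intro {{Infobox person | born = {{birth date|1900|1|1}} }} rest"
    print(extract_infobox(sample))
```

Unlike the regex, this returns the template with both of its outer braces, and it gives None rather than a partial match when the brackets are unbalanced.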

Bryan Elliott answered Jan 14 '23 00:01
