
How do I match nested braces using regular expressions in PHP?

Tags: regex, php

I have a LaTeX document that I want to parse, and I need a regex that matches the following:

\\ # the backslash at the beginning
[a-zA-Z]+ # a word
(\{.+\})* # any number of {something} groups

However, here is the catch:

The last line (1) needs to be greedy and (2) needs to contain a balanced number of { and } inside itself.

That is, given the string \test{something\somthing{9}}, it should match the whole string. The braces also need to appear in that order ({ before }), so that in the following text it does not match the whole line:

\LaTeX{} is a document preparation system for the \TeX{}

just

\LaTeX{}

and

\TeX{}
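
To make the problem concrete, here is a minimal sketch of what the naive pattern does with that sentence. The variable names ($naive, $line, $m) are only illustrative. Because .+ is greedy and knows nothing about brace balance, it runs from the first { to the last } on the line:

```php
<?php
// Naive pattern: a backslash, a word, then greedy {.+} groups.
$naive = '~\\\\[a-zA-Z]+(\{.+\})*~';

// Single quotes keep the backslashes in the sample text literal.
$line = '\LaTeX{} is a document preparation system for the \TeX{}';

preg_match_all($naive, $line, $m);

// $m[0] holds the full matches; here it is a single match spanning the whole line.
print_r($m[0]);
```

Instead of the two short matches I want, this yields one match covering the entire sentence.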

Can anyone help? Maybe someone has a better idea for matching this? Should I avoid regular expressions altogether?

Knarf asked Oct 24 '22 21:10


1 Answer

This can be done with recursion:

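// Single quotes keep the backslashes literal (in double quotes, "\test" would start with a tab).
$input = '\LaTeX{} is a document preparation system for the \TeX{}
\latex{something\somthing{9}}';

preg_match_all('~(?<token>
        \\\\ # the backslash at the beginning
        [a-zA-Z]+ # a word
        (\{[^{}]*((?P>token)[^{}]*)?\}) # {something}, optionally containing one nested token
)~x', $input, $matches);

// $matches[0] holds the full matches.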

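This correctly matches \LaTeX{}, \TeX{}, and \latex{something\somthing{9}}. Note that, as written, the pattern expects exactly one {...} group per command, with at most one nested command inside it.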
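
If you need any number of brace groups and arbitrary nesting depth, the same recursion idea can be applied to the braces themselves instead of the whole token. The following is only a sketch under a few assumptions: the group name braces and the \frac sample line are made up for illustration, and escaped braces (\{, \}) or % comments in real LaTeX are not handled.

```php
<?php
// Sketch: a backslash, a word, then any number of balanced {...} groups.
// The group name "braces" is a hypothetical label chosen for this example.
$pattern = '~
    \\\\                 # the leading backslash
    [a-zA-Z]+            # the command name
    (?<braces>           # one balanced group:
        \{
        (?: [^{}]++      #   text without braces
          | (?&braces)   #   or a nested balanced group
        )*
        \}
    )*                   # zero or more such groups
~x';

// Single quotes keep the backslashes literal; the second line is an invented sample.
$input = '\LaTeX{} is a document preparation system for the \TeX{}' . "\n"
       . '\frac{a{b}c}{\sqrt{2}}';

preg_match_all($pattern, $input, $matches);

// Full matches: \LaTeX{}, \TeX{} and \frac{a{b}c}{\sqrt{2}}
print_r($matches[0]);
```

Because the trailing quantifier is *, this variant also matches a bare command such as \item with no braces at all; change it to + if at least one group is required. Nested commands like \sqrt{2} are consumed as part of the outer match rather than reported separately.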

Arnaud Le Blanc answered Oct 31 '22 19:10