I have a LaTeX document I want to parse, and I need a regex that matches the following:
\ # the backslash in the beginning
[a-zA-Z]+ #a word
(\{.+\})* # any amount of {something}
However, here is the catch: in the last line, the pattern 1. needs to be greedy and 2. needs to have a matching number of { and } inside itself.
Meaning that if I have the string \test{something\somthing{9}}, it would match the whole thing. The braces also need to come in that order ({}), so that it does not match the whole of the following line:
\LaTeX{} is a document preparation system for the \TeX{}
just
\LaTeX{}
and
\TeX{}
Can anyone help? Maybe someone has a better idea for matching? Should I not use regular expressions?
This can be done with a recursive pattern (PCRE, shown here in PHP):
$input = "\LaTeX{} is a document preparation system for the \TeX{}
\latex{something\somthing{9}}";
preg_match_all('~(?<token>
\\\\ # the backslash at the beginning
[a-zA-Z]+ #a word
(\{[^{}]*((?P>token)[^{}]*)?\}) # {something}
)~x', $input, $matches);
This correctly matches \LaTeX{}, \TeX{}, and \latex{something\somthing{9}}.
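Note that the pattern above requires exactly one {…} group per command and allows at most one nested command inside it, so something like \frac{a}{b} would only match as \frac{a}. If you need "any amount of {something}" with arbitrary nesting, as in the question, one possible generalization is sketched below. It is only a sketch, not the method from the answer above: the group name braces and the extra \frac test input are my own assumptions for illustration.

```php
<?php
// Sketch: a command name followed by zero or more balanced {...} groups.
// The group name "braces" is an illustrative choice, not part of the original answer.
$pattern = '~
    \\\\ [a-zA-Z]+                   # backslash and command name
    (?<braces>                       # one balanced brace group
        \{
        (?: [^{}]++ | (?&braces) )*  # plain text, or a nested balanced group
        \}
    )*                               # any number of such groups
~x';

// Single quotes keep PHP from treating sequences such as \t or \f as escapes.
$input = '\LaTeX{} is a document preparation system for the \TeX{} '
       . '\test{something\somthing{9}} \frac{a}{b{c}}';

preg_match_all($pattern, $input, $matches);
print_r($matches[0]);
```

With this input, the full matches should be \LaTeX{}, \TeX{}, \test{something\somthing{9}} and \frac{a}{b{c}}. Because the brace group is optional here, a bare command such as \alpha would also match, which mirrors the `*` in the question's original pattern; change the final `*` to `+` if at least one group is required.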