
Regex to match LaTeX equations

Tags: regex, latex

I am trying to configure the TeXworks editor to use the same syntax coloring as Texmaker. TeXworks uses regexes to specify what should be colored, but it has no default rule for math.

I want to match everything between $ and $, everything between \[ and \], everything between \( and \), and everything between $$ and $$. The last case is less important because it is rarely used in LaTeX documents.

Using more than one regex to cover all the cases is fine too.

Of course, \$ is an escaped dollar sign, so I don't want to match that, nor \\[ and similar sequences.

Then I also want to match everything between \begin{equation} and \end{equation}, but that is simple.

'It cannot be done' is a possible answer.

asked Jan 06 '13 14:01 by marczellm


1 Answer

Try this PCRE regex (it needs the x flag, because it contains whitespace and # comments):

(?<!\\)    # negative look-behind to make sure start is not escaped 
(?:        # start non-capture group for all possible match starts
  # group 1, match dollar signs only 
  # single or double dollar sign enforced by look-arounds
  ((?<!\$)\${1,2}(?!\$))|
  # group 2, match escaped parenthesis
  (\\\()|
  # group 3, match escaped bracket
  (\\\[)|                 
  # group 4, match begin equation
  (\\begin\{equation\})
)
# if group 1 was start
(?(1)
  # non greedy match everything in between
  # group 1 matches do not support recursion
  (.*?)(?<!\\)
  # match ending double or single dollar signs
  (?<!\$)\1(?!\$)|  
# else
(?:
  # greedily and recursively match everything in between
  # groups 2, 3 and 4 support recursion
  (.*(?R)?.*)(?<!\\)
  (?:
    # if group 2 was start, escaped parenthesis is end
    (?(2)\\\)|  
    # if group 3 was start, escaped bracket is end
    (?(3)\\\]|     
    # else group 4 was start, match end equation
    \\end\{equation\}
  )
))))

See this regex in action: https://regex101.com/r/wP2aV6/25

Because the regex uses recursion, it can handle math nested inside \( \), \[ \] and equation environments. Dollar-delimited math is matched non-recursively.

This works only on PCRE-compatible regex engines. It relies on negative lookbehind, conditional expressions and recursion, which many regex engines do not support.
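For an engine without recursion or conditionals, a flatter pattern may be enough when nesting does not matter. The following is only a sketch, written for Python's re module as an assumption (it is not TeXworks configuration syntax); the names MATH_RE and find_math are made up for illustration. It uses one alternative per delimiter pair and negative lookbehind to skip escaped delimiters.

```python
import re

# Hypothetical simplified pattern: no recursion, no conditionals, no nesting.
MATH_RE = re.compile(
    r"""
    (?<!\\)                                   # opening delimiter is not escaped
    (?:
        \$\$ .+? (?<!\\) \$\$                 # display math in double dollars
      | \$ (?!\$) .+? (?<!\\) \$              # inline math in single dollars
      | \\\( .+? (?<!\\) \\\)                 # backslash-paren pair
      | \\\[ .+? (?<!\\) \\\]                 # backslash-bracket pair
      | \\begin\{equation\} .+? \\end\{equation\}
    )
    """,
    re.VERBOSE | re.DOTALL,
)


def find_math(text):
    """Return every math span found in text, delimiters included."""
    return MATH_RE.findall(text)


if __name__ == "__main__":
    sample = r"Price \$5, inline $a+b$, display \[x^2\] and \(y\), plus $$z$$."
    for span in find_math(sample):
        print(span)
```

The double-dollar alternative is listed before the single-dollar one so that $$ is never read as two empty inline spans. The sketch does not handle nested math or a line break (\\) placed directly before a closing delimiter.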

Unless you only need something really simple, I would advise against using this regex and recommend a proper LaTeX parser instead.
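To illustrate why scanning is more robust than a single regex, here is a minimal hand-written scanner in Python. It is a sketch under stated assumptions, not a real LaTeX parser: the function name math_spans and the delimiter table are made up for illustration, nesting is ignored, and a backslash is simply assumed to escape the next character.

```python
def math_spans(text):
    """Return (start, end) index pairs of math regions, scanning left to right."""
    # Hypothetical delimiter table: opening delimiter -> closing delimiter.
    closers = {
        "$$": "$$",
        "$": "$",
        "\\(": "\\)",
        "\\[": "\\]",
        "\\begin{equation}": "\\end{equation}",
    }
    # Try longer openers first so that "$$" wins over "$".
    openers = sorted(closers, key=len, reverse=True)
    spans, i, n = [], 0, len(text)
    while i < n:
        opener = next((o for o in openers if text.startswith(o, i)), None)
        if opener is None:
            # A backslash escapes the next character (\$ or \\), so skip both.
            i += 2 if text[i] == "\\" else 1
            continue
        closer = closers[opener]
        j = i + len(opener)
        while j < n and not text.startswith(closer, j):
            j += 2 if text[j] == "\\" else 1
        if j >= n:
            break  # unterminated math region: stop scanning
        spans.append((i, j + len(closer)))
        i = j + len(closer)
    return spans


if __name__ == "__main__":
    sample = r"Cost \$5, $a+b$, \\[2pt] and \[x\]"
    for start, end in math_spans(sample):
        print(sample[start:end])
```

Because the scanner consumes a backslash together with the character after it, a sequence such as \\[ is read as a line break followed by a plain bracket, which is awkward to express with lookbehind alone.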

answered Oct 21 '22 09:10 by Lodewijk Bogaards