How do I sanitize LaTeX input?

I'd like to take user input (sometimes large paragraphs) and generate a LaTeX document from it. I'm considering a couple of simple regular expressions that replace all instances of \ with \textbackslash and all instances of { or } with \{ or \}.

I doubt that this is sufficient. What else do I need to do? Note: In case there is a special library made for this, I'm using python.

To clarify, I do not want anything to be parsed or treated as LaTeX syntax: $a$ should be replaced with \$a\$.

Conley Owens asked Apr 13 '10 05:04



1 Answer

If your input is plain text and you are in a normal catcode regime, you must do the following substitutions:

  • \ → \textbackslash{} (note the empty group!)
  • { → \{
  • } → \}
  • $ → \$
  • & → \&
  • # → \#
  • ^ → \textasciicircum{} (requires the textcomp package)
  • _ → \_
  • ~ → \textasciitilde{}
  • % → \%

In addition, the following substitutions are useful at least when using the OT1 encoding (and harmless in any case):

  • < → \textless{}
  • > → \textgreater{}
  • | → \textbar{}

And these three disable the curly quotes:

  • " → \textquotedbl{}
  • ' → \textquotesingle{}
  • ` → \textasciigrave{}
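
Since the question mentions Python, here is a minimal sketch of how these substitutions could be applied. The names LATEX_ESCAPES and escape_latex are made up for illustration and do not come from any library. It uses str.translate so that every character is replaced in a single pass; chaining separate replace calls would be risky, because the braces in an already inserted \textbackslash{} would themselves get escaped by a later step.

```python
# Substitution table copied from the lists above.
# LATEX_ESCAPES and escape_latex are hypothetical names for this sketch.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "^": r"\textasciicircum{}",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "%": r"\%",
    # Useful at least under OT1:
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    "|": r"\textbar{}",
    # Disable the curly quotes:
    '"': r"\textquotedbl{}",
    "'": r"\textquotesingle{}",
    "`": r"\textasciigrave{}",
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TABLE = str.maketrans(LATEX_ESCAPES)


def escape_latex(text):
    """Return text with every LaTeX-special character escaped.

    Each input character is looked up exactly once, so the output of one
    substitution is never processed again by another.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(escape_latex("$a$"))
```

This assumes the input is plain text under the normal catcode regime, as stated at the top of the answer; it does nothing about characters outside this table.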
Philipp answered Sep 25 '22 15:09
