Given a Unicode string, I want to replace non-ASCII characters with the LaTeX code that produces them (for example, having é become \'e, and œ become \oe). I'm incorporating this into Python code. This should rely on a translation table, and I have come up with the following code, which is simple and seems to work nicely:
accents = [
[ u"à", "\\`a"],
[ u"é", "\\'e"]
]
translation_table = dict([(ord(k), unicode(v)) for k, v in accents])
print u"été à l'eau".translate(translation_table)
But writing a reasonably complete translation table will take me a long time, and Google didn't help much. Does anyone have such a table ready, or know where to find one?
PS: I'm new to Python, so I welcome comments on the code above, of course.
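For readers on Python 3, where `unicode()` and the `print` statement no longer exist, the same idea can be sketched as below. This is only an illustration of the approach in the question, not a replacement for it; the helper name `to_latex` is made up for the example.

```python
# Python 3 sketch: str.translate accepts a table mapping code points
# to replacement strings, so no unicode() conversion is needed.
accents = {
    "à": "\\`a",
    "é": "\\'e",
}

# str.maketrans converts the single-character keys into the integer
# code-point keys that translate() expects.
translation_table = str.maketrans(accents)


def to_latex(text):
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(translation_table)


print(to_latex("été à l'eau"))  # prints: \'et\'e \`a l'eau
```

Using a dict literal instead of a list of pairs also makes duplicate keys easier to spot.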
OK, so here's the table I've built up for now. Please feel free to edit to add to it! (or comment if you don't have enough reputation to edit)
################################################################
# LaTeX accents replacement
latexAccents = [
[ u"à", "\\`a" ], # Grave accent
[ u"è", "\\`e" ],
[ u"ì", "\\`\\i" ],
[ u"ò", "\\`o" ],
[ u"ù", "\\`u" ],
[ u"ỳ", "\\`y" ],
[ u"À", "\\`A" ],
[ u"È", "\\`E" ],
[ u"Ì", "\\`I" ],
[ u"Ò", "\\`O" ],
[ u"Ù", "\\`U" ],
[ u"Ỳ", "\\`Y" ],
[ u"á", "\\'a" ], # Acute accent
[ u"é", "\\'e" ],
[ u"í", "\\'\\i" ],
[ u"ó", "\\'o" ],
[ u"ú", "\\'u" ],
[ u"ý", "\\'y" ],
[ u"Á", "\\'A" ],
[ u"É", "\\'E" ],
[ u"Í", "\\'I" ],
[ u"Ó", "\\'O" ],
[ u"Ú", "\\'U" ],
[ u"Ý", "\\'Y" ],
[ u"â", "\\^a" ], # Circumflex
[ u"ê", "\\^e" ],
[ u"î", "\\^\\i" ],
[ u"ô", "\\^o" ],
[ u"û", "\\^u" ],
[ u"ŷ", "\\^y" ],
[ u"Â", "\\^A" ],
[ u"Ê", "\\^E" ],
[ u"Î", "\\^I" ],
[ u"Ô", "\\^O" ],
[ u"Û", "\\^U" ],
[ u"Ŷ", "\\^Y" ],
[ u"ä", "\\\"a" ], # Umlaut or dieresis
[ u"ë", "\\\"e" ],
[ u"ï", "\\\"\\i" ],
[ u"ö", "\\\"o" ],
[ u"ü", "\\\"u" ],
[ u"ÿ", "\\\"y" ],
[ u"Ä", "\\\"A" ],
[ u"Ë", "\\\"E" ],
[ u"Ï", "\\\"I" ],
[ u"Ö", "\\\"O" ],
[ u"Ü", "\\\"U" ],
[ u"Ÿ", "\\\"Y" ],
[ u"ç", "\\c{c}" ], # Cedilla
[ u"Ç", "\\c{C}" ],
[ u"œ", "{\\oe}" ], # Ligatures
[ u"Œ", "{\\OE}" ],
[ u"æ", "{\\ae}" ],
[ u"Æ", "{\\AE}" ],
[ u"å", "{\\aa}" ],
[ u"Å", "{\\AA}" ],
[ u"–", "--" ], # Dashes
[ u"—", "---" ],
[ u"ø", "{\\o}" ], # Misc latin-1 letters
[ u"Ø", "{\\O}" ],
[ u"ß", "{\\ss}" ],
[ u"¡", "{!`}" ],
[ u"¿", "{?`}" ],
[ u"\\", "{\\textbackslash}" ], # Characters that should be quoted
[ u"~", "{\\textasciitilde}" ],
[ u"&", "\\&" ],
[ u"$", "\\$" ],
[ u"{", "\\{" ],
[ u"}", "\\}" ],
[ u"%", "\\%" ],
[ u"#", "\\#" ],
[ u"_", "\\_" ],
[ u"≥", "$\\ge$" ], # Math operators
[ u"≤", "$\\le$" ],
[ u"≠", "$\\neq$" ],
[ u"©", "{\\copyright}" ], # Misc
[ u"ı", "{\\i}" ],
[ u"µ", "$\\mu$" ],
[ u"°", "$^\\circ$" ],
[ u"‘", "`" ], #Quotes
[ u"’", "'" ],
[ u"“", "``" ],
[ u"”", "''" ],
[ u"‚", "," ],
[ u"„", ",," ],
]
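Many of the accent rows above follow a regular pattern (base letter plus one accent), so they could also be generated rather than typed. The following is a rough Python 3 sketch of that alternative, using Unicode NFD decomposition from the standard `unicodedata` module. It is not how the table above was built; the names `COMBINING_TO_LATEX`, `accent_entry` and `build_accent_table` are invented for the example, and only the five accents already present in the table are covered.

```python
import unicodedata

# Combining mark (code point) -> LaTeX accent command.
# Assumption: only the accents used in the hand-written table are listed.
COMBINING_TO_LATEX = {
    0x0300: "\\`",   # grave
    0x0301: "\\'",   # acute
    0x0302: "\\^",   # circumflex
    0x0308: '\\"',   # umlaut or dieresis
    0x0327: "\\c",   # cedilla
}


def accent_entry(char):
    """Return LaTeX for 'ASCII letter + one known combining mark', else None."""
    decomposed = unicodedata.normalize("NFD", char)
    if len(decomposed) != 2:
        return None
    base, mark = decomposed
    command = COMBINING_TO_LATEX.get(ord(mark))
    if command is None or not (base.isalpha() and ord(base) < 128):
        return None
    if base in ("i", "j"):
        base = "\\" + base          # dotless \i and \j under an accent
    if command == "\\c":
        return command + "{" + base + "}"
    return command + base


def build_accent_table(first=0x00C0, last=0x1EFF):
    """Scan a code-point range and collect every character accent_entry handles."""
    table = {}
    for cp in range(first, last + 1):
        latex = accent_entry(chr(cp))
        if latex is not None:
            table[cp] = latex
    return table


table = build_accent_table()
print("été à l'eau".translate(table))
```

Ligatures, dashes, quotes and the characters that need escaping have no such decomposition, so they would still have to be listed by hand and merged into the generated dict.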
If you are not in control of LaTeX compilation options, you can use the same table used by the inputenc package, so that the behavior will be the same as if you had used inputenc.
This document explains how inputenc does the mapping: it is a sequence of lines like these:
...
⟨all, t1, ly1⟩\DeclareUnicodeCharacter{00C2}{\^A}
⟨all, t1, ly1⟩\DeclareUnicodeCharacter{00C3}{\~A}
⟨all, t1, ly1⟩\DeclareUnicodeCharacter{00C4}{\"A}
⟨all, t1, ot1, ly1⟩\DeclareUnicodeCharacter{00C5}{\r A}
⟨all, t1, ot1, ly1, lcy⟩\DeclareUnicodeCharacter{00C6}{\AE}
⟨all, t1, ly1⟩\DeclareUnicodeCharacter{00C7}{\c C}
⟨all, t1, ly1⟩\DeclareUnicodeCharacter{00C8}{\@tabacckludge`E}
You could parse the file, looking for all the DeclareUnicodeCharacter lines, and extract the mapping with a regexp.
EDIT: I've written some code that does the trick:
# -*- coding: utf-8 -*-
import re

translation_table = {}

with open('utf8ienc.dtx') as f:
    for line in f:
        # Note the doubled backslash: a bare \D in a regexp means "any non-digit",
        # which only matched here by accident.
        m = re.match(r'%.*\\DeclareUnicodeCharacter\{(\w+)\}\{(.*)\}', line)
        if m:
            codepoint, latex = m.groups()
            latex = latex.replace('@tabacckludge', '')  # remove useless (??) '@tabacckludge'
            translation_table[int(codepoint, 16)] = unicode(latex)

print u"été à l'eau".translate(translation_table)
# outputs "\'et\'e \`a l'eau"
You should find utf8ienc.dtx in your LaTeX installation, or you can google it.
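If you want to try the parsing step without the file at hand, here is a small Python 3 sketch that runs the same kind of regexp over a few sample lines. The sample lines are an assumption about what the raw .dtx source looks like (a `%<...>` guard followed by the declaration), reconstructed from the excerpt quoted above; the names `DECLARE_RE` and `parse_declarations` are made up for the example.

```python
import re

# "\\D" matches a literal backslash followed by "D"; the guard in front
# (assumed to look like %<all,t1,ly1>) is skipped by ".*".
DECLARE_RE = re.compile(
    r'%.*\\DeclareUnicodeCharacter\{([0-9A-Fa-f]+)\}\{(.*)\}'
)


def parse_declarations(lines):
    """Build a {code point: LaTeX string} table from .dtx-style lines."""
    table = {}
    for line in lines:
        m = DECLARE_RE.match(line)
        if m:
            codepoint, latex = m.groups()
            # Drop the internal \@tabacckludge helper so that
            # "\@tabacckludge`E" becomes the plain accent "\`E".
            table[int(codepoint, 16)] = latex.replace('@tabacckludge', '')
    return table


# Hypothetical sample input, not copied from the real file.
sample = [
    r"%<all,t1,ly1>\DeclareUnicodeCharacter{00C2}{\^A}",
    r"%<all,t1,ly1>\DeclareUnicodeCharacter{00C8}{\@tabacckludge`E}",
    r"%<all,t1,ly1>\DeclareUnicodeCharacter{00C7}{\c C}",
    "% an ordinary comment line",
]

table = parse_declarations(sample)
print("ÂÈÇ".translate(table))  # prints: \^A\`E\c C
```

Since the function only iterates over lines, an open file object can be passed in place of the list.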