Pandoc and foreign characters

I've been trying to use Pandoc to convert some Markdown into a PDF file. This is a sample that Pandoc will not convert for me:

# Header!

## Sub Header

themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'

That's just something I grabbed from the top of the Wikipedia database dump. Pandoc doesn't like that at all. This is the error message it gives me:

pandoc: Error producing PDF from TeX source.
! Package inputenc Error: Unicode char \u8:ἀ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...

l.53 ...es derived respectively from the Greek ἀ
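
The error names only the first character that failed, but a long document may contain many more. Here is a minimal Python sketch that lists every distinct non-ASCII character in a piece of text with its code point and Unicode name, so you can see which glyphs the chosen font has to cover. The function name `non_ascii_chars` and the sample string are illustrative and not part of Pandoc.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (char, code point, Unicode name) for each distinct
    non-ASCII character, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in seen
    ]


if __name__ == "__main__":
    # Illustrative sample: the Greek word from the question, written with escapes.
    sample = "from the Greek \u1f00\u03bd\u03b1\u03c1\u03c7\u03af\u03b1"
    for ch, code, name in non_ascii_chars(sample):
        print(ch, code, name)
```

To check a real file, read it with `open("test.md", encoding="utf-8").read()` and pass the result to the function.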

Is there a command switch I can give it to get around this? I tried following the advice to do something like this, but it failed:

iconv -t utf-8 test.md | pandoc -o test.pdf
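
The iconv step probably changes nothing here. The error message already shows the character decoded correctly, which suggests the file is valid UTF-8 and that the failure comes from inputenc under pdflatex rather than from the file encoding. A small Python sketch to confirm that assumption follows; `is_valid_utf8` is an illustrative helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For a file, something like `is_valid_utf8(open("test.md", "rb").read())` should return True if re-encoding is not the problem.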

Update: Before following John's advice below, see this.

Update 2: This is the command that ultimately got it working. Hopefully this will help someone:

pandoc test2.md -o test2.pdf --latex-engine=xelatex --template=my.latex --variable mainfont="DejaVu Serif" --variable sansfont=Arial

And this is the contents of my.latex:

\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$lang$,$endif$$if(papersize)$$papersize$,$endif$]{$documentclass$}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[utf]{inputenc}
  \usepackage{ucs}
$if(euro)$
  \usepackage{eurosym}
$endif$
\else % if luatex or xelatex
  \usepackage{fontspec}
  \ifxetex
    \usepackage{xltxtra,xunicode}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \setromanfont{TeX Gyre Pagella}
  \newcommand{\euro}{€}
$if(mainfont)$
    \setmainfont{$mainfont$}
$endif$
$if(sansfont)$
    \setsansfont{$sansfont$}
$endif$
$if(monofont)$
    \setmonofont{$monofont$}
$endif$
$if(mathfont)$
    \setmathfont{$mathfont$}
$endif$
\fi
$if(geometry)$
\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
$endif$
$if(natbib)$
\usepackage{natbib}
\bibliographystyle{plainnat}
$endif$
$if(biblatex)$
\usepackage{biblatex}
$if(biblio-files)$
\bibliography{$biblio-files$}
$endif$
$endif$
$if(listings)$
\usepackage{listings}
$endif$
$if(lhs)$
\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
$endif$
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(verbatim-in-note)$
\usepackage{fancyvrb}
$endif$
$if(tables)$
\usepackage{longtable}
$endif$
$if(graphics)$
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=\maxwidth]{#1}}
$endif$
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={$author-meta$},
            pdftitle={$title-meta$},
            colorlinks=true,
            urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$,
            linkcolor=$if(linkcolor)$$linkcolor$$else$magenta$endif$,
            pdfborder={0 0 0}}
\urlstyle{same}  % don't use monospace font for urls
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
$if(strikeout)$
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
$endif$
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
$if(numbersections)$
$else$
\setcounter{secnumdepth}{0}
$endif$
$if(verbatim-in-note)$
\VerbatimFootnotes % allows verbatim text in footnotes
$endif$
$if(lang)$
\ifxetex
  \usepackage{polyglossia}
  \setmainlanguage{$mainlang$}
\else
  \usepackage[$lang$]{babel}
\fi
$endif$
$for(header-includes)$
$header-includes$
$endfor$

$if(title)$
\title{$title$}
$endif$
\author{$for(author)$$author$$sep$ \and $endfor$}
\date{$date$}

\begin{document}
$if(title)$
\maketitle
$endif$

$for(include-before)$
$include-before$

$endfor$
$if(toc)$
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{$toc-depth$}
\tableofcontents
}
$endif$
$body$

$if(natbib)$
$if(biblio-files)$
$if(biblio-title)$
$if(book-class)$
\renewcommand\bibname{$biblio-title$}
$else$
\renewcommand\refname{$biblio-title$}
$endif$
$endif$
\bibliography{$biblio-files$}

$endif$
$endif$
$if(biblatex)$
\printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$

$endif$
$for(include-after)$
$include-after$

$endfor$
\end{document}

asked Aug 12 '13 00:08

2 Answers

Use the --pdf-engine=xelatex option.

answered Sep 20 '22 23:09

By default, Pandoc uses the pdflatex engine when converting Markdown files to PDF. pdflatex does not handle Unicode characters as smoothly as xelatex, so you should try xelatex instead. Switching to xelatex alone is often not enough, though: you also need to choose a font that contains glyphs for the Unicode characters you want to typeset.

I am a Chinese user, so I will take Chinese as an example. Suppose you have a test.md with the following content:

你好汉字

you can use the following command to compile this markdown file:

pandoc --pdf-engine=xelatex -V CJKmainfont="KaiTi" test.md -o test.pdf

In the above command, --pdf-engine=xelatex selects the LaTeX engine (in newer versions of Pandoc, the --latex-engine option is deprecated). -V CJKmainfont="KaiTi" selects a font that supports Chinese. For other languages, you can use the flag -V mainfont="<FONT_NAME>".
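
If you run this conversion from a script, the flags described above can be assembled programmatically. The following is a minimal Python sketch, assuming pandoc and xelatex are on your PATH. The helper names `pandoc_pdf_command` and `convert` are made up for illustration; only the Pandoc flags themselves come from the answer.

```python
import subprocess


def pandoc_pdf_command(source, output, mainfont=None, cjk_mainfont=None):
    """Build the argument list for a xelatex-based Pandoc PDF conversion."""
    cmd = ["pandoc", "--pdf-engine=xelatex"]
    if mainfont:
        cmd += ["-V", f"mainfont={mainfont}"]
    if cjk_mainfont:
        cmd += ["-V", f"CJKmainfont={cjk_mainfont}"]
    cmd += [source, "-o", output]
    return cmd


def convert(source, output, **fonts):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_pdf_command(source, output, **fonts), check=True)
```

For example, `convert("test.md", "test.pdf", cjk_mainfont="KaiTi")` runs the same command as above. Passing the arguments as a list means font names containing spaces need no extra shell quoting.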

How to find a font which supports your language

In order to find a font which supports your language, you need to know your language code. Then, if you are on a Linux system, or on a Windows system with TeX Live installed, you can use the following command to find a valid font for your language:

fc-list :lang=zh # find the fonts which support Chinese (language code is `zh`)

The output on my Linux system is shown below: [screenshot of the fc-list output]

If you choose, for example, the font Source Han Serif CN, then use the following command to compile your Markdown file:

 pandoc --pdf-engine=xelatex -V CJKmainfont="Source Han Serif CN" test.md -o test.pdf
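
The font lookup above can also be scripted. This Python sketch assumes fontconfig's fc-list is installed and uses its `family` element to print only family names. The function names are illustrative, and the exact names and ordering you get back will depend on the fonts installed on your system.

```python
import subprocess


def parse_families(fc_list_output):
    """Extract sorted, unique family names from `fc-list <pattern> family` output."""
    families = set()
    for line in fc_list_output.splitlines():
        line = line.strip()
        if line:
            # A font may list several localized names separated by commas;
            # keep only the first one.
            families.add(line.split(",")[0])
    return sorted(families)


def fonts_for_language(lang):
    """Ask fontconfig which installed fonts claim to support `lang` (e.g. "zh", "el")."""
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)
```

For the Greek text in the question, `fonts_for_language("el")` would list candidate values for `-V mainfont=...`.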

answered Sep 18 '22 23:09

jdhao

Tags: markdown, pdf, pandoc

Asked by Mike Thomsen; answered by John MacFarlane and jdhao.