
Pandoc and foreign characters

I've been trying to use Pandoc to convert some Markdown into a PDF file. This is a sample that Pandoc will not convert for me:

# Header!

## Sub Header

themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'

That's just something I grabbed from the top of the Wikipedia database dump. Pandoc doesn't like that at all. This is the error message it gives me:

pandoc: Error producing PDF from TeX source.
! Package inputenc Error: Unicode char \u8:ἀ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...

l.53 ...es derived respectively from the Greek ἀ

Is there a command switch I can give it to get around this? I tried following the advice to do something like this, but it failed:

iconv -t utf-8 test.md | pandoc -o test.pdf 

Update: Before following John's advice below, see this.

Update 2: This is the command that ultimately got it working. Hopefully this will help someone:

pandoc test2.md -o test2.pdf --latex-engine=xelatex --template=my.latex --variable mainfont="DejaVu Serif" --variable sansfont=Arial

And this is the contents of my.latex:

\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$lang$,$endif$$if(papersize)$$papersize$,$endif$]{$documentclass$}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[utf]{inputenc}
  \usepackage{ucs}
$if(euro)$
  \usepackage{eurosym}
$endif$
\else % if luatex or xelatex
  \usepackage{fontspec}
  \ifxetex
    \usepackage{xltxtra,xunicode}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \setromanfont{TeX Gyre Pagella}
  \newcommand{\euro}{€}
$if(mainfont)$
    \setmainfont{$mainfont$}
$endif$
$if(sansfont)$
    \setsansfont{$sansfont$}
$endif$
$if(monofont)$
    \setmonofont{$monofont$}
$endif$
$if(mathfont)$
    \setmathfont{$mathfont$}
$endif$
\fi
$if(geometry)$
\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
$endif$
$if(natbib)$
\usepackage{natbib}
\bibliographystyle{plainnat}
$endif$
$if(biblatex)$
\usepackage{biblatex}
$if(biblio-files)$
\bibliography{$biblio-files$}
$endif$
$endif$
$if(listings)$
\usepackage{listings}
$endif$
$if(lhs)$
\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
$endif$
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(verbatim-in-note)$
\usepackage{fancyvrb}
$endif$
$if(tables)$
\usepackage{longtable}
$endif$
$if(graphics)$
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=\maxwidth]{#1}}
$endif$
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={$author-meta$},
            pdftitle={$title-meta$},
            colorlinks=true,
            urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$,
            linkcolor=$if(linkcolor)$$linkcolor$$else$magenta$endif$,
            pdfborder={0 0 0}}
\urlstyle{same}  % don't use monospace font for urls
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
$if(strikeout)$
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
$endif$
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
$if(numbersections)$
$else$
\setcounter{secnumdepth}{0}
$endif$
$if(verbatim-in-note)$
\VerbatimFootnotes % allows verbatim text in footnotes
$endif$
$if(lang)$
\ifxetex
  \usepackage{polyglossia}
  \setmainlanguage{$mainlang$}
\else
  \usepackage[$lang$]{babel}
\fi
$endif$
$for(header-includes)$
$header-includes$
$endfor$

$if(title)$
\title{$title$}
$endif$
\author{$for(author)$$author$$sep$ \and $endfor$}
\date{$date$}

\begin{document}
$if(title)$
\maketitle
$endif$

$for(include-before)$
$include-before$

$endfor$
$if(toc)$
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{$toc-depth$}
\tableofcontents
}
$endif$
$body$

$if(natbib)$
$if(biblio-files)$
$if(biblio-title)$
$if(book-class)$
\renewcommand\bibname{$biblio-title$}
$else$
\renewcommand\refname{$biblio-title$}
$endif$
$endif$
\bibliography{$biblio-files$}

$endif$
$endif$
$if(biblatex)$
\printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$

$endif$
$for(include-after)$
$include-after$

$endfor$
\end{document}
Mike Thomsen asked Aug 12 '13 00:08




2 Answers

Use the --pdf-engine=xelatex option.
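The question's working command uses --latex-engine while this answer uses --pdf-engine; which spelling applies depends on the installed Pandoc. The sketch below picks the option name from a version string, assuming the rename happened in Pandoc 2.0. The helper name engine_flag is made up for illustration and is not part of Pandoc.

```shell
#!/bin/sh
# Hypothetical helper: print the engine option name for a Pandoc version.
# Assumption: --latex-engine was renamed to --pdf-engine in Pandoc 2.0.
engine_flag() {
    major=${1%%.*}            # keep only the part before the first dot
    if [ "$major" -ge 2 ]; then
        echo "--pdf-engine"
    else
        echo "--latex-engine"
    fi
}

# If pandoc is on PATH, suggest a command line for the installed version.
if command -v pandoc >/dev/null 2>&1; then
    version=$(pandoc --version | head -n 1 | awk '{print $2}')
    echo "Try: pandoc $(engine_flag "$version")=xelatex test.md -o test.pdf"
fi
```

On a machine without pandoc the script prints nothing; the helper can still be called directly, for example engine_flag 1.19.2.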

John MacFarlane answered Sep 20 '22 23:09


By default, Pandoc uses the pdflatex engine when converting Markdown files to PDF. pdflatex does not handle Unicode characters as smoothly as xelatex, so you should try xelatex instead. But merely switching to xelatex is not enough: as is often the case, you also need to choose a font that contains glyphs for the Unicode characters you want to typeset.

I am a Chinese user, so I'll take Chinese as an example. If you have a test.md with the following content:

你好汉字

you can use the following command to compile this Markdown file:

pandoc --pdf-engine=xelatex -V CJKmainfont="KaiTi" test.md -o test.pdf 

In the above command, --pdf-engine=xelatex selects the LaTeX engine (in Pandoc 2.0 and later, the older --latex-engine option was renamed to --pdf-engine). -V CJKmainfont="KaiTi" selects a font that supports Chinese. For other languages, you can use the flag -V mainfont="<FONT_NAME>".
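For a non-CJK case such as the Greek word in the question, the same idea can be sketched with the mainfont variable. The font "DejaVu Serif" is taken from the question's working command and is assumed to be installed; the file name greek-sample.md is made up for this example.

```shell
#!/bin/sh
# Recreate a small sample containing the Greek word from the question.
cat > greek-sample.md <<'EOF'
# Header!

## Sub Header

themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'
EOF

# Assumption: "DejaVu Serif" is installed; substitute any font with Greek glyphs.
FONT="DejaVu Serif"

# Only attempt the conversion when both tools are available.
if command -v pandoc >/dev/null 2>&1 && command -v xelatex >/dev/null 2>&1; then
    pandoc --pdf-engine=xelatex -V mainfont="$FONT" greek-sample.md -o greek-sample.pdf
else
    echo "pandoc or xelatex not found; skipping conversion" >&2
fi
```

If the chosen font lacks a glyph, xelatex typically still produces a PDF but reports a missing character, so it is worth checking the output for blanks.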

How to find a font which supports your language

To find a font that supports your language, you first need to know its language code. Then, on Linux, or on Windows with TeX Live installed, you can use the following command to list the fonts that are valid for your language:

fc-list :lang=zh  # list the fonts which support Chinese (language code is `zh`)

The output on my Linux system is shown below:

[image: fc-list output]

If you choose to use, e.g., the font Source Han Serif CN, then use the following command to compile your Markdown file:

 pandoc --pdf-engine=xelatex -V CJKmainfont="Source Han Serif CN" test.md -o test.pdf 
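When the problem is a single character rather than a whole language, fontconfig can also filter by code point. The sketch below looks for fonts that contain U+1F00 (the ἀ from the question's error message); the helper name charset_pattern is made up for illustration, and the script assumes fc-list is installed.

```shell
#!/bin/sh
# Hypothetical helper: turn a code point such as "U+1F00" into a
# fontconfig pattern of the form ":charset=1f00".
charset_pattern() {
    hex=$(printf '%s' "$1" | sed 's/^[Uu]+//' | tr 'A-F' 'a-f')
    printf ':charset=%s\n' "$hex"
}

# List the families of fonts that contain U+1F00, if fc-list is available.
if command -v fc-list >/dev/null 2>&1; then
    fc-list "$(charset_pattern U+1F00)" family | sort -u
else
    echo "fc-list not found; skipping font lookup" >&2
fi
```

Any family printed by this lookup is a candidate for -V mainfont="<FONT_NAME>".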
jdhao answered Sep 18 '22 23:09