I've been trying to use Pandoc to convert some Markdown into a PDF file. This is a sample that Pandoc will not convert for me:
# Header!
## Sub Header
themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'
That's just something I grabbed from the top of the Wikipedia database dump. Pandoc doesn't like that at all. This is the error message it gives me:
pandoc: Error producing PDF from TeX source.
! Package inputenc Error: Unicode char \u8:ἀ not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
 ...
l.53 ...es derived respectively from the Greek ἀ
Is there a command switch I can give it to get around this? I tried following the advice to do something like this, but it failed:
iconv -t utf-8 test.md | pandoc -o test.pdf
Update: Before following John's advice below, see this.
Update 2: This is the command that ultimately got it working. Hopefully this will help someone:
pandoc test2.md -o test2.pdf --latex-engine=xelatex --template=my.latex --variable mainfont="DejaVu Serif" --variable sansfont=Arial
And this is the contents of my.latex:
\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$lang$,$endif$$if(papersize)$$papersize$,$endif$]{$documentclass$}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[utf]{inputenc}
  \usepackage{ucs}
$if(euro)$
  \usepackage{eurosym}
$endif$
\else % if luatex or xelatex
  \usepackage{fontspec}
  \ifxetex
    \usepackage{xltxtra,xunicode}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \setromanfont{TeX Gyre Pagella}
  \newcommand{\euro}{€}
$if(mainfont)$
  \setmainfont{$mainfont$}
$endif$
$if(sansfont)$
  \setsansfont{$sansfont$}
$endif$
$if(monofont)$
  \setmonofont{$monofont$}
$endif$
$if(mathfont)$
  \setmathfont{$mathfont$}
$endif$
\fi
$if(geometry)$
\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
$endif$
$if(natbib)$
\usepackage{natbib}
\bibliographystyle{plainnat}
$endif$
$if(biblatex)$
\usepackage{biblatex}
$if(biblio-files)$
\bibliography{$biblio-files$}
$endif$
$endif$
$if(listings)$
\usepackage{listings}
$endif$
$if(lhs)$
\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
$endif$
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(verbatim-in-note)$
\usepackage{fancyvrb}
$endif$
$if(tables)$
\usepackage{longtable}
$endif$
$if(graphics)$
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=\maxwidth]{#1}}
$endif$
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={$author-meta$},
            pdftitle={$title-meta$},
            colorlinks=true,
            urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$,
            linkcolor=$if(linkcolor)$$linkcolor$$else$magenta$endif$,
            pdfborder={0 0 0}}
\urlstyle{same} % don't use monospace font for urls
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
$if(strikeout)$
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
$endif$
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
$if(numbersections)$
$else$
\setcounter{secnumdepth}{0}
$endif$
$if(verbatim-in-note)$
\VerbatimFootnotes % allows verbatim text in footnotes
$endif$
$if(lang)$
\ifxetex
  \usepackage{polyglossia}
  \setmainlanguage{$mainlang$}
\else
  \usepackage[$lang$]{babel}
\fi
$endif$
$for(header-includes)$
$header-includes$
$endfor$

$if(title)$
\title{$title$}
$endif$
\author{$for(author)$$author$$sep$ \and $endfor$}
\date{$date$}

\begin{document}
$if(title)$
\maketitle
$endif$

$for(include-before)$
$include-before$

$endfor$
$if(toc)$
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{$toc-depth$}
\tableofcontents
}
$endif$
$body$

$if(natbib)$
$if(biblio-files)$
$if(biblio-title)$
$if(book-class)$
\renewcommand\bibname{$biblio-title$}
$else$
\renewcommand\refname{$biblio-title$}
$endif$
$endif$
\bibliography{$biblio-files$}

$endif$
$endif$
$if(biblatex)$
\printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$

$endif$
$for(include-after)$
$include-after$

$endfor$
\end{document}
Use the --pdf-engine=xelatex option.
By default, Pandoc uses the pdflatex engine when converting Markdown files to PDF. pdflatex does not handle Unicode characters as smoothly as xelatex, so you should try xelatex instead. Merely switching to xelatex is often not enough, though: you also need to choose a font that contains glyphs for the Unicode characters you want to typeset.
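If you are not sure which characters are tripping up LaTeX, a quick check like the following may help. It is only a sketch in Python, not something Pandoc provides: the hypothetical helper non_ascii_characters lists each distinct non-ASCII character in a string with its Unicode name, so you can see which scripts your font has to cover.

```python
import unicodedata


def non_ascii_characters(text):
    """Map each distinct non-ASCII character in text to its Unicode name."""
    found = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            # Unnamed code points fall back to a "U+XXXX" label.
            found[ch] = unicodedata.name(ch, "U+%04X" % ord(ch))
    return found


if __name__ == "__main__":
    # The sample sentence from the question above.
    sample = "themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'"
    for ch, name in non_ascii_characters(sample).items():
        print(ch, name)
```

To scan a whole file, read it with open("test.md", encoding="utf-8").read() and pass the result in.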
I am a Chinese user, so I will take Chinese as an example. If you have a test.md that contains the following content:
你好汉字
you can use the following command to compile this markdown file:
pandoc --pdf-engine=xelatex -V CJKmainfont="KaiTi" test.md -o test.pdf
In the above command, --pdf-engine=xelatex selects the LaTeX engine (in newer versions of Pandoc, the --latex-engine option is deprecated in favor of --pdf-engine). -V CJKmainfont="KaiTi" selects a font that supports Chinese. For other languages, you may use the flag -V mainfont="<FONT_NAME>".
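If you run this from a script, the flags described above can be assembled roughly like this. This is a minimal sketch: build_pandoc_command is a hypothetical helper of mine, and it uses only the --pdf-engine, -V mainfont and -V CJKmainfont options already mentioned. It builds the argument list without running anything.

```python
import shlex


def build_pandoc_command(source, output, mainfont=None, cjk_mainfont=None):
    """Return the argument list for a xelatex-based Pandoc PDF conversion."""
    command = ["pandoc", "--pdf-engine=xelatex"]
    if mainfont:
        # Font for non-CJK text, e.g. Greek or Cyrillic.
        command += ["-V", "mainfont=" + mainfont]
    if cjk_mainfont:
        # Font for Chinese, Japanese and Korean text.
        command += ["-V", "CJKmainfont=" + cjk_mainfont]
    command += [source, "-o", output]
    return command


if __name__ == "__main__":
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(build_pandoc_command("test.md", "test.pdf", mainfont="DejaVu Serif")))
```

To actually run the conversion, pass the returned list to subprocess.run(..., check=True). That assumes pandoc, xelatex and the chosen font are installed.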
In order to find a font which supports your language, you need to know your language code. Then, if you are on a Linux system, or on a Windows system with TeX Live installed, you can use the following command to find a valid font for your language:
fc-list :lang=zh # find the fonts which support Chinese (language code is `zh`)
The output on my Linux system is shown below
If you choose to use, for example, the font Source Han Serif CN, then use the following command to compile your Markdown file:
pandoc --pdf-engine=xelatex -V CJKmainfont="Source Han Serif CN" test.md -o test.pdf
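If you want to pick the font from a script rather than by eye, the fc-list lookup can be wrapped as sketched below. Both function names are hypothetical. The sketch assumes fc-list is on your PATH and relies on fc-list accepting an element name such as family after the pattern, which limits the output to family names. For the Greek sample in the question the language code would be el.

```python
import subprocess


def parse_families(fc_list_output):
    """Return sorted, de-duplicated family names from `fc-list <pattern> family` output."""
    families = set()
    for line in fc_list_output.splitlines():
        # A single font may report several comma-separated family names.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def fonts_for_language(lang_code):
    """Run fc-list for one language code such as "zh" or "el" (requires fontconfig)."""
    result = subprocess.run(
        ["fc-list", ":lang=" + lang_code, "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_families(result.stdout)
```

Any name returned by fonts_for_language("zh") should be usable as the CJKmainfont value, provided xelatex can see the same fonts as fontconfig.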