
Tables or images too wide in Pandoc output as DOCX or PDF/LaTeX

I am writing a quick and dirty report using pandoc and markdown.

I need to generate a PDF or a DOCX with minimum hassle; I don't care much which (both would be best, of course). I am also somewhat constrained regarding the figures and tables: they were generated beforehand with another program, and I would rather insert them as they are than convert them to suit pandoc's needs.

However, the main constraint is that I don't want to edit the resulting document manually, be that LaTeX or DOCX. I want to do all editing in markdown.

Here is the problem:

  • In DOCX, the tables are displayed fine: they have the width of the document. However, the figures are much too wide. I can either convert the images to a lower resolution (which doesn't look nice), or manually resize the images in Word (which is out of the question).
  • In PDF, the generated figures are fine (more or less), but two other problems appear:
    • The tables are too wide, because there are no line breaks within the cells, and
    • LaTeX being LaTeX, the figures and tables are "reorganized", that is, they no longer appear in their original, consecutive order.

Thus, none of the documents generated are usable for my purposes.

All I wanted to do is to slap together some results and generate a file that I can send to another scientist.

Question: what is the best way to generate a quick and dirty report with pandoc, with minimum effort and with at least all results visible?

Update: Upgrading pandoc to 1.4 or later solves the issue -- the figures now have the correct size in DOCX documents.

January asked Apr 23 '15 10:04



1 Answer

Control over image size

Currently you cannot control image size directly from Markdown. For LaTeX/PDF output, image scaling is handled automatically by LaTeX/pdflatex itself.

In recent months there have been some discussions going on in the Pandoc developer + user community about how to best implement it and create an easy-to-use syntax, for example

![Image Caption](./path/to/image.jpg "Image Comment"){width="60%", height="150px"}

(Warning: this is an example only, made up on the spot; I can't remember the latest state of the discussion.) The idea is that such attributes would carry over to all supported output formats that can contain images, not just LaTeX/PDF.

So something along these lines is planned as a major new feature for the next major release of Pandoc, and should make image sizing work better in ODT/DOCX output as well.
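
For comparison, here is a sketch of the attribute syntax that later pandoc releases adopted (the link_attributes extension, available from version 1.16 as far as I know). The path and caption are the placeholder values from the made-up example above; note that there are no commas between attributes and that the values need no quotes.

```markdown
![Image Caption](./path/to/image.jpg "Image Comment"){width=60% height=150px}
```

A percentage width is relative to the available page or column width, so the same source should scale sensibly in both DOCX and LaTeX/PDF output. On a pandoc version without this extension, the braces are most likely rendered as literal text.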

Control over table/cell widths and line breaks within cells

How exactly do you specify your tables in Markdown syntax?

Are you aware that Pandoc supports several table variants, such as grid_tables, pipe_tables, simple_tables and multiline_tables?

You should look into using pandoc --from=markdown+multiline_tables ... as your command and write the critical tables as multiline_tables in your Markdown.

Read all about the details via man pandoc_markdown...

Multiline tables give you limited control over the width of individual columns in the output: Pandoc derives the relative column widths from the widths of the columns in the Markdown source, so you adjust them simply by widening or narrowing the columns there.
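
A minimal sketch of such a table follows. The column names and cell contents are invented for illustration. The table starts and ends with a full-width row of dashes, rows are separated by blank lines, and the dashed line under the header sets the relative column widths, so the third column here comes out the widest.

```markdown
-------------------------------------------------------
Sample     Condition            Notes
---------- -------------------- -----------------------
A1         control              Baseline measurement,
                                no treatment applied.

B2         treated              Long cell text wraps
                                here instead of running
                                off the page.
-------------------------------------------------------

Table: Hypothetical results table.
```

Saved in a file such as report.md (a hypothetical name), this could be converted with pandoc --from=markdown+multiline_tables report.md -o report.pdf. Because the cell text is wrapped in the source, the LaTeX output gets line breaks inside the cells instead of a table that runs past the margin.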

Order of figures and tables when outputting LaTeX/PDF

Pandoc supports the insertion of raw_tex lines and environments into the Markdown source file. When it encounters such lines, it passes them unchanged into its LaTeX output. (They are ignored for all other output formats.)

So you can insert lines like

\newpage{}

into the Markdown to enforce a page break. This already gives you some limited control over the order of misbehaving figures or tables. (After all, you said you are looking for a "quick and dirty" method, not a sophisticated typeset document...)

Of course, if you know LaTeX better, you can also use commands like \FloatBarrier (from the placeins package) inside your Markdown.
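
A sketch of how these raw LaTeX lines might sit in a Markdown file is shown below. The file name figure1.png, the caption and the headings are placeholders, and loading placeins through a header-includes entry in a YAML metadata block is one common way to do it, assuming your pandoc version supports YAML metadata blocks.

```markdown
---
header-includes:
  - \usepackage{placeins}
---

# Results

![Hypothetical figure caption](figure1.png)

\FloatBarrier

Text that should only appear after the figure above has been placed.

\newpage

# Discussion
```

\FloatBarrier keeps pending floats from drifting past that point without forcing a page break, while \newpage starts a new page. Both lines are dropped when the same file is converted to DOCX.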

Going down that road (mixing LaTeX code into Markdown) has a few disadvantages:

  1. The Markdown will not look as pretty any more.
  2. The Markdown will not work fully with other output formats (should you need them).

But the advantages are still:

  1. You will be writing and modifying the document text much faster in Markdown than authoring it in LaTeX.
  2. You have some additional control over the final look of your PDF:
    • order of tables + figures
    • look + width of tables + figures (because, you can of course insert a complete LaTeX 'figure' or 'table' environment).
Kurt Pfeifle answered Sep 23 '22 13:09
