
Does wkhtmltopdf offer any orphan or keep-with-next control?

I'm writing a series of Markdown documents on GitHub, in an effort to make the editing process forkable and mergeable. My primary output target is PDF. I only need these typographical features:

  • Heading levels 1 and 2
  • Paragraph
  • Bulleted list and bullet item
  • Footer page number
  • Inline preformatted styles (primarily for inline code)
  • Block code

I'm using pandoc as my Markdown parser. I initially tried its LaTeX/PDF output directly, but after asking a question about that I gave up: LaTeX is slow and awkward to use (imo), and seems to discourage class attributes on inline preformatted styles that would be useful for other formats (particularly HTML).

So, I'm now using Pandoc to convert to HTML and then wkhtmltopdf to convert from HTML to PDF. This gets me 90% of the typographical features I'm looking for, and with minimal effort, so I think this is a good approach. However, on headings before paragraphs, and paragraphs before unordered lists, I'd like to have keep-with-next, or orphan control, but this does not appear to be supported. I've tried these CSS features:

li {
    /* Try to avoid breaking inside a bullet, doesn't work for me */
    break-inside: avoid-page;
}

ul {
    /* Try to avoid breaking before a bullet list, doesn't work for me */
    page-break-before: avoid;
}

p {
    /* Not supported by Webkit: https://developer.mozilla.org/en-US/docs/Web/CSS/orphans */
    orphans: 2;
}

As you can see from my code, orphans sounds ideal, but it makes no difference to the PDF output, and the Mozilla reference says that WebKit (which wkhtmltopdf uses internally) doesn't support it.

What can I do to achieve this? I feel I am very close, but it's frustrating that such a trivial problem doesn't seem to have an obvious solution. Whilst I've put some effort into learning Pandoc and wkhtmltopdf, I am willing to drop either or both in favour of other F/OSS tools if they can be shown to do a better job.

Strategy 2

I don't want to disappear down too many pointless rabbit-holes, but I see Pandoc can render to ODT. My master document is saved in ODT (using OpenOffice), and the formatting of this is perfect, including all the keep-with-next I want. Perhaps I could have an ODT document just to specify the styles, and then convert this alongside the Markdown documents. Is this worth trying?

Strategy 3

The HTML output of Pandoc differentiates all markup correctly, so I wonder whether adding in a new HTML to PDF converter might do the trick. Dompdf sounds pretty good, so I'll give that a go too.

Strategy 4

I will also try raw LaTeX at some point, using an editor like LyX. I can't imagine LaTeX not having keep-with-next, and a GUI around it will soften the sharp edges! This is not ideal since LaTeX isn't as human-readable as Markdown, but I should think it is still mergeable in much the same way.

halfer asked Nov 03 '22 17:11



1 Answer

Attempt at strategy 2

Using this article I'm trying to convert Markdown to ODT instead; however, it's still not perfect.

Using this approach I can include a "reference document" which contains pre-defined styles. Thus, orphan control and keep-with-next are now within reach: I just redefine the style in the reference document and it is correctly applied to the output.

  • However, unordered lists just have the "Text body" paragraph style, and so they cannot be differentiated as a block from paragraph text. When converting HTML to PDF, of course I can just create a style for <ul>.
  • I also have two (separate) inline pre-formatted styles (one for code and one for file names) but these are both rendered using the character style "Teletype". That means they cannot be differentiated in the final document.
  • My manual page breaks, which previously were in HTML, no longer work. There's no style they are attached to, so I think I do need to insert these manually.
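For comparison, this is roughly how the first two distinctions can be expressed on the HTML route. It is only a sketch: the class names "code" and "filename" are hypothetical, and it assumes the classes are attached in the Markdown source using Pandoc's inline code attributes (for example `composer.json`{.filename}), which Pandoc emits as a class on the <code> element.

```css
/* A list is its own block element in HTML, so it can be styled
   separately from ordinary paragraph text */
ul {
    margin-top: 0.25em;
    margin-bottom: 0.75em;
}

/* Hypothetical class for inline source code */
code.code {
    font-family: monospace;
    background-color: #f2f2f2;
}

/* Hypothetical class for inline file names */
code.filename {
    font-family: monospace;
    font-style: italic;
}
```

In the ODT output both of these would collapse into the single "Teletype" character style, which is the limitation described above.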

Thus, this approach offers one step forward and a couple of steps back!

Attempt at strategy 3

I've switched to Dompdf, and most of my page break control appears to be working! :=)

  • Don't break after a heading element
  • Manual page breaks are fine
  • Don't break inside an unordered list item
  • Don't break before an unordered list

I've spotted a couple of minor buglets, but they have CSS workarounds. I'll carry on working on the document, but I suspect I'll end up going with this solution.

halfer answered Nov 09 '22 06:11
