I'm writing a series of Markdown documents on GitHub, in an effort to make the editing process forkable and mergeable. I intend to render them, above all, to PDF. I only need these typographical features:
I'm using Pandoc as my Markdown parser, and initially tried the LaTeX/PDF output directly. However, I asked a question on that and have given up: LaTeX is slow and awkward to use (imo), and seems to discourage class attributes on inline preformatted styles that would be useful for other formats (particularly HTML).
So, I'm now using Pandoc to convert to HTML and then wkhtmltopdf to convert from HTML to PDF. This gets me 90% of the typographical features I'm looking for with minimal effort, so I think this is a good approach. However, I'd like keep-with-next (or orphan control) on headings before paragraphs and on paragraphs before unordered lists, but this does not appear to be supported. I've tried these CSS features:
li {
    /* Try to avoid breaking inside a bullet, doesn't work for me */
    break-inside: avoid-page;
}
ul {
    /* Try to avoid breaking before a bullet list, doesn't work for me */
    page-break-after: avoid;
}
p {
    /* Not supported by Webkit: https://developer.mozilla.org/en-US/docs/Web/CSS/orphans */
    orphans: 2;
}
As you can see from my code, orphans sounds ideal, but it makes no difference to the PDF output, and the Mozilla reference says that WebKit (which wkhtmltopdf uses internally) doesn't support it.
What can I do to achieve this? I feel I am very close, but it's frustrating that such a trivial problem doesn't seem to have any obvious solution. Whilst I've put some effort into learning Pandoc and wkhtmltopdf, I am willing to drop either or both in favour of other F/OSS tools if they can be shown to do a better job.
I don't want to disappear down too many pointless rabbit-holes, but I see Pandoc can render to ODT. My master document is saved in ODT (using OpenOffice), and the formatting of this is perfect, including all the keep-with-next I want. Perhaps I could have an ODT document just to specify the styles, and then convert this alongside the Markdown documents. Is this worth trying?
The HTML output of Pandoc differentiates all markup correctly, so I wonder whether adding in a new HTML to PDF converter might do the trick. Dompdf sounds pretty good, so I'll give that a go too.
I will also try raw LaTeX at some point, using an editor like LyX. I can't imagine LaTeX not having keep-with-next, and a GUI around it will soften the sharp edges! This is not ideal since LaTeX isn't as human-readable as Markdown, but I should think it is still mergeable in much the same way.
Using this article I'm trying to convert Markdown to ODT instead; however, it's still not perfect.
Using this approach I can include a "reference document" which contains pre-defined styles. Thus, orphan control and keep-with-next are now within reach: I just redefine the style in the reference document and it is correctly added to the output.
Thus, this approach offers one step forward and a couple of steps back!
I've switched to DOMPDF, and most of my page break control appears to be working! :=)
I've spotted a couple of minor buglets, but they have CSS workarounds. I'll carry on working on the document, but I suspect I'll end up going with this solution.
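For reference, page-break rules of the kind discussed above can be sketched as follows. This is only a minimal sketch, not the exact stylesheet used here: the selectors are assumptions based on the plain h1-h6, p, ul and li elements that Pandoc's HTML output normally contains, and whether each property is honoured depends on the renderer. It uses the CSS 2.1 page-break properties rather than the newer break-inside and orphans.

```css
/* Keep-with-next for headings: ask the renderer not to break the page
   immediately after a heading. */
h1, h2, h3, h4, h5, h6 {
    page-break-after: avoid;
}

/* Keep a bullet list attached to whatever introduces it.
   The rule goes on the list and targets the break *before* it;
   page-break-after on ul would only affect what follows the list.
   Assumption: applying this to every ul is acceptable. */
ul {
    page-break-before: avoid;
}

/* Try to keep each bullet on a single page. */
li {
    page-break-inside: avoid;
}
```

Note that the ul rule is deliberately broad: it also keeps a list attached to a heading that precedes it, not just to a paragraph.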