How can I apply inline CSS rules in a Pandoc HTML-to-Markdown conversion?

I am trying to convert my HTML book (which was itself converted from a PDF) to Markdown. When I convert the HTML to Markdown with the following command, pandoc does not carry the inline HTML positioning rules (such as relative and absolute) over to the Markdown output.

pandoc -f html -t markdown input.html -o output.md

Is there any parameter for this functionality?

I have tried extracting the inline styles into an external CSS file with a program and adding the --css option to my command, but it didn't work.

pandoc -f html -t markdown --css=styles.css input.html -o output.md
my-lord asked Sep 18 '18 13:09


1 Answer

This is not possible with Pandoc or Markdown.

As the User Guide explains (emphasis added):

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Of course, HTML (and PDF) formats are "more expressive" than Markdown. Therefore, much of the formatting information is lost when using Pandoc to convert from those formats.

As a reminder, Markdown's documentation explains that (emphasis in original):

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. ... HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

That being the case, Markdown has no use for or understanding of CSS. In fact, Pandoc's User Guide lists the --css flag under "Options affecting specific writers"; that is, it only applies to output formats which understand and can use it. Note also that the option does not generate a CSS file, but rather points to one which was created externally by the user. It can therefore be used when converting to HTML (or EPUB, etc.) to point to a CSS file which defines formatting for that output. However, for output formats which do not understand CSS (including Markdown), the option is (presumably) ignored.

Now, if you are looking for a tool which extracts inline styles and exports them as a generated CSS file, such tools exist (Pandoc is not one of them). However, tool recommendations are off-topic here (and I don't have enough experience with any to make any recommendations anyway).
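
Purely as an illustration of what such an extraction step involves (not a recommendation of any tool), here is a minimal Python sketch that uses only the standard library's html.parser. The names InlineStyleExtractor, normalize, extract_css and the generated extracted-N class names are made up for this example. It is not part of Pandoc, it does not rewrite the HTML to use the generated classes, and its naive split on ";" would mishandle values that contain a semicolon (for example data URLs). It also does nothing for the Markdown conversion itself, since Markdown has nowhere to put positioning rules.

```python
from html.parser import HTMLParser


def normalize(style):
    """Trim whitespace around each declaration and drop empty ones."""
    parts = [part.strip() for part in style.split(";")]
    return "; ".join(part for part in parts if part)


class InlineStyleExtractor(HTMLParser):
    """Collect inline style attributes and give each distinct one a class name."""

    def __init__(self):
        super().__init__()
        # Maps normalized declaration text -> generated (hypothetical) class name.
        self.rules = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also arrive here via the default handle_startendtag.
        for name, value in attrs:
            if name == "style" and value:
                declarations = normalize(value)
                if declarations and declarations not in self.rules:
                    self.rules[declarations] = f"extracted-{len(self.rules) + 1}"


def extract_css(html_text):
    """Return CSS text with one rule per distinct inline style found."""
    parser = InlineStyleExtractor()
    parser.feed(html_text)
    parser.close()
    return "\n".join(
        f".{cls} {{ {decl}; }}" for decl, cls in parser.rules.items()
    )


if __name__ == "__main__":
    sample = (
        '<p style="position: relative; left: 10px">a</p>'
        '<div style="position:absolute;"></div>'
    )
    print(extract_css(sample))
```

The resulting text could be saved as styles.css and passed to --css when the output format is HTML or EPUB; for Markdown output it would still have no effect.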

Waylan answered Nov 01 '22 05:11
