I'm a pandoc newbie, so I must be missing something obvious. I'm trying to convert an HTML file generated by MS Word to Markdown. Here is a test HTML file:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<div class="Section1">
<p class="Question"><span style="FONT-SIZE: 10pt">Today</span> <span style=
"FONT-SIZE: 10pt">is</span> <span lang="HR" style=
"FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span style=
"FONT-SIZE: 10pt">nice</span> <span style="FONT-SIZE: 10pt">day</span>
</p>
</div>
</body>
</html>
and I try to convert it with:
pandoc -f html -t markdown test.html -o test.md
I was expecting "Today is a nice day", but got:
<div class="Section1">
<span style="FONT-SIZE: 10pt">Today</span> <span
style="FONT-SIZE: 10pt">is</span> <span lang="HR"
style="FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span
style="FONT-SIZE: 10pt">nice</span> <span
style="FONT-SIZE: 10pt">day</span>
</div>
Why was the div kept? Why were the spans kept?
Pandoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx.
Span and div are both generic HTML elements that group together related parts of a web page, but they serve different functions. A div element is used for block-level organization and styling of page elements, whereas a span element is used for inline organization and styling.
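You need to turn off some extensions. By default, pandoc's HTML reader parses div and span elements into native Div and Span nodes (the native_divs and native_spans extensions), and the Markdown writer then writes those nodes back out as HTML tags so that their attributes are not lost. You can disable this either on the HTML input side: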
$ pandoc -f html-native_divs-native_spans -t markdown test.html -o test.md
Or on the markdown output side:
$ pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans test.html -o test.md
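Both commands rely on pandoc's FORMAT-EXTENSION syntax, where each extension to switch off is appended to the format name with a minus sign. As a minimal sketch of that syntax, the snippet below builds the long output format string from a list; `format_without` and `to_format` are hypothetical names made up for this illustration, not part of pandoc, and the conversion step assumes a `test.html` in the current directory.

```shell
# Hypothetical helper (not part of pandoc): build a format spec such as
# "markdown-raw_html-native_divs" from a base format plus extensions to disable.
format_without() {
  spec=$1
  shift
  for ext in "$@"; do
    # Each disabled extension is appended as "-extension_name".
    spec="$spec-$ext"
  done
  printf '%s\n' "$spec"
}

# Same set of extensions as in the output-side command above.
to_format=$(format_without markdown raw_html native_divs native_spans fenced_divs bracketed_spans)
echo "$to_format"

# Run the conversion only if pandoc is installed and the input file exists.
if command -v pandoc >/dev/null 2>&1 && [ -f test.html ]; then
  pandoc -f html -t "$to_format" test.html -o test.md
fi
```

The same helper would produce the input-side spec with `format_without html native_divs native_spans`. Keeping the list in one place makes it easier to try dropping one extension at a time to see which one is responsible for a given piece of leftover markup.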