
Pandoc - HTML to Markdown - remove all attributes

This would seem like a simple thing to do, but I've been unable to find an answer. I'm converting from HTML to Markdown using Pandoc, and I would like to strip all attributes, such as "class" and "id", from the HTML.

Is there an option in Pandoc to do this?

asked Feb 06 '17 14:02 by trajan


1 Answer

Consider input.html:

<h1 class="test">Hi!</h1>
<p><strong id="another">This is a test.</strong></p>

Then running

pandoc input.html -t markdown_github-raw_html -o output.md

produces output.md:

Hi!
===

**This is a test.**

Without the -t markdown_github-raw_html, pandoc writes its default Markdown flavor, which supports header attributes, and you would get

Hi! {#hi .test}
===

**This is a test.**

This question is actually similar to this one. I don't think pandoc ever preserves id attributes.
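
If you would rather drop the attributes before pandoc ever sees the file, a small pre-processing step might do it. The following is only a sketch built on Python's standard html.parser module; the names AttributeStripper and strip_attributes are made up for this example and are not part of pandoc. Note that it removes every attribute, including href and src, so links and images lose their targets, and it also drops comments and the doctype.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML with all attributes dropped (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False passes entities such as &amp; through
        # handle_entityref/handle_charref so they can be written back as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is deliberately ignored: only the bare tag is kept.
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_attributes(html_text):
    """Return html_text with every tag attribute removed."""
    parser = AttributeStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = (
        '<h1 class="test">Hi!</h1>\n'
        '<p><strong id="another">This is a test.</strong></p>'
    )
    print(strip_attributes(sample))
```

The cleaned HTML can then be saved to a file and converted with the pandoc command shown above.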

answered Sep 23 '22 11:09 by Clément