
What's the easiest way to convert an SO data dump from HTML back to Markdown?

I've just got my hands on a Stack Overflow data dump, and I'm disappointed to see that the Body field of the posts is in HTML rather than Markdown. I suspect there's Markdown in the original database because that's what I see if I try to edit an answer.

I want to recover Markdown from a large set of answers. I will be processing hundreds of entries in batch mode, using either command-line tools or some kind of Lua or C library, so an interactive tool like the wmd Markdown editor is not suitable. What tools are available to help me recover Markdown from a Stack Overflow data dump?


(Related question, not a duplicate: Convert HTML back to Markdown within wmd.)

Norman Ramsey asked Aug 20 '09 17:08


1 Answer

Markdownify converts HTML to Markdown.

See Also: MetaSO / Can Markdown be recovered from the SO data dump?
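
For the batch use case in the question, the overall shape of such a conversion can be sketched in Python with only the standard library. This is not Markdownify and does not reproduce its behaviour: the `TinyMarkdown` class, `html_to_markdown`, and `convert_posts` are hypothetical names, the converter handles only a handful of tags (p, em/i, strong/b, code, pre, a), and the assumed dump layout (one `<row>` element per post with `Id` and `Body` attributes) should be checked against the actual file.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TinyMarkdown(HTMLParser):
    """Convert a small subset of HTML to Markdown (illustrative only)."""

    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_pre = False
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "pre":
            self.in_pre = True
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "pre":
            self.in_pre = False
            self.out.append("\n")
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag == "a":
            self.out.append("](%s)" % self.href)
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        if self.in_pre:
            # Markdown code blocks are indented by four spaces.
            data = "".join("    " + line for line in data.splitlines(True))
        self.out.append(data)


def html_to_markdown(html):
    """Return Markdown for one HTML fragment (supported tags only)."""
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip("\n")


def convert_posts(xml_source):
    """Yield (post id, Markdown) per <row>; the row layout is an assumption."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row" and "Body" in elem.attrib:
            yield elem.get("Id"), html_to_markdown(elem.get("Body"))
        elem.clear()  # release the parsed element to keep memory use low


# Tiny in-memory stand-in for a dump file (made-up row, not real data).
sample = io.BytesIO(
    b'<posts><row Id="1" Body="&lt;p&gt;Use &lt;code&gt;ls&lt;/code&gt; '
    b'&lt;strong&gt;now&lt;/strong&gt;.&lt;/p&gt;"/></posts>'
)
for post_id, markdown in convert_posts(sample):
    print(post_id, markdown)
```

`convert_posts` also accepts a file name in place of the in-memory sample, so the same loop can stream a full dump file. Tags outside the supported subset (lists, blockquotes, images, headings) pass through as plain text, and the result will not necessarily match the Markdown the author originally typed.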

Sampson answered Oct 24 '22 15:10