
HTML to Markdown with Java

Tags:

java

markdown

Is there an easy way to convert HTML to Markdown in Java?

I am currently using the Java MarkdownJ library to convert Markdown to HTML.

import com.petebevin.markdown.MarkdownProcessor;
...
public static String getHTML(String markdown) {
    MarkdownProcessor markdown_processor = new MarkdownProcessor();
    return markdown_processor.markdown(markdown);
}

public static String getMarkdown(String html) { /* TODO Ask stackoverflow */ }
Sergio del Amo asked Sep 12 '08 17:09




2 Answers

There is a great JavaScript library called Turndown, which you can try online. It handles HTML on which the accepted answer throws errors.

I needed the same thing for Java (as in the question), so I ported it. The Java library is called CopyDown. It has the same test suite as Turndown, and I have tried it on real examples for which the accepted answer was throwing errors.

To install with gradle:

dependencies {
    compile 'io.github.furstenheim:copy_down:1.0'
}

Then to use it:

CopyDown converter = new CopyDown();
String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
String markdown = converter.convert(myHtml);
System.out.println(markdown);
> Some title\n==========\n\nSome html\n\nAnother paragraph\n

P.S. It is released under the MIT license.

Gabriel Furstenheim answered Sep 28 '22 04:09


I am working on the same issue and experimenting with a couple of different techniques.

The answer above could work. You could use the jTidy library to do the initial cleanup and convert the HTML to XHTML, then apply the XSLT stylesheet linked above to the result.
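For illustration, here is a rough sketch of what the XSLT step might look like using only the JDK's built-in javax.xml.transform API. The stylesheet embedded below is a toy one written for this example (it is not the stylesheet linked above) and only covers h1, p, em and strong. The class and method names are made up, and the input is assumed to be well-formed XHTML without a default namespace, i.e. already cleaned up by something like jTidy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
final class XhtmlToMarkdownSketch {

    // Toy stylesheet (an assumption, NOT the one linked above).
    // Elements without a template (html, body, ...) fall through to
    // XSLT's built-in rules, which just process their children.
    private static final String TOY_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='h1'># <xsl:apply-templates/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
      + "<xsl:template match='p'><xsl:apply-templates/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
      + "<xsl:template match='em'>*<xsl:apply-templates/>*</xsl:template>"
      + "<xsl:template match='strong'>**<xsl:apply-templates/>**</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the toy stylesheet to a well-formed XHTML string.
    static String toMarkdown(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TOY_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input or stylesheet; wrapped to keep the sketch short.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml =
            "<html><body><h1>Title</h1><p>Some <em>text</em> here</p></body></html>";
        System.out.print(toMarkdown(xhtml));
    }
}
```

A real stylesheet would need many more templates (links, lists, code, nested blockquotes), and XHTML that declares the usual xmlns would need namespace-aware match patterns, so treat this only as a picture of how the pieces fit together.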

Unfortunately, there is no library that offers a one-stop function to do this in Java. You could try running the Python script html2text under Jython, but I haven't tried this yet!

myabc answered Sep 28 '22 03:09