Is there a decent, customisable, HTML to Markdown Java API?

Tags: java, html, markdown

I want to save text that I scrape from various sources without its HTML tags, while keeping as much of the structure as I reasonably can.

Markdown seems to be the solution to this (or possibly MultiMarkdown).

There is a question which offers a suggestion on converting from HTML to Markdown, but I have some specific requirements:

ALL links (including images) are referenced at the END only (i.e. no inline URLs)
NO embedded HTML (I'm not even 100% sure yet how I'd like to deal with difficult HTML... but it won't be embedded!)
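
To make the first requirement concrete, here is a minimal sketch of the output I have in mind. The class name `ReferenceLinks` and its methods are hypothetical and not taken from any existing library; it only shows how link targets could be numbered as they are encountered and written out once at the end.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects link targets so they are printed once, at the end. */
class ReferenceLinks {
    // URL -> reference number, kept in first-seen order so the footer is stable.
    private final Map<String, Integer> numbers = new LinkedHashMap<>();

    /** Returns a reference-style link such as [text][1], registering the URL if it is new. */
    String link(String text, String url) {
        return "[" + text + "][" + numberFor(url) + "]";
    }

    /** Returns a reference-style image such as ![alt][2]. */
    String image(String alt, String url) {
        return "![" + alt + "][" + numberFor(url) + "]";
    }

    // Reuses the existing number when the same URL is seen again.
    private int numberFor(String url) {
        Integer n = numbers.get(url);
        if (n == null) {
            n = numbers.size() + 1;
            numbers.put(url, n);
        }
        return n;
    }

    /** Renders the definitions block that goes at the end of the Markdown output. */
    String footer() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : numbers.entrySet()) {
            sb.append('[').append(e.getValue()).append("]: ").append(e.getKey()).append('\n');
        }
        return sb.toString();
    }
}
```

A converter would call `link(...)` or `image(...)` wherever it meets an `a` or `img` element and append `footer()` after the body text, so no URL ever appears inline.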

So my question is as stated in the title: Is there a decent, customisable, HTML to Markdown Java API?

asked Nov 01 '10 17:11

barryred

1 Answer

You could try adapting HtmlCleaner which provides a workable interface onto the DOM:

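// Assumes org.htmlcleaner.HtmlCleaner and org.htmlcleaner.TagNode are imported
HtmlCleaner htmlCleaner = new HtmlCleaner();
TagNode root = htmlCleaner.clean( stream );
// Attributes need the @ prefix in XPath
Object[] found = root.evaluateXPath( "//div[@id='something']" );
// Test the first match, not the array itself
if( found.length > 0 && found[0] instanceof TagNode ) {
    ((TagNode)found[0]).removeFromTree();
}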

This would allow you to structure your output stream in any format that you want using a fairly simple API.
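
As a rough illustration of what "structuring the output" could look like, the sketch below walks an element tree and emits Markdown for a handful of tags. To keep it runnable without extra dependencies it uses the JDK's built-in XML parser on well-formed XHTML as a stand-in for the tree HtmlCleaner would give you; the class name `MarkdownWalker` and the tag mapping are assumptions for illustration, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical walker: turns a small subset of XHTML into Markdown. */
class MarkdownWalker {
    /** Parses well-formed XHTML and returns a rough Markdown rendering. */
    static String convert(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Collapse runs of whitespace into a single space.
            out.append(node.getTextContent().replaceAll("\\s+", " "));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip comments, processing instructions, etc.
        }
        String tag = ((Element) node).getTagName().toLowerCase();
        switch (tag) {
            case "h1": out.append("# "); children(node, out); out.append("\n\n"); break;
            case "h2": out.append("## "); children(node, out); out.append("\n\n"); break;
            case "p": children(node, out); out.append("\n\n"); break;
            case "em": out.append('*'); children(node, out); out.append('*'); break;
            case "strong": out.append("**"); children(node, out); out.append("**"); break;
            case "li": out.append("- "); children(node, out); out.append('\n'); break;
            default: children(node, out); // unknown tags: keep the text, drop the markup
        }
    }

    private static void children(Node node, StringBuilder out) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            walk(kids.item(i), out);
        }
    }
}
```

Because unknown tags fall through to the default branch, their text is kept but no HTML is ever written to the output. Reference-style links would need an extra case for `a` and `img` that records the URL and emits it after the body.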

answered Oct 14 '22 13:10

Gary Rowe
