Which HTML DOM parser works best on Android?

Tags:

java

dom

android

parsing

screen-scraping

I need to process some HTML pages in my Android app, and I would prefer to use XPath to extract the relevant information. For regular J2SE there are many implementations that parse regular HTML into an org.w3c.dom.Document:

jTidy
TagSoup
Jericho
NekoHTML
HTMLCleaner

(List may be incomplete - it has been extracted from https://stackoverflow.com/questions/2009897/recommend-an-alternative-to-jtidy)

But it is hard to estimate whether and how well those libraries work on Android (library size, CPU and memory consumption).

Based on your experience - what is the library of your choice for Android?
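To make the goal concrete, here is a minimal sketch of the XPath step once an org.w3c.dom.Document is available. It uses only javax.xml classes. The helper names (parseWellFormed, select) and the sample markup are assumptions for illustration, and the standard DocumentBuilder only stands in for a real HTML parser because it accepts well-formed input only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOnDomSketch {

    // Stand-in for an HTML parser (assumption): handles well-formed XHTML only.
    static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("input is not well-formed", e);
        }
    }

    // Evaluates an XPath expression and returns the text of every matching node.
    static List<String> select(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"/first\">First</a>"
                + "<a href=\"/second\">Second</a>"
                + "</body></html>";
        Document doc = parseWellFormed(page);
        System.out.println(select(doc, "//a/@href")); // prints [/first, /second]
    }
}
```

Whichever HTML library is chosen would replace parseWellFormed; the XPath part stays the same as long as the library produces an org.w3c.dom.Document.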

448

asked Sep 25 '11 14:09

Robert

1 Answer

OK, looks like no-one can answer that question - then I have to check it myself.

jTidy

I downloaded the latest jTidy sources, compiled them, and added the resulting jar file as a library to my Android app. There were no problems using jTidy in my app (emulator and real phone). At runtime jTidy also works fine, but it seems that it is not a good fit for the limited Android environment: it is really slow. Judging by the Logcat output, even parsing a ~10 KB HTML file makes the garbage collector work heavily.
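An observation like this can be backed with rough numbers. The sketch below times one parse and reports the change in used heap. The HtmlParser interface, the Measurement class and the samplePage helper are hypothetical names introduced here, and the standard DocumentBuilder is only a stand-in for the library under test. The heap figure is coarse and can even be negative if a collection runs during the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParserBenchSketch {

    // Hypothetical adapter: wrap whichever HTML library is under test behind this.
    interface HtmlParser {
        Document parse(String html);
    }

    // One measured parse: wall-clock time, change in used heap, and the result.
    static final class Measurement {
        final long elapsedNanos;
        final long usedHeapDeltaBytes;
        final Document document;

        Measurement(long elapsedNanos, long usedHeapDeltaBytes, Document document) {
            this.elapsedNanos = elapsedNanos;
            this.usedHeapDeltaBytes = usedHeapDeltaBytes;
            this.document = document;
        }
    }

    // Heap currently in use, as reported by the runtime.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the parser once and records elapsed time and heap change.
    static Measurement measure(HtmlParser parser, String html) {
        long heapBefore = usedHeap();
        long start = System.nanoTime();
        Document doc = parser.parse(html);
        long elapsed = System.nanoTime() - start;
        return new Measurement(elapsed, usedHeap() - heapBefore, doc);
    }

    // Stand-in parser from the standard library; accepts well-formed XHTML only.
    static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("input is not well-formed", e);
        }
    }

    // Builds a synthetic, well-formed page of at least the requested size.
    static String samplePage(int minChars) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; sb.length() < minChars; i++) {
            sb.append("<p>row ").append(i).append("</p>");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        String page = samplePage(10 * 1024); // roughly the ~10 KB case from the text
        Measurement m = measure(ParserBenchSketch::parseWellFormed, page);
        System.out.println("parse took " + m.elapsedNanos / 1_000_000.0 + " ms, "
                + "used heap changed by " + m.usedHeapDeltaBytes + " bytes");
    }
}
```

A single run is noisy, so repeating the measurement several times on the device and comparing libraries on the same input gives a fairer picture than one number.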

HTMLCleaner

In my experience HTMLCleaner also works nicely on Android; the library is relatively small (106 KB for v2.2). However, the parsed DOM it creates is not as expected: HTMLCleaner inserts, for example, additional elements into the DOM. This may be OK if you want to display the result as an HTML file, but for my use case, extracting information via XPath expressions, this is a no-go!
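The sketch below illustrates why inserted elements matter for XPath. It assumes, purely for illustration, that a parser wraps table rows in an extra tbody element; the text above does not say which elements HTMLCleaner actually adds. A path written with strict child steps stops matching, while a descendant step (//) tolerates the extra level. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExtraElementSketch {

    // Parses well-formed markup and counts the nodes matched by the expression.
    static int count(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // DOM as written in the page source.
        String asWritten = "<table><tr><td>1</td></tr><tr><td>2</td></tr></table>";
        // Same content with a wrapper element added by the parser (assumed: tbody).
        String withWrapper = "<table><tbody><tr><td>1</td></tr>"
                + "<tr><td>2</td></tr></tbody></table>";

        System.out.println(count(asWritten, "/table/tr"));    // prints 2
        System.out.println(count(withWrapper, "/table/tr"));  // prints 0
        System.out.println(count(withWrapper, "/table//tr")); // prints 2
    }
}
```

Descendant steps trade precision for robustness, so they are a workaround only when the looser match cannot pick up unrelated nodes.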

TagSoup

Not tested

Jericho

Not tested

NekoHTML

Not tested

JSoup

Not tested

101

answered Oct 12 '22 23:10

Robert

