HTML/XML Parser for Java [closed]

2 Answers

Check out Web Harvest. It is both a library you can embed and a standalone data extraction tool, which sounds like exactly what you need. You write XML script files that tell the scraper what information to extract and from where. The bundled GUI is very useful for quickly testing the scripts.

Check out the project's samples page to see if it's a good fit for what you are trying to do.


answered Oct 02 '22 07:10

Cesar

The best known are NekoHTML and JTidy.

NekoHTML is based on Xerces and provides a simple, configurable SAXParser that implements the Java SE XMLReader interface.
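Because the parser is exposed through the standard XMLReader interface, the extraction code can be written once against that interface. The following is only a sketch: the class and method names (`LinkCollector`, `collectLinks`, `jdkReader`) are made up for illustration, and it uses the JDK's built-in reader, which accepts only well-formed markup, so that it runs without extra jars. Passing a NekoHTML reader to `collectLinks` instead is the assumed usage for real-world HTML and is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LinkCollector {

    /** Collects the href value of every <a> element as SAX events arrive. */
    static class LinkHandler extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Case-insensitive match: an HTML-oriented parser may report
            // element names in a different case depending on its configuration.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }
    }

    /** Runs any XMLReader over the markup and returns the links it saw. */
    static List<String> collectLinks(XMLReader reader, String markup) throws Exception {
        LinkHandler handler = new LinkHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.links;
    }

    /** The JDK's own reader, used here as a stand-in; it rejects malformed markup. */
    static XMLReader jdkReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><a href=\"https://example.com/a\">A</a>"
                + "<p><a href=\"/b\">B</a></p></body></html>";
        System.out.println(collectLinks(jdkReader(), xhtml));
    }
}
```

Keeping `collectLinks` dependent only on `XMLReader` means the choice of parser stays in one place.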

JTidy is aimed more at cleaning up your HTML into well-formed XML, but it is still very useful as a parser and can produce a DOM tree if needed.
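Once you have a DOM tree, the standard `org.w3c.dom` API is all you need to query it. The sketch below assumes the `Document` could come from any source, JTidy included. To stay runnable without extra jars it builds the tree with the JDK's `DocumentBuilder` from well-formed markup. The names `DomLinks`, `linkTexts` and `parseWellFormed` are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomLinks {

    /** Returns "href -> link text" for every <a> element in the tree. */
    static List<String> linkTexts(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            result.add(a.getAttribute("href") + " -> " + a.getTextContent().trim());
        }
        return result;
    }

    /** Stand-in DOM source: the JDK parser, which requires well-formed input. */
    static Document parseWellFormed(String markup) throws Exception {
        byte[] bytes = markup.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWellFormed(
                "<html><body><a href=\"/docs\"> Docs </a><a href=\"/faq\">FAQ</a></body></html>");
        for (String line : linkTexts(doc)) {
            System.out.println(line);
        }
    }
}
```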

You could have a look at this list for other alternatives.

Another option is to use Hpricot through JRuby.

answered Oct 02 '22 09:10

Valentin Rocher


Tags:

java

html

dom

parsing

xml

Shayan
