I am looking for a simple, lightweight Java library that parses HTML. I have searched a lot and there are many options out there, but I cannot find anything simple. I would really like something like pyquery in Python, but for Java. My requirements are: fast, easy to use, and lightweight.
What do I need it for? Not sure if this matters, but I need to index parts of an HTML document, so I am hoping to be able to select part of the document quickly and then parse it.
I have used HTMLParser in the past and wasn't very happy with it. I found TagSoup and jsoup, and I really like jsoup. I haven't used it extensively yet, but you can do something like:
Elements resultLinks = doc.select("h3 > a"); // <a> elements that are direct children of an <h3>
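For context, here is a minimal sketch of how that one-liner might fit into a complete program. It assumes the jsoup jar is on the classpath; the class name `HeadingLinkExtractor`, the helper `headingLinks`, and the sample HTML are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
class HeadingLinkExtractor {

    // Returns the href of every <a> that is a direct child of an <h3>.
    static List<String> headingLinks(String html) {
        Document doc = Jsoup.parse(html);             // parse an HTML string into a DOM
        Elements resultLinks = doc.select("h3 > a");  // CSS child combinator
        List<String> hrefs = new ArrayList<>();
        for (Element link : resultLinks) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample document: only the first link is a direct child of an <h3>.
        String html = "<html><body>"
                + "<h3><a href=\"/first\">First</a></h3>"
                + "<h3><span><a href=\"/nested\">Nested</a></span></h3>"
                + "<p><a href=\"/other\">Other</a></p>"
                + "</body></html>";
        System.out.println(headingLinks(html));
    }
}
```

The link wrapped in a span is skipped because `>` matches direct children only; a selector such as `h3 a` would match descendants at any depth.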