The abundance of HTML parsers to choose from (and stick with) is mind-boggling:
http://java-source.net/open-source/html-parsers
How do I choose one that best suits the following requirements:
Based on your experience, which HTML parser would you recommend (for meeting the above requirements) and why?
Note: as of jsoup release v1.14.1, the Whitelist class is deprecated in favour of Safelist.
jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.
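A minimal sketch of the parse/select/manipulate workflow described above, assuming jsoup is on the classpath. The class name JsoupExtractDemo, its helper methods and the sample HTML are hypothetical illustrations, not part of the original thread. It parses a string, uses a CSS selector to collect links (resolved against a base URI via the "abs:" attribute prefix), and then edits one element's attribute and text.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class: not from the original thread.
public class JsoupExtractDemo {

    // Parse an HTML string and return the absolute URL of every <a href>.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {   // CSS selector
            links.add(a.attr("abs:href"));          // resolve relative hrefs
        }
        return links;
    }

    // Parse, modify the first link's attribute and text, return the body HTML.
    static String relabelFirstLink(String html) {
        Document doc = Jsoup.parse(html);
        Element first = doc.selectFirst("a");
        if (first != null) {
            first.attr("rel", "nofollow").text("Home");
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>See <a href='/docs'>docs</a> and "
                + "<a href='https://example.com/x'>x</a>.</p>";
        System.out.println(extractLinks(html, "https://example.com/"));
        System.out.println(relabelFirstLink(html));
    }
}
```

The same Document API works whether the input came from Jsoup.parse on a string, a file, or Jsoup.connect(url).get().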
jsoup is a Java library for parsing HTML documents. It provides an API for extracting and manipulating data from URLs or HTML documents, using DOM traversal, CSS selectors, and jQuery-like methods.
You can extract data by using CSS selectors, or by navigating and modifying the Document Object Model directly - just like a browser does, except you do it in Java code. You can also modify the document and write the HTML back out safely. jsoup will not run JavaScript for you - if you need that in your app I'd recommend looking at JCEF.
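For writing untrusted HTML back out safely, jsoup offers Jsoup.clean with a Safelist. A minimal sketch, assuming jsoup 1.14.1 or later (where Safelist replaced Whitelist); the class name JsoupCleanDemo and the sample input are hypothetical. Safelist.basic() keeps simple text markup and links but drops script elements and event-handler attributes such as onclick.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical demo class: not from the original thread.
public class JsoupCleanDemo {

    // Remove any tags/attributes not permitted by Safelist.basic().
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String dirty = "<p><a href='https://example.com/' onclick='steal()'>link</a>"
                + "<script>alert(1)</script></p>";
        System.out.println(sanitize(dirty));
    }
}
```

Without a base URI, relative links are dropped by the cleaner; the overload that takes a base URI can be used when relative links need to survive.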
Well, I found the answer, which was given by @BalusC on a different thread:
Thank you @BalusC.