I am looking for a way to extract text from a web page (initially HTML) using the JDK or another library. Please help.
Thanks.
Use jsoup. This is currently the most elegant library for screen scraping.
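import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
URL url = new URL("http://example.com/");
Document doc = Jsoup.parse(url, 3 * 1000); // fetch and parse, 3-second timeout
String title = doc.title();                // text of the <title> element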
I just love its CSS selector syntax.
Use an HTML parser if at all possible; there are many available for Java.
Or you can use a regex, as many people do. This is generally not advisable, however, unless you're doing very simple processing.
Text extraction:
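The snippet that belonged here is missing, so what follows is only a sketch of what regex-based extraction might look like, assuming you want the contents of every <p> element. The class name RegexTextExtractor and the method paragraphs are made up for illustration; only java.util.regex from the JDK is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTextExtractor {
    // Matches <p ...>...</p> case-insensitively; DOTALL lets '.' span line breaks.
    // The \b keeps the pattern from matching other tags that start with "p", such as <pre>.
    private static final Pattern PARAGRAPH =
            Pattern.compile("<p\\b[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the inner content of each <p> element, trimmed.
    // Nested markup inside a paragraph is left as-is.
    static List<String> paragraphs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>First</p>\n<p class=\"x\">Second</p></body></html>";
        System.out.println(paragraphs(html)); // prints [First, Second]
    }
}
```

This works only on simple, well-behaved markup. Nested or unclosed <p> tags will give wrong results, which is why a real parser is the safer choice.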
Tag stripping:
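The snippet for this step is missing too. A minimal sketch, assuming that anything between < and > is a tag and should be removed; TagStripper and stripTags are hypothetical names, and String.replaceAll is the only JDK call involved.

```java
final class TagStripper {
    // Removes every "<...>" sequence from the input.
    // Does not decode entities such as &amp; and does not drop <script> or <style> contents.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // prints Hello world
    }
}
```

A > character inside an attribute value, or a comment containing markup, will confuse this pattern, so treat it as suitable for very simple input only.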