I have some HTML documents for which I need to return the number of words. The count should include only the actual text, not HTML tags such as <html>, <br>, etc.
Any ideas how to do this? Naturally, I would prefer to reuse existing code.
Thanks,
Assaf
Strip out the HTML tags and get the text content; you can reuse Jsoup for this.
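Here is a minimal sketch of the stripping step. With Jsoup on the classpath, this is roughly Jsoup.parse(html).text(). To keep the example free of dependencies, the code below instead uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator). That parser replaces Jsoup here and is not what the answer recommends. The names HtmlText, extractText and countWords are hypothetical. Words are counted as whitespace-separated tokens, so punctuation attached to a word is counted as part of that word.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: JDK-only stand-in for Jsoup.parse(html).text()
public class HtmlText {

    // Returns the text content of an HTML document with all tags removed.
    // Text chunks are joined with a space so that words in adjacent
    // elements (e.g. "<p>one</p><p>two</p>") are not glued together.
    public static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        try {
            // ignoreCharSet = true: don't react to a charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    // Counts whitespace-separated tokens in the extracted text.
    public static int countWords(String html) {
        String text = extractText(html);
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>big</b> world</p><br>bye</body></html>";
        System.out.println(countWords(html));
    }
}
```

The Swing parser targets HTML 3.2, so for real-world pages Jsoup is likely the more robust choice. The counting logic stays the same either way.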
Then read the text line by line, keeping a Map<String, Integer> wordToCountMap, and update the map for each word you read.
The total word count is the sum of the map's values.
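The line-by-line counting step might look like the sketch below. It assumes the input is plain text whose tags have already been stripped. WordCounter, countWords and totalWords are hypothetical names, and wordToCountMap is the map the answer describes. Map.merge increments the count for each word, and the total is the sum of all counts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds word -> count from already-stripped text
public class WordCounter {

    // Reads the text line by line and records how often each word occurs.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordToCountMap = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {  // a blank line splits to [""]
                        wordToCountMap.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wordToCountMap;
    }

    // The total number of words is the sum of the per-word counts.
    public static int totalWords(Map<String, Integer> wordToCountMap) {
        int total = 0;
        for (int count : wordToCountMap.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat\nsat on the mat\n\n");
        System.out.println(counts.get("the"));  // 2
        System.out.println(totalWords(counts)); // 6
    }
}
```

As written, the count is case-sensitive ("The" and "the" are different keys) and keeps punctuation attached to words. You could lowercase or strip punctuation before calling merge if that matters for your documents.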