The problem is simple: I want to convert HTML to plain text, with things like line breaks wherever there is a <br> or a title tag, numbers or markers on list items, etc.
I'm using BoilerPipe for this at the moment, but that is not the library's main purpose. Is there another library that can do this?
I really like the Selenium Java library. Use getBodyText() to get the plain body text, with the HTML tags stripped out and the text properly formatted.
See the Selenium Java API.
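If driving a browser just for this feels heavy, here is a rough sketch of the same idea in plain Java, using only java.util.regex. It is not how Selenium or BoilerPipe do it. The class name HtmlToText and the method convert are made up for illustration, and I'm assuming "title tags" means h1 to h6. It is regex-based, so it is not a real HTML parser: comments, script/style blocks, and most entities are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class HtmlToText {

    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String convert(String html) {
        StringBuilder out = new StringBuilder();
        // One entry per open list: last number used in an <ol>, or -1 for a <ul>.
        Deque<Integer> lists = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            appendText(out, html.substring(last, m.start()));
            last = m.end();
            boolean closing = !m.group(1).isEmpty();
            String tag = m.group(2).toLowerCase(Locale.ROOT);
            switch (tag) {
                case "br":
                    out.append('\n');
                    break;
                case "ol":
                case "ul":
                    startLine(out);
                    if (!closing) {
                        lists.push(tag.equals("ol") ? 0 : -1);
                    } else if (!lists.isEmpty()) {
                        lists.pop();
                    }
                    break;
                case "li":
                    startLine(out);
                    if (!closing) {
                        out.append(nextMarker(lists));
                    }
                    break;
                case "p": case "div":
                case "h1": case "h2": case "h3":
                case "h4": case "h5": case "h6":
                    startLine(out);
                    break;
                default:
                    // Any other tag is dropped; its text content is kept.
                    break;
            }
        }
        appendText(out, html.substring(last));
        return out.toString().trim();
    }

    // Returns "- " for bullet lists and "1. ", "2. ", ... for numbered lists.
    private static String nextMarker(Deque<Integer> lists) {
        if (lists.isEmpty() || lists.peek() < 0) {
            return "- ";
        }
        int n = lists.pop() + 1;
        lists.push(n);
        return n + ". ";
    }

    // Adds a line break unless the output is empty or already at a line start.
    private static void startLine(StringBuilder out) {
        int len = out.length();
        if (len > 0 && out.charAt(len - 1) != '\n') {
            out.append('\n');
        }
    }

    // Collapses whitespace runs and decodes a handful of common entities.
    private static void appendText(StringBuilder out, String raw) {
        String text = raw.replaceAll("\\s+", " ");
        int len = out.length();
        boolean afterSpace = len == 0 || Character.isWhitespace(out.charAt(len - 1));
        if (afterSpace && text.startsWith(" ")) {
            text = text.substring(1);
        }
        out.append(text.replace("&nbsp;", " ").replace("&lt;", "<")
                .replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&amp;", "&"));
    }
}
```

With this, HtmlToText.convert("<ol><li>one</li><li>two</li></ol>") gives "1. one" and "2. two" on separate lines, and a <ul> gets "- " markers instead. For real-world pages a proper HTML parser is the safer choice; this only shows the line-break and list-marker logic.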