
A library in Java that can transform an HTML text into plain text?

The problem is simple: I want to convert HTML text to plain text, doing things like inserting line breaks where there are <br> or title tags, adding numbers or markers to list items, etc.

I'm using BoilerPipe for this at the moment, but that is not the main purpose of the library. Is there another library that can do this?

Renato Dinhani asked Feb 15 '26 17:02



1 Answer

I really like Selenium's Java library. Use getBodyText() to get the plain body text, with the HTML tags stripped out and the text properly formatted.

See the Selenium Java API.
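To make the kind of conversion asked about concrete, here is a rough sketch that uses only the JDK (java.util.regex), not Selenium or BoilerPipe. The class name HtmlToText and the method toPlainText are hypothetical names chosen for illustration. It assumes small, well-formed HTML fragments: it turns <br> into a line break, ends headings, titles, paragraphs and list items with a line break, and prefixes list items with a number or a dash. It does not skip <script> or <style> content, does not normalize whitespace in the source, and decodes only a handful of entities, so a real HTML parser is the better choice for arbitrary pages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Selenium or BoilerPipe.
public class HtmlToText {
    // Matches an opening or closing tag and captures the slash and the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    public static String toPlainText(String html) {
        StringBuilder out = new StringBuilder();
        // One counter per open list: 0 marks a <ul>, a positive value is the next <ol> number.
        Deque<int[]> lists = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Copy the text that sits between the previous tag and this one.
            out.append(decode(html.substring(pos, m.start())));
            pos = m.end();
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            switch (name) {
                case "br":
                    out.append('\n');
                    break;
                case "ol":
                case "ul":
                    if (closing) {
                        if (!lists.isEmpty()) lists.pop();
                    } else {
                        ensureNewline(out);
                        lists.push(new int[] { name.equals("ol") ? 1 : 0 });
                    }
                    break;
                case "li":
                    if (closing) {
                        ensureNewline(out);
                    } else {
                        int[] counter = lists.peek();
                        if (counter != null && counter[0] > 0) {
                            out.append(counter[0]++).append(". ");
                        } else {
                            out.append("- ");
                        }
                    }
                    break;
                case "p": case "div": case "title":
                case "h1": case "h2": case "h3":
                case "h4": case "h5": case "h6":
                    if (closing) ensureNewline(out);
                    break;
                default:
                    break; // Any other tag is simply dropped.
            }
        }
        out.append(decode(html.substring(pos)));
        return out.toString().trim();
    }

    // Appends a line break unless the output is empty or already ends with one.
    private static void ensureNewline(StringBuilder out) {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    // Decodes a few common entities; &amp; goes last to avoid double decoding.
    private static String decode(String text) {
        return text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }
}
```

Calling HtmlToText.toPlainText(html) on a fragment returns the text with the line breaks and list markers described above; nested lists keep separate counters because each open list gets its own entry on the stack.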

Adithya Surampudi answered Feb 18 '26 06:02



