
Is there a Java library that can convert HTML into plain text?

The problem is simple: I want to convert HTML into plain text, with things like line breaks inserted where there are <br> or title tags, numbers or bullet markers on lists, etc.

I'm using BoilerPipe for this at the moment, but this is not the main purpose of that library. Is there another library that can do this?

Renato Dinhani asked Feb 15 '26 17:02



1 Answer

I really like the Selenium Java library. Use getBodyText() to get the plain body text of the page, with the HTML tags stripped out and the text properly formatted.

See the Selenium Java API.
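
For comparison, here is a minimal sketch of the behaviour the question describes (line breaks at <br> and block tags, numbers or markers on lists). It does not use Selenium or BoilerPipe; it swaps in the HTML parser that ships with the JDK (javax.swing.text.html). The class name HtmlToText and the "1. " and "* " marker styles are assumptions made for illustration, and the parser only understands fairly old HTML, so treat this as a starting point rather than a complete converter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of any library): collects text from parser callbacks.
final class HtmlToText extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();
    // One entry per open list: last number used for <ol>, -1 for <ul>.
    private final Deque<Integer> lists = new ArrayDeque<>();

    static String convert(String html) {
        HtmlToText callback = new HtmlToText();
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.out.toString().trim();
    }

    // Start a new line unless the output is empty or already at a line start.
    private void newline() {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        out.append(data);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            out.append('\n');
        }
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.OL) {
            lists.push(0);
        } else if (tag == HTML.Tag.UL) {
            lists.push(-1);
        } else if (tag == HTML.Tag.LI) {
            newline();
            Integer counter = lists.poll();
            if (counter != null && counter >= 0) {
                // Ordered list: emit the next number and remember it.
                out.append(counter + 1).append(". ");
                lists.push(counter + 1);
            } else {
                // Unordered list, or a stray <li> outside any list.
                out.append("* ");
                if (counter != null) {
                    lists.push(counter);
                }
            }
        } else if (tag.isBlock()) {
            newline();
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.OL || tag == HTML.Tag.UL) {
            lists.poll();
            newline();
        } else if (tag.isBlock()) {
            newline();
        }
    }
}
```

Calling HtmlToText.convert(html) returns the text as a String. Unlike the Selenium approach, this needs no browser or server, but it also does not execute scripts or apply CSS.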

Adithya Surampudi answered Feb 18 '26 06:02
