
Maintaining styling while cleaning html using JSoup

Tags: java, html, jsoup

I am very new to JSoup, and I am using the following code to clean the HTML:

    String clean = Jsoup.clean(html, Whitelist.relaxed());

I am getting the required HTML, but all the styling, such as bold and italics, is missing.

How can I keep the HTML with its respective styling using JSoup? If there is another library that can do this, please recommend one.

Akash Rajbanshi asked Mar 28 '26 22:03


1 Answer

When you call Jsoup.clean(html, Whitelist.relaxed()), only the tags and attributes permitted by Whitelist.relaxed() are kept; everything else is stripped. Directly from the documentation:

This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul

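Note that b, i, em and strong are already in this list, so styling lost during cleaning most likely comes from something outside it, such as style attributes. If you want to allow more elements or attributes (e.g. style attributes), add them to the Whitelist instance so they are let through. You can use the following methods from the Whitelist API: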

addTags(java.lang.String...)
addAttributes(java.lang.String, java.lang.String...)
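As a minimal sketch of how these two methods might be combined, the example below starts from Whitelist.relaxed() and additionally permits the style attribute on every tag. The class name StylePreservingCleaner, the method cleanKeepingStyles and the extra font tag are hypothetical and chosen only for illustration. The ":all" tag name is jsoup's way of applying an attribute rule to all allowed tags. The sketch keeps the Whitelist name used in the question; recent jsoup releases rename this class to Safelist, so the import may need adjusting for your version.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class StylePreservingCleaner {

    // Cleans HTML with the relaxed whitelist, extended so that inline
    // style attributes (and, as an illustrative extra, <font> tags) survive.
    public static String cleanKeepingStyles(String html) {
        Whitelist whitelist = Whitelist.relaxed()
                .addTags("font")                 // illustrative extra tag, not in relaxed()
                .addAttributes(":all", "style"); // allow style="..." on every allowed tag
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        String html = "<p style=\"font-weight:bold\">Hello "
                + "<script>alert(1)</script><i>world</i></p>";
        // The style attribute and the <i> tag should be kept,
        // while the <script> element should be removed.
        System.out.println(cleanKeepingStyles(html));
    }
}
```

Allowing style on untrusted input widens what the cleaner lets through, so it is worth permitting it only if the HTML source is reasonably trusted or the CSS is checked separately.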

Please read the documentation of the Whitelist class in the jsoup library.

Keerthivasan answered Apr 01 '26 09:04


