Removing CSS information from HTML in Java

Tags:

java

html

parsing

Is there a library or pre-written code that removes CSS attributes from HTML?

The requirement is that the Java code parses the input HTML document, removes the CSS attributes, and produces the output HTML document.

For example, if the input HTML document has this element,

      <p class="abc" style="xyz" > some text </p>

the output should be

      <p > some text </p>
Vinoth Kumar C M asked Nov 18 '11 09:11


1 Answer

Use jsoup with a NodeTraversor to remove the class and style attributes from every element in the document body:

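import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

// `input` is the HTML source as a String.
Document doc = Jsoup.parse(input);

NodeVisitor stripCss = new NodeVisitor() {
  @Override
  public void head(Node node, int depth) {
    // Nothing to do when entering a node.
  }

  @Override
  public void tail(Node node, int depth) {
    // Only elements carry attributes; text and comment nodes are skipped.
    if (node instanceof Element) {
      Element e = (Element) node;
      e.removeAttr("class");
      e.removeAttr("style");
    }
  }
};

// Recent jsoup releases expose traverse as a static method; older ones used
// new NodeTraversor(stripCss).traverse(doc.body()).
NodeTraversor.traverse(stripCss, doc.body());
String modifiedHtml = doc.toString();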
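
If adding jsoup is not an option, the same attribute stripping can be sketched with only the JDK's built-in XML APIs. This is a different technique from the jsoup traversal above, and it rests on a strong assumption: the input must be well-formed XHTML with a single root element. Ordinary "tag soup" HTML, or entities such as &nbsp; that XML does not define, will make the parser throw. The class name XhtmlCssStripper and the method name stripCssAttributes are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative, not from any library.
class XhtmlCssStripper {

    // Assumption: `xhtml` is well-formed XML with exactly one root element.
    static String stripCssAttributes(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // "*" matches every element in the document, including the root.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            e.removeAttribute("class"); // no effect if the attribute is absent
            e.removeAttribute("style");
        }

        // Serialize the DOM back to a string without the <?xml ...?> header.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<p class=\"abc\" style=\"xyz\" > some text </p>";
        System.out.println(stripCssAttributes(input));
    }
}
```

Like the jsoup version, this only removes the class and style attributes. Any <style> elements or stylesheet links in the document are left untouched, and empty elements are written back in XML form (for example <br/>).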
soulcheck answered Sep 30 '22 02:09