Jsoup: efficient way to remove HTML elements and their children

Tags: java, html, jsoup

I want to remove the HTML div and table tags and everything inside them (their children). What is the best way to do it?

I tried traversing the document like this, but it is not working. The Jsoup documentation says that node.remove() removes the node from the DOM along with its children:

doc.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int i) {
    }

    @Override
    public void tail(Node node, int i) {
        //Log.i(TAG,"node: "+node.nodeName());
        if (node.nodeName().compareTo("table") == 0 ||
                node.nodeName().compareTo("div") == 0)
            node.remove();
    }
});
Sergio Serra asked Nov 12 '13 13:11

1 Answer

Document doc = Jsoup.parse(html);
doc.select("table *").remove();
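
Note that the selector "table *" matches only the descendants of a table, so the emptied table element itself stays in the document and div elements are not touched. If the goal is to drop the div and table elements together with everything inside them, as the question asks, a selector group that matches the elements themselves may be closer to what is wanted. The following is a minimal sketch under that assumption; the class name RemoveDivsAndTables and the helper strip are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveDivsAndTables {

    // Hypothetical helper: parses the input and returns the body HTML
    // with every div and table (and their subtrees) removed.
    static String strip(String html) {
        Document doc = Jsoup.parse(html);
        // "div, table" matches the elements themselves, not just their
        // descendants; Elements.remove() detaches each match from the DOM,
        // and its children go with it.
        doc.select("div, table").remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>keep me</p>"
                + "<div><span>drop me</span></div>"
                + "<table><tr><td>drop me too</td></tr></table>";
        System.out.println(strip(html));
    }
}
```

Because the removal happens on the list returned by select() rather than inside a traversal callback, the document tree is not modified while it is being walked.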
hubs answered Oct 24 '22 13:10