How to search for comments ("<!-- -->") using Jsoup?

Tags:

java

jsoup

I would like to remove these comments, including their content, from the source HTML.

87element asked Sep 24 '11 at 20:09


2 Answers

When searching, you normally use Elements.select(selector), where the selector syntax is defined by jsoup's Selector API. Comments, however, are technically not elements, which can be confusing: they are nodes identified by the node name #comment. Because select() only returns elements, you have to walk the child nodes yourself to find them.

Let's see how that might work:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class RemoveComments {
    public static void main(String... args) {
        String h = "<html><head></head><body>" +
          "<div><!-- foo --><p>bar<!-- baz --></div><!--qux--></body></html>";
        Document doc = Jsoup.parse(h);
        removeComments(doc);
        System.out.println(doc.html());
    }

    // Recursively removes every comment node below the given node.
    private static void removeComments(Node node) {
        // i is only advanced when the current child is kept:
        // after child.remove(), the next sibling shifts into position i.
        for (int i = 0; i < node.childNodeSize();) {
            Node child = node.childNode(i);
            if (child.nodeName().equals("#comment")) {
                child.remove();
            } else {
                removeComments(child);
                i++;
            }
        }
    }
}
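Since the question asks how to search for comments, the same child-node walk can also collect them instead of removing them. This is a sketch assuming jsoup is on the classpath; FindComments and findComments are hypothetical names chosen for illustration. It tests nodes with instanceof Comment (equivalent to checking the #comment node name) and reads each comment's text with Comment.getData(), which returns what lies between <!-- and -->.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class FindComments {
    // Collects every Comment node below `node`, in document order.
    static List<Comment> findComments(Node node) {
        List<Comment> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(Node node, List<Comment> found) {
        for (Node child : node.childNodes()) {
            if (child instanceof Comment) {
                found.add((Comment) child);
            } else {
                collect(child, found); // comments have no children, so only recurse into others
            }
        }
    }

    public static void main(String... args) {
        Document doc = Jsoup.parse(
            "<div><!-- foo --><p>bar<!-- baz --></div><!--qux-->");
        for (Comment c : findComments(doc)) {
            // getData() is the raw text between "<!--" and "-->", whitespace included
            System.out.println("[" + c.getData() + "]");
        }
    }
}
```

Nothing is modified during this walk, so iterating childNodes() directly is safe; if you later want to delete the collected comments, calling remove() on each one after the walk avoids the index bookkeeping shown in the answer above.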
dlamblin answered Sep 28 '22 at 11:09


With jsoup 1.11.1 or later, where Node.filter() and NodeFilter were introduced, you can apply a filter that visits every node in the tree:

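// Imports needed by the method below (place the method inside a class):
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeFilter;

// Removes all comment nodes inside the given element and its descendants.
private void removeComments(Element article) {
    article.filter(new NodeFilter() {
        @Override
        public FilterResult tail(Node node, int depth) {
            if (node instanceof Comment) {
                return FilterResult.REMOVE;
            }
            return FilterResult.CONTINUE;
        }

        @Override
        public FilterResult head(Node node, int depth) {
            // Comments have no children, so they can be pruned as soon as they are visited.
            if (node instanceof Comment) {
                return FilterResult.REMOVE;
            }
            return FilterResult.CONTINUE;
        }
    });
}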
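A self-contained program applying such a filter to a whole parsed Document might look like the following. It is a sketch assuming jsoup 1.11.1 or later on the classpath; the class name FilterComments is hypothetical. Here tail() simply returns CONTINUE, because in jsoup's traversal a node that head() marks as REMOVE is pruned without being passed to tail().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeFilter;

public class FilterComments {
    // Removes every comment below (and including) root using a NodeFilter.
    static void removeComments(Node root) {
        root.filter(new NodeFilter() {
            @Override
            public FilterResult head(Node node, int depth) {
                // Prune comments on the way down; keep walking everything else.
                return node instanceof Comment ? FilterResult.REMOVE : FilterResult.CONTINUE;
            }

            @Override
            public FilterResult tail(Node node, int depth) {
                // Comments were already pruned in head(), so nothing to do here.
                return FilterResult.CONTINUE;
            }
        });
    }

    public static void main(String... args) {
        Document doc = Jsoup.parse(
            "<div><!-- foo --><p>bar<!-- baz --></div><!--qux-->");
        removeComments(doc);
        System.out.println(doc.body().html());
    }
}
```

Compared with the manual recursion in the first answer, the filter lets jsoup handle the traversal and the safe removal of nodes while iterating, so no index bookkeeping is needed.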
Michael Conrad answered Sep 28 '22 at 10:09