How to uncomment HTML tags using jsoup

I wonder if it is possible to uncomment HTML tags using jsoup, for instance to change:

<!--<p> foo bar </p>-->

to

<p> foo bar </p>
asked Dec 23 '13 at 16:12 by Jalal Sordo


1 Answer

Yes, it is possible. Here is one way to do it:

  1. Find all comment nodes
  2. For each comment, extract its data (the text between <!-- and -->)
  3. Parse that data as HTML and insert the resulting nodes right after the comment node
  4. Remove the comment node

Have a look at this code:

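    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Comment;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;

    public class UncommentComments {
        public static void main(String... args) {
            String htmlIn = "<html><head></head><body>"
                    + "<!--<div> hello there </div>-->"
                    + "<div>not a comment</div>"
                    + "<!-- <h5>another comment</h5> -->"
                    + "</body></html>";
            Document doc = Jsoup.parse(htmlIn);
            // Collect the comments first so the tree is not modified while it is being walked
            List<Comment> comments = findAllComments(doc);
            for (Comment comment : comments) {
                String data = comment.getData(); // the text between <!-- and -->
                comment.after(data);             // parse it as HTML and insert it right after the comment
                comment.remove();                // then drop the comment node itself
            }
            System.out.println(doc.toString());
        }

        public static List<Comment> findAllComments(Document doc) {
            List<Comment> comments = new ArrayList<>();
            // getAllElements() includes the Document itself, so top-level comments are found too
            for (Element element : doc.getAllElements()) {
                for (Node n : element.childNodes()) {
                    if (n instanceof Comment) {
                        comments.add((Comment) n);
                    }
                }
            }
            return Collections.unmodifiableList(comments);
        }
    }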

Given this HTML document:

<html>
  <head></head>
  <body>
    <!--<div> hello there </div>-->
    <div>not a comment</div>
    <!-- <h5>another comment</h5> --> 
  </body>
</html>

running the code produces this output:

<html>
  <head></head>
  <body>
    <div>hello there</div>
    <div>not a comment</div> 
    <h5>another comment</h5> 
  </body>
</html>
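If adding jsoup as a dependency is not an option, the same four steps can be roughly approximated on the raw string with the JDK alone. The sketch below is a hypothetical helper (`StringUncomment.uncomment` is not part of jsoup or the JDK): it finds each `<!--`, takes the text up to the next `-->`, and splices that text back in place of the whole comment. Unlike the jsoup version it does not understand HTML context, so a `<!--` inside a `<script>` block or inside an attribute value would be unwrapped too, and the whitespace inside each comment is kept exactly as written instead of being re-formatted by a serializer.

```java
public class StringUncomment {
    // Hypothetical helper, not part of jsoup or the JDK: naive string-level uncommenting.
    // It does not parse HTML, so it cannot tell real comments from "<!--" inside scripts or attributes.
    public static String uncomment(String html) {
        StringBuilder out = new StringBuilder(html.length());
        int pos = 0;
        while (true) {
            int start = html.indexOf("<!--", pos);      // step 1: find the next comment
            if (start < 0) break;
            int end = html.indexOf("-->", start + 4);
            if (end < 0) break;                         // unterminated comment: leave the rest untouched
            out.append(html, pos, start);               // text before the comment
            out.append(html, start + 4, end);           // steps 2-3: the comment's data, put in its place
            pos = end + 3;                              // step 4: skip the "-->" so the delimiters disappear
        }
        out.append(html, pos, html.length());          // whatever follows the last comment
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlIn = "<body><!--<div> hello there </div>--><div>not a comment</div></body>";
        // prints <body><div> hello there </div><div>not a comment</div></body>
        System.out.println(uncomment(htmlIn));
    }
}
```

Applied to the example from the question, `uncomment("<!--<p> foo bar </p>-->")` returns `<p> foo bar </p>`. For anything beyond quick one-off cleanup, the jsoup approach is the safer choice because the parser decides what is actually a comment.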
answered Sep 20 '22 at 23:09 by Kai Sternad