
Is it possible to parse the DOCTYPE with Jsoup to discover the HTML version?

I want to parse the DOCTYPE of a page with Jsoup to discover the version of HTML (HTML 5, HTML 4, XHTML, etc.).

Is it possible to parse the DOCTYPE with Jsoup and work with it? If not, is there another way to achieve the main goal, which is discovering the HTML version of the page?

Renato Dinhani asked Apr 11 '12 14:04



1 Answer

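Jsoup has a DocumentType class for this purpose. The doctype is one of the child nodes of the Document, so you can iterate over them and pick it out: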

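import java.util.List;
import org.jsoup.nodes.DocumentType;
import org.jsoup.nodes.Node;

// doc is an already parsed org.jsoup.nodes.Document
List<Node> nodes = doc.childNodes();
for (Node node : nodes) {
    if (node instanceof DocumentType) {
        DocumentType documentType = (DocumentType) node;
        // Prints the full doctype declaration
        System.out.println(documentType.toString());
        // Prints the public identifier (empty for the HTML5 doctype)
        System.out.println(documentType.attr("publicid"));
    }
}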
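
To get from the public identifier to an actual version label, something like the following might work. It is only a rough sketch: the class name `DoctypeVersion` and the method `describe` are made up for illustration (they are not part of Jsoup), and the substring matching is a heuristic that assumes the usual W3C public identifiers such as `-//W3C//DTD HTML 4.01 Transitional//EN`. It also assumes a doctype node was actually found; a page with no doctype at all needs to be handled separately by the caller.

```java
import java.util.Locale;

// Hypothetical helper, not part of Jsoup: maps a doctype public identifier
// to a coarse HTML version label using simple substring checks.
final class DoctypeVersion {

    private DoctypeVersion() {
        // static helper only
    }

    static String describe(String publicId) {
        // The HTML5 doctype (<!DOCTYPE html>) carries no public identifier.
        if (publicId == null || publicId.trim().isEmpty()) {
            return "HTML5";
        }
        String id = publicId.toUpperCase(Locale.ROOT);
        // Check XHTML first, then HTML from the newest version down.
        // "4.01" must be tested before "4.0" because it contains it.
        if (id.contains("DTD XHTML 1.1")) {
            return "XHTML 1.1";
        }
        if (id.contains("DTD XHTML 1.0")) {
            return "XHTML 1.0";
        }
        if (id.contains("DTD HTML 4.01")) {
            return "HTML 4.01";
        }
        if (id.contains("DTD HTML 4.0")) {
            return "HTML 4.0";
        }
        if (id.contains("DTD HTML 3.2")) {
            return "HTML 3.2";
        }
        // Anything else is reported as-is rather than guessed.
        return "unknown (" + publicId + ")";
    }
}
```

Inside the `instanceof` branch of the loop above you would then call `DoctypeVersion.describe(documentType.attr("publicid"))`. The Strict, Transitional and Frameset variants are deliberately collapsed into one label here; if you need them, match on those words as well.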
vacuum answered Oct 17 '22 00:10
