I want to extract the various HTML tags from the source code of a web page. Is there a method in Java to do that, or does an HTML parser support this?
I want to separate out all the HTML tags.
Java comes with an XML parser whose DOM methods are similar to the DOM in JavaScript:
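import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// newDocumentBuilder() is an instance method, so obtain a factory first
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(html);   // html: an InputStream, File, InputSource, or URI string
doc.getElementById("someId");         // only matches attributes declared as type ID
doc.getElementsByTagName("div");      // all <div> elements, in document order
doc.getChildNodes();                  // direct children of the document node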
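The document builder's parse method accepts several kinds of input (an InputStream, a File, an InputSource, or a URI string). A bare String argument is treated as a URI, so to parse HTML held in a string, wrap it in an InputSource over a StringReader. Because this is an XML parser, the markup must be well-formed (XHTML).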
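To answer the original question directly, here is a minimal sketch that collects the distinct tag names in a page. It assumes the markup is well-formed XHTML with no DOCTYPE; the class name TagExtractor and the method extractTagNames are made up for illustration and are not part of any library.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
public class TagExtractor {

    // Returns the distinct element names found in a well-formed XHTML string, sorted alphabetically.
    public static Set<String> extractTagNames(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Wrap the string in an InputSource; parse(String) would treat it as a URI.
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // "*" is the DOM wildcard that matches every element in the document.
        NodeList all = doc.getElementsByTagName("*");
        Set<String> names = new TreeSet<>();
        for (int i = 0; i < all.getLength(); i++) {
            names.add(all.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>t</title></head>"
                + "<body><div><p>hi</p></div><div/></body></html>";
        System.out.println(extractTagNames(page)); // prints [body, div, head, html, p, title]
    }
}
```

If you need every occurrence rather than the distinct names, collect into a List instead of a TreeSet. Real-world pages are often not well-formed, in which case this parser will throw a SAXParseException.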
http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html
The CyberNeko HTML parser (NekoHTML) is also good if you need more, for example to handle real-world HTML that is not well-formed XML.