Extraction of HTML Tags using Java

Question

I want to extract the various HTML tags from the source code of a web page. Is there a method in Java to do that, or does an HTML parser support this?

I want to separate out all the HTML tags.

Adam Ayres · Accepted Answer

Java comes with an XML parser whose DOM API has methods similar to the DOM in JavaScript. Note that it only accepts well-formed markup, such as XHTML:

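import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); // the factory must be instantiated first
Document doc = builder.parse(new InputSource(new StringReader(html))); // html is a String of well-formed markup
doc.getElementById("someId");    // returns null unless "id" is declared as an ID attribute (e.g. by a DTD)
doc.getElementsByTagName("div"); // all <div> elements, in document order
doc.getChildNodes();             // direct children of the document node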

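The document builder can take several kinds of input (an InputStream, a File, an InputSource, or a URI string). Note that parse(String) treats its argument as a URI, so markup held in a String has to be wrapped in an InputSource over a StringReader.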

http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html
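To list every tag in a page, as the question asks, one option is to call getElementsByTagName with the wildcard "*". The following is only a minimal sketch: the class name TagExtractor and the method extractTagNames are made up for illustration, and it assumes the input is well-formed XML (for example XHTML). Real-world HTML that is not well-formed will be rejected by this parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
class TagExtractor {

    // Returns the name of every element in the markup, in document order.
    // Assumption: the input is well-formed XML (e.g. XHTML).
    static List<String> extractTagNames(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the String so it is parsed as content rather than as a URI.
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // "*" is the DOM wildcard that matches all element names.
            NodeList elements = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                names.add(elements.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed markup surfaces here as a SAXException.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }
}
```

For example, TagExtractor.extractTagNames("<html><body><p>Hi</p></body></html>") should return the names html, body and p, in that order.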

The CyberNeko HTML parser (NekoHTML) is also good if you need more, for example to handle HTML that is not well-formed XML.

Tags: java, html

harshini

1 Answer

Adam Ayres
