 

Extraction of HTML Tags using Java

Tags:

java

html

I want to extract the various HTML tags from the source code of a web page. Is there a method in Java to do that, or does an HTML parser support this?

I want to separate out all the HTML tags.

harshini asked Apr 21 '26 at 03:04



1 Answer

Java comes with an XML parser whose methods are similar to those of the DOM in JavaScript:

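import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import java.io.StringReader;
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(html))); // html is a String of markup
doc.getElementById("someId");     // null unless "id" is declared as an ID attribute (DTD or schema)
doc.getElementsByTagName("div");  // NodeList of every <div> element
doc.getChildNodes();              // direct children of the document node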

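DocumentBuilder.parse accepts several kinds of input: an InputStream, a File, a URI string, or an InputSource. A raw HTML string has to be wrapped, as in new InputSource(new StringReader(html)), because parse(String) treats its argument as a URI. Keep in mind that this is an XML parser: it only accepts well-formed markup such as XHTML, and typical real-world HTML (unclosed <p> or <br> tags, unquoted attributes) makes it throw a SAXParseException.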
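The calls above can be combined into a small, runnable sketch that lists every tag in a page, which is what the question asks for. It is only a sketch under assumptions: the markup is taken to be well-formed XHTML (every tag closed, attributes quoted), and the names TagExtractor and extractTagNames are hypothetical. It relies on the DOM rule that getElementsByTagName("*") matches every element, returned in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagExtractor {

    // Returns the name of every element in document order (duplicates included).
    // Assumes well-formed markup; malformed HTML makes parse() throw SAXParseException.
    static List<String> extractTagNames(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse(String) would treat the argument as a URI, so wrap the markup instead
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        // "*" matches all elements, including the root <html> element
        NodeList elements = doc.getElementsByTagName("*");
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            tags.add(elements.item(i).getNodeName()); // for an Element this is its tag name
        }
        return tags;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>Demo</title></head>"
                + "<body><div id=\"main\"><p>Hello</p><img src=\"a.png\"/></div></body></html>";
        System.out.println(extractTagNames(page)); // [html, head, title, body, div, p, img]
    }
}
```

To get each distinct tag name once instead of every occurrence, the returned list can be copied into a LinkedHashSet, which drops duplicates while keeping first-seen order.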

http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html

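The CyberNeko HTML parser (NekoHTML) is also a good choice if you need more; it is built to balance tags and fix up the mistakes common in real-world HTML instead of rejecting the page.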
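NekoHTML is a third-party library, so the sketch below does not use it. As a dependency-free illustration of the same idea (a lenient parser that tolerates malformed HTML rather than throwing), it swaps in the JDK's own javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback; this is not the approach from the answer. That parser only knows the old HTML 3.2 DTD, so newer elements such as <section> are reported as unknown tags, and it inserts implied tags (for example a missing <body>), which the sketch filters out. The names LenientTagExtractor and extractTagNames are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LenientTagExtractor {

    // Collects start-tag and empty-element names in the order the parser reports them.
    static List<String> extractTagNames(String html) throws IOException {
        List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            // Empty elements such as <br> and <img> arrive here; unknown (non-HTML 3.2)
            // tags do too, and their end tags carry the ENDTAG attribute
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (!a.isDefined(HTML.Attribute.ENDTAG)) {
                    record(t, a);
                }
            }

            private void record(HTML.Tag t, MutableAttributeSet a) {
                // Skip tags the parser inserted itself rather than found in the source
                if (!a.isDefined(HTMLEditorKit.ParserCallback.IMPLIED)) {
                    tags.add(t.toString());
                }
            }
        };
        // true: ignore any charset declared in the page, since the input is already a String
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        // Unclosed <p>, an unquoted attribute and a bare <br>: not well-formed XML
        String messy = "<html><body><p>One<br><img src=a.png><p>Two</body></html>";
        System.out.println(extractTagNames(messy));
    }
}
```

The trade-off is accuracy for tolerance: the XML route fails fast on bad markup, while a lenient parser keeps going and reports whatever tags it can recover.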

Adam Ayres answered Apr 23 '26 at 16:04



