 

How to parse and modify HTML file in Java

I am doing a project in which I need to read an HTML file, identify specific tags, modify their contents, and write the result to a new HTML file. Is there a library that parses HTML tags and can also write them back out to a new file?

asked by chai on Oct 11 '10 at 13:10



3 Answers

Check out jsoup (http://jsoup.org). It has a friendly DOM-like API, so for simple tasks you don't need to write any HTML parsing code yourself.
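A minimal sketch of the read/modify/write workflow from the question, assuming jsoup is on the classpath. The class name `JsoupRewrite`, the sample markup, and the particular edits (replacing the text of every `<h1>` and adding `target="_blank"` to links) are hypothetical illustrations; `Jsoup.parse`, `select`, `text`, `attr`, and `outerHtml` are jsoup methods.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JsoupRewrite {

    // Parses HTML, edits the selected tags, and returns the serialized document.
    static String rewrite(String html) {
        Document doc = Jsoup.parse(html);

        // Hypothetical edit 1: replace the text content of every <h1>.
        for (Element h1 : doc.select("h1")) {
            h1.text("New heading");
        }

        // Hypothetical edit 2: set an attribute on every link that has an href.
        doc.select("a[href]").attr("target", "_blank");

        return doc.outerHtml();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Read the input file, rewrite it, and write the result to a new file.
            String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
            Files.write(Paths.get(args[1]), rewrite(html).getBytes(StandardCharsets.UTF_8));
        } else {
            // No paths given: rewrite a small inline sample and print it.
            System.out.println(rewrite("<h1>Old</h1><a href=\"https://example.com\">link</a>"));
        }
    }
}
```

Run it with two arguments (input path, output path) to rewrite a file; with no arguments it prints the rewritten sample. jsoup normalizes the markup it writes (it adds missing `<html>`, `<head>` and `<body>` tags and pretty-prints by default), so the new file may differ from the original outside the edited tags too.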

answered by Victor Ionescu on Sep 19 '22 at 23:09



If you want to modify a web page and return the modified content, I think the best way is to use an XSL transformation (XSLT), provided the page is well-formed XHTML:
http://en.wikipedia.org/wiki/XSLT
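A minimal sketch using the XSLT processor bundled with the JDK (`javax.xml.transform`), assuming the input is well-formed XHTML. The stylesheet, the `XsltRewrite` class, and the choice to rewrite `<title>` are hypothetical. The stylesheet is the usual identity transform (copy everything unchanged) plus one template that overrides only the elements you want to change.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XsltRewrite {

    // Identity transform plus one override that replaces the text of every
    // <title> element (matched by local name, so it works with or without
    // the XHTML namespace). Output is forced to XML without a declaration.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[local-name()='title']\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*\"/>Modified title</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed XHTML and returns the result.
    static String transform(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input (for example unclosed tags) is reported here.
            throw new IllegalArgumentException("XSLT failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<html><head><title>Old</title></head><body><p>Hi</p></body></html>"));
    }
}
```

To work on files instead of strings, pass `new StreamSource(new File(...))` and `new StreamResult(new File(...))` to `transform`; in a real project the stylesheet would usually live in its own `.xsl` file.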

answered by Igor Konoplyanko on Sep 19 '22 at 23:09


There are many HTML parsers to choose from. You could use JTidy, NekoHTML, or TagSoup.

I usually prefer parsing XHTML with the standard Java XML parsers, but that doesn't work for arbitrary HTML, which is often not well-formed XML.
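For the XHTML case, a minimal sketch using only the JDK: the built-in DOM parser reads the document, the tree is edited in place, and an identity `Transformer` writes it back out. It assumes well-formed input; the `XhtmlDomRewrite` class and the `<h1>` edits are hypothetical, and the `load-external-dtd` feature is specific to the Xerces-based parser bundled with the JDK.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public class XhtmlDomRewrite {

    // Parses well-formed XHTML, edits every <h1>, and serializes the tree.
    static String rewrite(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Don't download the W3C DTD named in an XHTML DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            NodeList headings = doc.getElementsByTagName("h1");
            for (int i = 0; i < headings.getLength(); i++) {
                Element h1 = (Element) headings.item(i);
                h1.setTextContent("New heading"); // replaces all children with one text node
                h1.setAttribute("class", "edited");
            }

            // Identity Transformer: writes the modified DOM back out as XML markup.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            // SAXException here usually means the input is not well-formed XML.
            throw new IllegalArgumentException("Could not rewrite XHTML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<html><body><h1>Old</h1><p>Text</p></body></html>"));
    }
}
```

For ordinary HTML that isn't well-formed, this parse step fails, which is where a tolerant parser such as one of those listed above comes in.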

answered by ivy on Sep 19 '22 at 23:09