How to parse and modify an HTML file in Java

Tags: java, html, html-parsing

I am doing a project wherein I need to read an HTML file and identify specific tags, modify the contents of the tag, and create a new HTML file. Is there a library that parses HTML tags and is capable of writing the tags back to a new file?

asked Oct 11 '10 13:10

chai

3 Answers

Check out jsoup (http://jsoup.org). It has a friendly DOM-like API, so for simple tasks you don't need to parse the HTML yourself.
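As a rough sketch of what that could look like (not a definitive implementation): the class name, the `h1` selector, the replacement text, and the `input.html` / `output.html` paths below are all made up for illustration. It assumes the jsoup jar is on the classpath and Java 11+ for `Files.readString`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for this example only.
final class JsoupEditSketch {

    // Parses the HTML, replaces the text of every element matching the
    // CSS selector, and returns the serialized document.
    static String replaceText(String html, String cssSelector, String newText) {
        Document doc = Jsoup.parse(html);
        for (Element el : doc.select(cssSelector)) {
            el.text(newText);
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: read the original file, write a new one.
        String html = Files.readString(Path.of("input.html"), StandardCharsets.UTF_8);
        String updated = replaceText(html, "h1", "New title");
        Files.writeString(Path.of("output.html"), updated, StandardCharsets.UTF_8);
    }
}
```

Note that jsoup normalizes what it parses (for example, it adds missing `html`, `head`, and `body` elements), so the output is not guaranteed to be byte-for-byte identical to the input outside the parts you changed.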

answered Sep 19 '22 23:09

Victor Ionescu

If you want to modify a web page and return the modified content, I think the best way is to use an XSL transformation (XSLT).
http://en.wikipedia.org/wiki/XSLT
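A minimal sketch of the idea, using only the XSLT support that ships with the JDK (`javax.xml.transform`). The stylesheet is an identity transform with one override, and the class name, the `h1` match, and the replacement text are assumptions for illustration. It only works if the input is well-formed XML (i.e. XHTML), and the plain `h1` match assumes the elements are not in a namespace; real XHTML with `xmlns="http://www.w3.org/1999/xhtml"` would need a namespace prefix in the match pattern.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper class, named for this example only.
final class XsltEditSketch {

    // Identity transform plus one override: every <h1> keeps its
    // attributes but its content is replaced with "New title".
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='h1'>"
      + "<xsl:copy><xsl:apply-templates select='@*'/>New title</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a well-formed XHTML string.
    static String transform(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            transform("<html><body><h1 id=\"t\">Old</h1><p>keep</p></body></html>"));
    }
}
```

To read from and write to files instead, `StreamSource` and `StreamResult` also accept a `java.io.File`.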

answered Sep 19 '22 23:09

Igor Konoplyanko

There are many HTML parsers to choose from. You could use JTidy or NekoHTML, or check out TagSoup.

I usually prefer parsing XHTML with the standard Java XML parsers, but that only works when the input is well-formed XML, so you can't do this for arbitrary HTML.
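For the XHTML case, here is a rough sketch with the JDK's built-in DOM parser; the class name, tag name, and replacement text are made up for illustration. It parses the input, changes the text of matching elements, and serializes the tree back to a string. Input that is not well-formed XML (such as an unclosed `<br>`) is rejected, which is exactly the limitation mentioned above.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper class, named for this example only.
final class XhtmlDomEditSketch {

    // Sets the text content of every element with the given tag name
    // and returns the document serialized as XML.
    static String setText(String xhtml, String tagName, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(newText);
            }

            // A Transformer without a stylesheet copies the DOM to the output.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Thrown for input that is not well-formed XML, among other failures.
            throw new IllegalArgumentException("Could not parse or serialize the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            setText("<html><body><h1>Old</h1><p>keep</p></body></html>", "h1", "New title"));
    }
}
```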

answered Sep 19 '22 23:09

ivy
