Sorry if this is too simple, but I couldn't find a tutorial or any documentation for the Java version of TagSoup.
Basically, I want to download an HTML web page from the internet and convert it to XHTML, stored in a string. How can I do this with TagSoup?
Thanks!
Something like this:
wget -O - example.com/bad.html | java -jar tagsoup.jar
Or, from Java:
To parse HTML:
- Create an instance of org.ccil.cowan.tagsoup.Parser
- Provide your own SAX2 ContentHandler
- Provide an InputSource referring to the HTML
- And parse()!
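Here is a minimal sketch of those steps. It relies on the fact that TagSoup's Parser implements the standard SAX2 XMLReader interface, so the helper accepts any XMLReader. Instead of a hand-written ContentHandler, it plugs in a JAXP identity TransformerHandler from the JDK, which serializes the parse events into a string. The names HtmlToXhtml and toXhtml are made up for this example.

```java
import java.io.IOException;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/**
 * Hypothetical helper: runs any SAX2 XMLReader (e.g. TagSoup's
 * org.ccil.cowan.tagsoup.Parser) and returns the events serialized as XML.
 */
final class HtmlToXhtml {
    private HtmlToXhtml() {}

    static String toXhtml(XMLReader reader, InputSource input)
            throws IOException, SAXException, TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }

        // The ContentHandler: an identity TransformerHandler that writes
        // every SAX event it receives to a StringWriter as XML.
        TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        reader.setContentHandler(handler);
        // Also forward comments and CDATA sections, if the reader supports
        // the standard SAX2 lexical-handler property.
        try {
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Optional property: continue without comments/CDATA.
        }

        // Parse the InputSource; the events stream straight into the handler.
        reader.parse(input);
        return out.toString();
    }
}
```

With tagsoup.jar on the classpath, downloading a page and converting it might look like this (the URL is a placeholder):

String xhtml = HtmlToXhtml.toXhtml(new org.ccil.cowan.tagsoup.Parser(), new InputSource(java.net.URI.create("http://example.com/bad.html").toURL().openStream()));

Because the helper only depends on XMLReader, you can also try it with the JDK's own SAX parser on well-formed input. TagSoup also ships its own serializer, org.ccil.cowan.tagsoup.XMLWriter, which its command-line tool uses; that may be worth looking at if the JAXP output settings don't suit you.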