I'm trying to do XHTML DOM parsing with JTidy, and it seems to be a rather counterintuitive task. In particular, there's a method to parse HTML:
Node Tidy.parse(Reader, Writer)
To get the <body /> of that Node, I assume I should use
Node Node.findBody(TagTable)
Where do I get an instance of that TagTable? (Its constructor is protected, and I haven't found a factory that produces one.)
I use JTidy 8.0-SNAPSHOT.
I found there's a much simpler way to extract the body:
Tidy tidy = new Tidy(); tidy.setXHTML(true); tidy.setPrintBodyOnly(true);
Then call tidy.parse(reader, writer) on the Reader-Writer pair; the Writer receives only the contents of the body.
Simple as it should be.
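For anyone landing here later, a minimal sketch of the whole thing, assuming JTidy is on the classpath. The class name BodyExtractor, the method extractBody and the sample markup are made up for illustration; setQuiet and setShowWarnings are optional and only keep Tidy's diagnostics off the console.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import org.w3c.tidy.Tidy;

// Hypothetical helper class, not part of JTidy.
class BodyExtractor {

    // Runs the input through JTidy and returns only the body contents as XHTML.
    static String extractBody(String html) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // emit XHTML rather than HTML
        tidy.setPrintBodyOnly(true);  // write only what is inside <body>
        tidy.setQuiet(true);          // optional: suppress the summary output
        tidy.setShowWarnings(false);  // optional: suppress warnings

        Reader in = new StringReader(html);
        Writer out = new StringWriter();
        tidy.parse(in, out);          // the returned Node is not needed here
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println(extractBody(html));
    }
}
```

This sidesteps Node.findBody(TagTable) entirely, so no TagTable instance is needed.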