Is there a Java library that measures the similarity between web pages (HTML/DOM similarity)?
In my application, I want to classify the links of a website.
For example:
group 1: Product detail page
group 2: Category page
(for online shopping sites, etc.).
For this kind of classification, I think comparing the HTML structure (DOM similarity) is the best approach. Any help would be appreciated.
This is not exactly what you asked, but if the HTML is valid XML, you can use XMLUnit, which makes it very simple to compare documents for similarity.
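If you would rather avoid an extra dependency, here is a minimal sketch of one common way to score structural similarity using only the JDK's built-in DOM parser. It is not XMLUnit's API. It assumes the pages are well-formed XML (real-world HTML usually is not, so you would first need a lenient HTML parser such as jsoup). The technique reduces each page to the set of root-to-element tag paths (for example `html/body/div`) and scores two pages by the Jaccard index of those sets. The class and method names (`DomSimilarity`, `tagPaths`, `similarity`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: structural similarity of two well-formed pages via tag-path Jaccard.
public class DomSimilarity {

    // Parses well-formed (XML-valid) markup into a DOM Document using the JDK parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Recursively records every root-to-element tag path, e.g. "html/body/div".
    static void collectPaths(Element element, String prefix, Set<String> paths) {
        String path = prefix.isEmpty() ? element.getTagName() : prefix + "/" + element.getTagName();
        paths.add(path);
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip text, comments, etc.
                collectPaths((Element) child, path, paths);
            }
        }
    }

    // Returns the set of distinct tag paths found in the page.
    static Set<String> tagPaths(String xml) throws Exception {
        Set<String> paths = new HashSet<>();
        collectPaths(parse(xml).getDocumentElement(), "", paths);
        return paths;
    }

    // Jaccard index: size of intersection / size of union (1.0 = identical path sets).
    public static double similarity(String a, String b) throws Exception {
        Set<String> pathsA = tagPaths(a);
        Set<String> pathsB = tagPaths(b);
        Set<String> intersection = new HashSet<>(pathsA);
        intersection.retainAll(pathsB);
        Set<String> union = new HashSet<>(pathsA);
        union.addAll(pathsB);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) throws Exception {
        String pageA = "<html><body><div><h1/><span/></div></body></html>";
        String pageB = "<html><body><div><h1/><ul><li/></ul></div></body></html>";
        System.out.println(similarity(pageA, pageB));
    }
}
```

In the example, the two pages share 4 of 7 distinct tag paths, so the score is about 0.57. To classify links, you could compare each fetched page against a few hand-labelled example pages (one set per group, such as product detail or category) and assign it to the group with the highest score. Tag paths ignore text content and attribute values, which is usually what you want when comparing page templates.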