
Java: comparing web page structure (DOM) similarity

Is there a Java library that measures the similarity between web pages, based on their HTML (DOM) structure?

In my application I want to classify the links of a website. For example, on an online shopping site, group 1 would be product detail pages and group 2 would be category pages.

For this kind of classification, I think comparing the HTML structure (DOM) of the pages is the best approach. Any help would be appreciated.

cuneytykaya asked Jan 17 '12 09:01



1 Answer

Not exactly what you asked for, but if the HTML is well-formed XML (for example, XHTML) you can use XMLUnit. Comparing two documents for similarity with it is very simple.
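
To illustrate the general idea of structural comparison without any extra dependency, here is a rough sketch that uses only the JDK's built-in DOM parser. It is not XMLUnit's API: it swaps in a simple Jaccard similarity over the set of root-to-element tag paths. The class name `DomSimilarity` and its methods are hypothetical, and the sketch assumes the input is already well-formed XML, so real-world HTML would need to be cleaned up first.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of XMLUnit or any other library.
class DomSimilarity {

    // Collects the distinct root-to-element tag paths, e.g. "/html/body/div".
    // Text content and attributes are ignored; only the element structure counts.
    static Set<String> tagPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Set<String> paths = new HashSet<>();
            collect(doc.getDocumentElement(), "", paths);
            return paths;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    private static void collect(Element element, String parentPath, Set<String> paths) {
        String path = parentPath + "/" + element.getTagName();
        paths.add(path);
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                collect((Element) child, path, paths);
            }
        }
    }

    // Jaccard similarity of the two path sets: |A ∩ B| / |A ∪ B|, in [0, 1].
    static double similarity(String xmlA, String xmlB) {
        Set<String> a = tagPaths(xmlA);
        Set<String> b = tagPaths(xmlB);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String productA = "<html><body><div><h1>A</h1><p>desc</p></div></body></html>";
        String productB = "<html><body><div><h1>B</h1><p>other</p></div></body></html>";
        String category = "<html><body><ul><li>x</li><li>y</li></ul></body></html>";
        System.out.println("product vs product:  " + similarity(productA, productB));
        System.out.println("product vs category: " + similarity(productA, category));
    }
}
```

Pages built from the same template should score close to 1, while pages with different layouts should score lower, so a threshold on this score (or clustering on it) could serve as a first pass at grouping links. Because it uses sets, the measure ignores how often a path repeats and in what order; whether that is good enough depends on how much the templates of the site differ.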

Víctor Romero answered Oct 16 '22 15:10
