I'm writing a program that opens a web page and, among other things, should detect how many navigations (menus) the page has, how long the main navigation is (how many elements), the average text length of the navigation elements, and so on.
I'm having trouble detecting the menus, though. As far as I can tell, there are two ways web navigation is usually coded:
1. <ul><li><a>Home</a></li><li><a>Products</a></li>...</ul>
2. <div><a>Home</a><a>Product</a>...</div>
So if I find one of these structures, I know (or should I say "I think") it's a navigation. But this is NOT bulletproof: I get a lot of false hits.
Does anyone have a better idea of how to detect navigations on web pages?
There is no universal solution. You need to implement some heuristics. I would try something like this:
This way you will get a constant set of internal links, which in most cases will be the "menu" of the site.
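The individual steps of this heuristic are not shown here, so the following is only one possible reading of it: load a few pages of the same site, collect the internal links on each, and keep the links that appear on every page. It is a minimal sketch using only the Python standard library. The names `LinkCollector`, `internal_links` and `common_internal_links` are made up for illustration, and the pages are assumed to be already downloaded into a dict mapping URL to HTML source.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url, html):
    """Return the absolute link targets in html that stay on the page's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts before comparing.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return links


def common_internal_links(pages):
    """pages maps URL -> HTML source; returns links present on every page."""
    link_sets = [internal_links(url, html) for url, html in pages.items()]
    return set.intersection(*link_sets) if link_sets else set()
```

Links that survive the intersection are only menu candidates: footer links and logo links show up on every page too, so in practice this would probably be combined with structural checks.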
In HTML4 and XHTML there is no standard way of writing menus. In HTML5 you have the <menu> and <nav> tags, but as you have concluded, in earlier versions the generally recommended way is to use an unordered list.
I would probably write a number of tests, and use them all in parallel to try and find the menu, e.g. based on position in the document, structure, and things like id and class attributes (the values of which will often contain "menu").
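To make the "several tests in parallel" idea concrete, here is a rough sketch that scores every container element on a few signals: its tag name, whether its id or class contains "menu" or "nav", and how many links it holds. The weights, the `min_score` threshold and the names `MenuFinder`, `score` and `find_menus` are arbitrary assumptions for illustration, not a tested recipe. Position in the document is not scored here.

```python
from html.parser import HTMLParser

CONTAINERS = {"nav", "ul", "ol", "div", "menu"}
HINTS = ("menu", "nav")  # substrings looked for in id/class values


class MenuFinder(HTMLParser):
    """Records, for each container element, the links found inside it."""

    def __init__(self):
        super().__init__()
        self.open = []        # containers that are currently open
        self.candidates = []  # closed containers holding at least one link
        self.current = None   # text chunks of the link being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in CONTAINERS:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}"
            self.open.append({"tag": tag, "label": label.lower(), "links": []})
        elif tag == "a":
            # Credit the link to every container it is nested in.
            self.current = []
            for container in self.open:
                container["links"].append(self.current)

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self.current = None
        elif tag in CONTAINERS:
            # Close the nearest open container with this tag name.
            for i in range(len(self.open) - 1, -1, -1):
                if self.open[i]["tag"] == tag:
                    closed = self.open[i]
                    del self.open[i:]
                    if closed["links"]:
                        self.candidates.append(closed)
                    break


def score(candidate):
    """Add up the independent tests; the weights are guesses."""
    points = 0
    if candidate["tag"] == "nav":
        points += 3
    elif candidate["tag"] in ("ul", "ol", "menu"):
        points += 1
    if any(hint in candidate["label"] for hint in HINTS):
        points += 2
    if len(candidate["links"]) >= 3:
        points += 1
    return points


def find_menus(html, min_score=2):
    """Return likely menus, best first, with item count and average text length."""
    finder = MenuFinder()
    finder.feed(html)
    results = []
    for cand in finder.candidates:
        texts = ["".join(chunks).strip() for chunks in cand["links"]]
        results.append({
            "tag": cand["tag"],
            "score": score(cand),
            "items": len(texts),
            "avg_text_len": sum(map(len, texts)) / len(texts),
        })
    results = [r for r in results if r["score"] >= min_score]
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

Because a link is credited to every container around it, a <nav> wrapping a <ul> can produce two overlapping results; keeping only the highest-scoring one per group of nested containers would be a reasonable next step.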