I would like to generate a graphical sitemap for my website. There are two stages, as far as I can tell:
Does anyone have advice or experience with achieving this, or know of existing work I can build on (ideally in Python)?
I came across some nice CSS for rendering the tree, but it only works for 3 levels.
Thanks
The only automatic way to create a sitemap is to know the structure of your site and write a program which builds on that knowledge. Just crawling the links won't usually work, because links can connect any two pages, so what you get is a graph (nodes with arbitrary connections) rather than a tree. In the general case there is no single correct way to convert a graph into a tree.
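To illustrate why crawled links do not give you a tree directly, here is a minimal sketch. The `links` dictionary is a made-up site and `bfs_tree` is a hypothetical helper, not part of any library. A breadth-first traversal does produce *a* tree, but which parent a page ends up under depends only on the order in which links happen to be visited:

```python
from collections import deque

def bfs_tree(links, root):
    """Return {page: parent} for a breadth-first traversal from root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # the first link to reach a page wins
                parent[target] = page
                queue.append(target)
    return parent

# Hypothetical site: both /products and /blog link to /contact.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/contact"],
    "/blog": ["/contact"],
}
# Same site, but the home page lists its two links in the other order.
reordered = dict(links, **{"/": ["/blog", "/products"]})

print(bfs_tree(links, "/")["/contact"])      # /products
print(bfs_tree(reordered, "/")["/contact"])  # /blog
```

Neither result is wrong, which is the problem: the crawler cannot know which parent you intended.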
So you must identify the structure of your tree yourself and then crawl the relevant pages to get the titles of the pages.
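Getting the titles could look roughly like this, using only the standard library. `TitleParser`, `extract_title` and `fetch_title` are names invented for this sketch, and it assumes each page has an ordinary `<title>` element:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html):
    """Return the page title from an HTML string, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()

def fetch_title(url):
    """Download a page and return its title (no retries or error handling)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_title(response.read().decode(charset, errors="replace"))

print(extract_title("<html><head><title> About us </title></head></html>"))  # About us
```

You would then call `fetch_title` for each URL in the tree you defined by hand, rather than for every link the crawler finds.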
As for "but it only works for 3 levels": three levels is more than enough. If you try to create more levels, your sitemap will become unusable (too big, too wide). No one will want to download a 1 MB sitemap and then scroll through 100,000 links. If your site grows that big, then you must implement some kind of search.
Here is a Python web crawler, which should make a good starting point. Your general strategy is this:
The reason you need to do all this is, as leonm noted, that websites are graphs, not trees, and laying out a graph is a harder problem than a simple piece of JavaScript and CSS can handle. Graphviz is good at what it does.
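As a rough sketch of the hand-off to Graphviz: once you have the links as a dictionary, writing them out in the DOT language is only a few lines. `to_dot` is a hypothetical helper, and the example pages and titles are made up:

```python
def to_dot(links, titles=None):
    """Render {page: [linked pages]} as a Graphviz DOT digraph string."""
    titles = titles or {}

    def quote(text):
        # DOT string literals need backslashes and double quotes escaped.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph sitemap {", "  rankdir=LR;", "  node [shape=box];"]
    # Declare every page once, labelled with its title if we have one.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    for page in pages:
        lines.append(f"  {quote(page)} [label={quote(titles.get(page, page))}];")
    # One edge per link.
    for source in sorted(links):
        for target in links[source]:
            lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)

example = to_dot({"/": ["/about", "/blog"], "/blog": ["/about"]}, {"/": "Home"})
print(example)
```

If you save the output as `sitemap.dot`, running `dot -Tsvg sitemap.dot -o sitemap.svg` should give you an image with the layout done for you.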