
Programmatically analyze CSS layout

I would like to spider a few blogs and programmatically analyze their HTML- and CSS-based layouts to see, for example, whether the sidebar is to the left or right of the main content, how many columns there are, and how wide they are.

What would be the best way to do this? Are there any tools or libraries I can use?

(I would prefer a solution in Python or PHP.)

Christian Davén asked Feb 16 '11 10:02



1 Answer

This sounds like an extremely hard task to do using pure server-side CSS and HTML parsing - you would effectively have to recreate the browser's rendering engine to get reliable results.

Depending on what you need this for, I can think of an approach along these lines:

  • Fetch pages and style sheets using something like wget with --page-requisites

  • Then either:

    • Walk through each downloaded page using a tool like Selenium, search for elements by name, and output their positions (I assume Selenium can do this, but I do not know for sure)

    • Create a piece of jQuery that you inject into each of the downloaded pages. The script searches for elements named "sidebar", "toolbar" etc., gets their positions, sends the results via AJAX to a local script that saves them, and continues to the next downloaded page. You only need to open the first page in the browser; the rest happens automatically. Not trivial to implement, but possible.
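Once either route has produced element positions, the analysis itself is simple. Below is a minimal Python sketch of that last step only. It assumes a real browser has already reported each element's left edge and width in pixels (for instance through Selenium's `WebElement.rect` or jQuery's `offset()` and `outerWidth()`). The `Box` type, the function names, and the sample numbers are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Horizontal extent of one rendered element, in pixels (hypothetical type)."""
    name: str
    x: float      # left edge as reported by the browser
    width: float

    @property
    def center(self) -> float:
        return self.x + self.width / 2


def sidebar_side(main: Box, sidebar: Box) -> str:
    """Return 'left' or 'right' depending on where the sidebar sits
    relative to the main content, or 'unknown' if their centers coincide."""
    if sidebar.center < main.center:
        return "left"
    if sidebar.center > main.center:
        return "right"
    return "unknown"


def columns(boxes):
    """Merge horizontally overlapping boxes into columns.

    Returns a list of (left_edge, width) tuples ordered left to right.
    Boxes that merely touch (one ends where the next starts) count as
    separate columns.
    """
    spans = sorted((b.x, b.x + b.width) for b in boxes)
    merged = []
    for left, right in spans:
        if merged and left < merged[-1][1]:
            # Overlaps the current column: extend it if needed.
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    return [(left, right - left) for left, right in merged]


# Made-up positions for a two-column blog layout.
main = Box("main", 0, 640)
sidebar = Box("sidebar", 660, 300)
widget = Box("widget", 660, 300)   # stacked inside the sidebar column

side = sidebar_side(main, sidebar)
cols = columns([main, sidebar, widget])
```

With the made-up positions above, `side` is `"right"` and `cols` holds two columns. This only looks at horizontal extents, so it would not notice elements that are hidden or that wrap below each other on narrow viewports; the hard part remains finding the right elements on each blog.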

If you can use a client-side application platform like .NET, you may be better off building a custom application that embeds a browser control, whose DOM you can access more freely than with jQuery alone.

Pekka answered Nov 10 '22 00:11
