How to programmatically measure the elements' sizes in HTML source code using python?

Tags: python, html, dom

I'm doing webpage layout analysis in Python. A fundamental task is to programmatically measure element sizes given the HTML source code, so that we can obtain statistics such as content/ad ratio, ad block position, and ad block size for a corpus of webpages.

An obvious approach is to use the width/height attributes, but they're not always available. Besides, values like width: 50% can only be computed after the page has been loaded into a DOM. So I guess loading the HTML source into a browser with a predefined window size (like mechanize, although I'm not sure whether its window size can be set) is worth trying, but mechanize doesn't support returning an element's size anyway.

Is there any universal way (without width/height attributes) to do it in python, preferably with some library?

Thanks!

shuaiyuancn asked Mar 27 '13 16:03


1 Answer

I suggest taking a look at Ghost, a WebKit web client written in Python. It supports JavaScript, so you can call JavaScript functions and get their return values. This example shows how to find the width of the Google search box:

>>> from ghost import Ghost
>>> ghost = Ghost()
>>> ghost.open('https://google.lt')
>>> width, resources = ghost.evaluate("document.getElementById('gbqfq').offsetWidth;")
>>> width
541.0  # google text box width 541px
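
To get from a single width to the content/ad statistics mentioned in the question, one option is to measure every matching element in a single JavaScript call and aggregate in Python. The following is only a rough sketch: `build_measure_script` and `area_ratio` are hypothetical helper names, the `div.ad` selector is made up, and it assumes the client hands back the JavaScript array as a list of dicts, which may differ between Ghost versions.

```python
import json


def build_measure_script(selector):
    """Return a JavaScript expression that measures every element matching `selector`."""
    # json.dumps produces a safely quoted JavaScript string literal.
    return (
        "Array.prototype.map.call("
        "document.querySelectorAll(%s), "
        "function (el) {"
        " var r = el.getBoundingClientRect();"
        " return {left: r.left, top: r.top, width: r.width, height: r.height};"
        " });" % json.dumps(selector)
    )


def area_ratio(boxes, page_width, page_height):
    """Fraction of the page area covered by `boxes` (overlapping boxes are counted twice)."""
    page_area = float(page_width) * float(page_height)
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    covered = sum(box["width"] * box["height"] for box in boxes)
    return covered / page_area


# Hypothetical measurement of one 300x250 ad block on a 1200x1000 page.
sample_boxes = [{"left": 880.0, "top": 120.0, "width": 300.0, "height": 250.0}]
script = build_measure_script("div.ad")  # "div.ad" is an assumed selector
ratio = area_ratio(sample_boxes, 1200, 1000)
print(ratio)
```

The string returned by `build_measure_script` would be passed to `ghost.evaluate(...)` in place of the `offsetWidth` expression above. The page dimensions could be read the same way, for instance from `document.documentElement.scrollWidth` and `scrollHeight`.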
Zygimantas Gatelis answered Oct 20 '22 20:10