Using Gecko/Firefox or WebKit for HTML parsing in Python

Question

I am using BeautifulSoup and urllib2 to download HTML pages and parse them. The problem is malformed HTML pages. Although BeautifulSoup is good at handling malformed HTML, it is still not as good as Firefox.

Since Firefox and WebKit are more up to date and more resilient at handling HTML, I think it would be ideal to use one of them to construct and normalize the DOM tree of a page, and then manipulate that tree from Python.
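
To make "malformed" concrete, here is a minimal sketch of the kind of input at issue. It is an illustration only: it uses Python 3's standard-library `html.parser` instead of BeautifulSoup or a browser engine (an assumption, to keep it dependency-free), and the `BROKEN` fragment, `TagCollector` class, and `tag_events` helper are hypothetical names. The tokenizer reports tags exactly as written and does no tree repair, which is the normalization step a browser would perform.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events for a markup string."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Hypothetical malformed fragment: <b> and <i> are mis-nested,
# and <p> is never closed.
BROKEN = "<p>Hello <b>bold <i>both</b> italic</i>"

events = tag_events(BROKEN)
for kind, tag in events:
    print(kind, tag)
```

The event stream still contains the mis-nested `</b>` before `</i>` and no closing event for `<p>`; turning that into a consistent tree is left to whatever sits on top of the tokenizer.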

However, I can't find any Python bindings for this. Can anyone suggest a way?

I have come across some solutions that run a headless Firefox process and control it from Python, but is there a more Pythonic solution available?

vezult · Accepted Answer

Perhaps pywebkitgtk would do what you need.

Tags: python, html, parsing

user90147