Best Python Module for HTML parsing [closed]

Question

I have a website updater (people can update the text content, but not the look of the website) with HTML and JavaScript on the front end and Python on the back end/server side.

I am finding that updating the HTML from the front end is very difficult, because the HTML I grab with ele.innerHTML or $(ele).html() comes back altered depending on the browser (DAMN IE).

So I have decided to update my HTML from the back end, i.e., in Python.

What do you think is the best Python module to parse HTML & grab information?

My requirements are:
- the module must work with Python 2.5 or earlier (because of my web host)
- I will be parsing HTML & finding all the HTML elements that have the class "updatable"
- For each element of the class "updatable": extract the innerText (text content only, not HTML)

Which Python module would you suggest is best for this?
- HTMLParser.py
- htmllib.py
- Do you know of any other Python 2.5 compatible modules?
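
To make the requirements above concrete, here is a minimal sketch of what they would look like with the standard-library HTMLParser option. It assumes reasonably well-formed markup (every non-void tag is closed). The class name UpdatableTextParser and the VOID_TAGS set are made up for illustration, and on Python 2.x character entities would need extra handling that this sketch leaves out.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.x module name

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = set(["area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"])


class UpdatableTextParser(HTMLParser):
    """Hypothetical helper: collects the text of elements with class 'updatable'.

    Assumes well-formed markup. A nested 'updatable' element is treated as
    part of its outer element's text rather than reported separately.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0      # nesting depth inside the current updatable element
        self.chunks = []    # text pieces of the element being collected
        self.results = []   # one string per top-level updatable element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif "updatable" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        # Only keep text that appears inside an updatable element.
        if self.depth:
            self.chunks.append(data)
```

Usage would be to create the parser, call feed() with the page source, call close(), and then read the results list.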

pcalcao · Accepted Answer

For parsing HTML I would suggest you take a look at Beautiful Soup. It's pretty powerful and can deal with some messed up markup as well.

http://www.crummy.com/software/BeautifulSoup/

Check this out and see if it helps you out! Hope it does.
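
As a rough sketch of how that could look for the "updatable" requirement, assuming the current Beautiful Soup 4 package (imported as bs4) is available. That is an assumption: a Python 2.5 host would most likely need the older Beautiful Soup 3 release, whose import and method names differ. The function name updatable_texts is just for illustration.

```python
from bs4 import BeautifulSoup


def updatable_texts(html):
    """Return the text of every element whose class list includes 'updatable'."""
    # "html.parser" selects the parser bundled with Python, so no extra
    # dependency is needed beyond Beautiful Soup itself.
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches elements that carry "updatable" among their classes;
    # get_text() drops the markup and keeps only the text.
    return [el.get_text().strip() for el in soup.find_all(class_="updatable")]
```

Note that a nested "updatable" element would be returned once on its own and once as part of its parent's text, so you may want to filter those out if your pages nest them.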

David · Answer

I've been using lxml ( http://lxml.de/lxmlhtml.html ). It is relatively fast for normal-sized HTML documents and has support for using BeautifulSoup. As I understand it, BeautifulSoup is no longer supported, so for all new projects I've used lxml.
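
For what it's worth, a minimal sketch of the "updatable" lookup with lxml.html might look like the following. It assumes lxml is installed on the host, which needs checking for a Python 2.5 setup, and the function name updatable_texts is only illustrative.

```python
import lxml.html


def updatable_texts(html):
    """Return the text of every element whose class list includes 'updatable'."""
    root = lxml.html.fromstring(html)
    # Pad the class attribute with spaces so "updatable" only matches as a
    # whole class name, not as a substring of something like "not-updatable".
    xpath = ('//*[contains(concat(" ", normalize-space(@class), " "), '
             '" updatable ")]')
    # text_content() returns the element's text without any markup.
    return [el.text_content().strip() for el in root.xpath(xpath)]
```

The XPath form is used here because it needs nothing beyond lxml itself.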

Tags: python, html, html-parsing

sazr