
What's the most Pythonic XHTML/HTML parser/generator/template module that supports DOM-like access?

It should be able to create, modify and read X/HTML in a highly object-oriented way that still feels DOM-like, without being bloated, and is really Pythonic. Preferably it would deal with malformed HTML too, but that can be skipped for templates.

For example, I'd like to do this:

>>> from someAmazingTemplate import *
>>> html = Template('<html><head><title>Hi</title></head><body></body></html>')
>>> html.head.append('<link type="text/css" href="main.css" rel="stylesheet" />')
>>> html.head.title
Hi
>>> html['head']['title']
Hi

I should be able to use/define short functions and use them like this:

>>> html.head.append(stylesheet(href="main.css"))
>>> html.body.append(h1('BIG TITLE!12',Class="roflol"))
>>> html.body.SOURCE
<body>
    <h1 class="roflol">
        BIG TITLE!12
    </h1>
</body>

Note: If it doesn't exist, I'm going to make it under BSD/MIT/Python license. Help is most welcome. Anything that works towards more Pythonic web app development will be great. Very much appreciate it!

-Luke Stanley

Luke Stanley, asked Nov 16 '09 23:11



2 Answers

Most of the first part can be done with ElementTree, though it takes a few more steps. In current CPython 3, Element objects do not accept arbitrary new attributes (html.head = ... raises AttributeError), so the sub-elements are kept in plain variables here:

>>> import xml.etree.ElementTree as ET
>>> html = ET.XML('<html><head><title>Hi</title></head><body></body></html>')
>>> head = html.find('head')
>>> head.append(ET.XML('<link type="text/css" href="main.css" rel="stylesheet" />'))
>>> title = head.find('title')
>>> title.text
'Hi'

The second part can be done by creating Element objects, but you'd need to do some of your own work to make it behave the way you really want:

>>> body = html.find('body')
>>> my_h1 = ET.Element('h1', {'class': 'roflol'})
>>> my_h1.text = 'BIG TITLE!12'
>>> body.append(my_h1)
>>> ET.tostring(body, encoding='unicode')
'<body><h1 class="roflol">BIG TITLE!12</h1></body>'

You could create a stylesheet function of your own:

>>> def stylesheet(href='', type='text/css', rel='stylesheet', **kwargs):
...     # Extra keyword arguments become additional attributes on the <link>.
...     return ET.Element('link', href=href, type=type, rel=rel, **kwargs)
...
>>> head.append(stylesheet(href="main.css"))

And the whole document, which now contains both <link> elements appended above (attribute order as serialized by Python 3.8 or later):

>>> ET.tostring(html, encoding='unicode')
'<html><head><title>Hi</title><link type="text/css" href="main.css" rel="stylesheet" /><link href="main.css" type="text/css" rel="stylesheet" /></head><body><h1 class="roflol">BIG TITLE!12</h1></body></html>'

But, I think if you're going to end up writing your own thing, this is a good place to start. ElementTree is very powerful.
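
As a rough starting point for "writing your own thing", here is a minimal sketch of a wrapper that adds the html.head.title and html['head']['title'] access from the question on top of ElementTree. The Node class, its parse method and its source property are hypothetical names invented for this sketch, not part of any library, and it only handles well-formed markup:

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical thin wrapper giving attribute/item access to child elements."""

    def __init__(self, element):
        # Node is an ordinary Python object, so it can hold the Element freely.
        self._element = element

    @classmethod
    def parse(cls, markup):
        # ET.XML requires well-formed markup; malformed HTML is out of scope here.
        return cls(ET.XML(markup))

    def __getattr__(self, name):
        # Only called when normal lookup fails; never resolve private names here.
        if name.startswith('_'):
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __getitem__(self, name):
        child = self._element.find(name)
        if child is None:
            raise KeyError(name)
        return Node(child)

    def append(self, item):
        # Accept a markup string, another Node, or a raw Element.
        if isinstance(item, str):
            item = ET.XML(item)
        elif isinstance(item, Node):
            item = item._element
        self._element.append(item)

    @property
    def source(self):
        return ET.tostring(self._element, encoding='unicode')

    def __str__(self):
        # Printing a node shows its text content, as in the question.
        return self._element.text or ''


html = Node.parse('<html><head><title>Hi</title></head><body></body></html>')
html.head.append('<link type="text/css" href="main.css" rel="stylesheet" />')
print(html.head.title)        # Hi
print(html['head']['title'])  # Hi
```

One limitation of this design: child tags named append or source are shadowed by the wrapper's own members when accessed as attributes, so those would have to be reached with the item syntax instead.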

Edit: I realize that this is probably not exactly what you're looking for. I just wanted to provide something as an available alternative and to also prove that it could actually be done without too much work.
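
The short h1(...)-style functions from the question can be approximated the same way. This is only a sketch: make_tag is a hypothetical helper, and lower-casing keyword names (so Class becomes class) is an assumption about how the Class="roflol" spelling in the question should be mapped:

```python
import xml.etree.ElementTree as ET


def make_tag(name):
    """Return a function that builds <name> elements (hypothetical helper)."""
    def build(*children, **attrs):
        # Map Python-friendly keywords to attribute names: Class or class_ -> class.
        attrib = {key.lower().rstrip('_'): str(value) for key, value in attrs.items()}
        elem = ET.Element(name, attrib)
        for child in children:
            if isinstance(child, str):
                if len(elem):
                    # Text after a child element belongs in that child's tail.
                    elem[-1].tail = (elem[-1].tail or '') + child
                else:
                    # Text before any child element belongs in elem.text.
                    elem.text = (elem.text or '') + child
            else:
                elem.append(child)
        return elem
    return build


h1 = make_tag('h1')
body = make_tag('body')

page = body(h1('BIG TITLE!12', Class='roflol'))
print(ET.tostring(page, encoding='unicode'))
# <body><h1 class="roflol">BIG TITLE!12</h1></body>
```

For indented output closer to the SOURCE example in the question, ET.indent(page) can be called before serializing on Python 3.9 or later.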

jathanism, answered Nov 08 '22 15:11


Amara Bindery provides the most Pythonic XML API I've seen. See its quick reference, manual, and FAQ.

quark, answered Nov 08 '22 15:11