
HTML indenter written in Python

I am looking for a free (as in freedom) HTML indenter (or re-indenter) written in Python, either as a module or as a command-line tool. I don't need to filter the HTML against a whitelist; I just want to indent (or re-indent) HTML source to make it more readable. For example, say I have the following code:

<ul><li>Item</li><li>Item
</li></ul>

the output could be something like:

<ul>
    <li>Item</li>
    <li>Item</li>
</ul>

Note: I am not looking for an interface to non-Python software (for example Tidy, which is written in C), but for a 100% Python script.

Thanks a lot.

jep asked Jun 25 '11 21:06


2 Answers

Here's my pure-Python solution:

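from xml.dom.minidom import parseString as string_to_dom

def prettify(string, html=True):
    # minidom is an XML parser, so the input must be well-formed (XHTML-style).
    dom = string_to_dom(string)
    ugly = dom.toprettyxml(indent="  ")
    # Drop the blank lines that toprettyxml can emit.
    split = [line for line in ugly.split('\n') if line.strip()]
    if html:
        # Skip the leading <?xml version="1.0" ?> declaration.
        split = split[1:]
    pretty = '\n'.join(split)
    return pretty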

def pretty_print(html):
    print(prettify(html))

When used on your block of HTML:

html = """<ul><li>Item</li><li>Item</li></ul>"""
pretty_print(html)

I get:

<ul>
  <li>Item</li>
  <li>Item</li>
</ul>
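
One caveat with this approach: minidom is an XML parser, so it only accepts well-formed markup. Below is a minimal sketch of guarding against that, assuming it is acceptable to return None for input the parser rejects. The name prettify_if_well_formed is made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def prettify_if_well_formed(source, indent="  "):
    """Return indented markup, or None if `source` is not well-formed XML."""
    try:
        dom = parseString(source)
    except ExpatError:
        # Raised for things like an unclosed <br> or an HTML-only
        # entity such as &nbsp;.
        return None
    # Serialising the root element (rather than the document) leaves out
    # the <?xml ... ?> declaration.
    pretty = dom.documentElement.toprettyxml(indent=indent)
    # Drop any blank lines left over from whitespace in the source.
    return "\n".join(line for line in pretty.split("\n") if line.strip())


print(prettify_if_well_formed("<ul><li>Item</li><li>Item</li></ul>"))
print(prettify_if_well_formed("<p>one<br>two</p>"))  # not well-formed
```

HTML that relies on unclosed void elements such as <br>, or on named entities like &nbsp;, would probably need a real HTML parser instead.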
emehex answered Oct 14 '22 14:10


You can use the toprettyxml method of the standard-library module xml.dom.minidom:

>>> from xml.dom import minidom
>>> x = minidom.parseString("<ul><li>Item</li><li>Item\n</li></ul>")
>>> print(x.toprettyxml())
<?xml version="1.0" ?>
<ul>
    <li>
        Item
    </li>
    <li>
        Item
    </li>
</ul>
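
The output above comes from an older Python release. As far as I can tell, newer versions keep an element that contains only text on a single line, but the newline inside the second <li> still ends up in the output. Here is a rough sketch of one way to get the compact form asked for in the question, assuming whitespace around text is not significant. That assumption is wrong inside <pre>, and it also removes spaces next to inline elements. strip_text_nodes and indent_markup are hypothetical helper names.

```python
from xml.dom import minidom


def strip_text_nodes(node):
    """Trim whitespace around text nodes and drop whitespace-only ones."""
    # Iterate over a copy because children may be removed along the way.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            child.data = child.data.strip()
            if not child.data:
                node.removeChild(child)
        else:
            strip_text_nodes(child)


def indent_markup(source, indent="    "):
    """Re-indent well-formed markup, without the XML declaration."""
    dom = minidom.parseString(source)
    strip_text_nodes(dom)
    # Calling toprettyxml on the root element skips the <?xml ... ?> line.
    return dom.documentElement.toprettyxml(indent=indent).rstrip("\n")


print(indent_markup("<ul><li>Item</li><li>Item\n</li></ul>"))
```

The indent argument defaults to a tab in toprettyxml, so it is passed explicitly here to get four spaces.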
Elisha answered Oct 14 '22 14:10