
Custom indent width for BeautifulSoup .prettify()

Is there any way to define a custom indent width for the .prettify() function? From what I can tell from its source:

def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)

There is no way to specify indent width. I think it's because of this line in the decode_contents() function -

s.append(" " * (indent_level - 1)) 

That hardcodes one space per indent level! (WHY!!) I tried specifying indent_level=4, but that just shifts everything over and still indents by one space per level:

    <section>
     <article>
      <h1>
      </h1>
      <p>
      </p>
     </article>
    </section>

Which looks just plain stupid. :|

Now, I can hack this away, but I just want to be sure if there is anything I'm missing. Because this should be a basic feature. :-/

If you know a better way of prettifying HTML, let me know.

Bibhas Debnath asked Mar 19 '13 20:03


1 Answer

I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

import re

r = re.compile(r'^(\s*)', re.MULTILINE)

def prettify_2space(s, encoding=None, formatter="minimal"):
    # Duplicate each line's leading whitespace: 1 space per level becomes 2.
    return r.sub(r'\1\1', s.prettify(encoding, formatter))
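To see what the substitution does on its own, here is a minimal stdlib-only sketch that runs on a plain string instead of a soup. The names `widen`, `LEADING_SPACES`, and `sample` are made up for illustration, and the sample text only imitates the one-space-per-level shape of prettify() output. It also uses a slightly stricter pattern than the one above: `\s` matches newlines too, so `^(\s*)` can capture a blank line's newline along with the next line's indent, while `^( +)` only touches spaces.

```python
import re

# Leading run of spaces on each line. Spaces only, so the newline of a
# blank line is never captured and duplicated.
LEADING_SPACES = re.compile(r"^( +)", re.MULTILINE)


def widen(text, factor=2):
    """Repeat each line's leading spaces `factor` times."""
    return LEADING_SPACES.sub(lambda m: m.group(1) * factor, text)


# Hypothetical sample shaped like prettify() output: one space per level.
sample = "<section>\n <article>\n  <h1>\n  </h1>\n </article>\n</section>\n"

print(widen(sample))     # two spaces per level
print(widen(sample, 4))  # four spaces per level
```

This assumes the text is a str. When an encoding is passed, prettify() returns bytes, and a str pattern cannot be applied to it.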

Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

import re
import bs4

orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(\s*)', re.MULTILINE)

def prettify(self, encoding=None, formatter="minimal", indent_width=4):
    # Repeat each line's leading whitespace indent_width times.
    return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))

bs4.BeautifulSoup.prettify = prettify

So:

x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x)
print(soup.prettify(indent_width=3))

… gives:

<html>
   <body>
      <section>
         <article>
            <h1>
            </h1>
            <p>
            </p>
         </article>
      </section>
   </body>
</html>

Obviously if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
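One way to sketch that generic wrapper is a decorator that takes any prettify-style method and returns a version accepting an extra indent_width keyword. Everything named here (`with_indent_width`, `StubNode`) is hypothetical. The stub class stands in for the bs4 classes so the snippet runs without Beautiful Soup installed, and its canned return value is only an assumption about what one-space-indented output looks like.

```python
import functools
import re

# Spaces only, so blank lines are left alone.
_LEADING_SPACES = re.compile(r"^( +)", re.MULTILINE)


def with_indent_width(prettify_method):
    """Wrap a prettify-style method so it accepts an indent_width keyword."""

    @functools.wraps(prettify_method)
    def wrapper(self, *args, indent_width=1, **kwargs):
        result = prettify_method(self, *args, **kwargs)
        if isinstance(result, bytes):
            # Encoded output is returned untouched in this sketch.
            return result
        return _LEADING_SPACES.sub(lambda m: m.group(1) * indent_width, result)

    return wrapper


class StubNode:
    """Hypothetical stand-in for a class with a prettify() method."""

    def prettify(self, encoding=None, formatter="minimal"):
        return "<p>\n <b>\n </b>\n</p>\n"


# Patch the class once; with bs4 the same line would be applied to each
# class whose prettify you want to change.
StubNode.prettify = with_indent_width(StubNode.prettify)

print(StubNode().prettify(indent_width=4))
```

Defaulting indent_width to 1 keeps the original behavior for callers that never pass it, so wrapping a method twice by accident only applies the width given to the outermost call.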

abarnert answered Oct 13 '22 21:10