
Prettify using beautifulsoup without adding line breaks

Say I have an HTML file like this:

<html>
<body>
<p>Some post</p>
<p>Another post</p>
</body>
</html>

In Python, I can use soup.prettify() to adjust line indentation. However, prettify() also adds line breaks. The output looks like this:

<html>
 <body>
  <p>
   Some post
  </p>
  <p>
   Another post
  </p>
 </body>
</html>

I would like to add indentation only, without adding any line breaks (equivalent to the effect "Reindent" has in Sublime Text). That is, I would like the output to look like this:

<html>
<body>
    <p>Some post</p>
    <p>Another post</p>
</body>
</html>

Can this be done in Python?

220284 asked Dec 13 '25 22:12


1 Answer

You can disable additional line breaks for certain tags using the preserve_whitespace_tags keyword argument:

soup = bs4.BeautifulSoup(my_html, preserve_whitespace_tags=["p"])

Documentation: bs4.builder.TreeBuilder.__init__

A list of tags to treat the way <pre> tags are treated in HTML. Tags in this list are immune from pretty-printing; their contents will always be output as-is.

However, there doesn't seem to be a "don't add any whitespace" option. The documentation even states:

Since it adds whitespace (in the form of newlines), prettify() changes the meaning of an HTML document and should not be used to reformat one.
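
For illustration, here is a minimal sketch of the preserve_whitespace_tags approach applied to the HTML from the question. It assumes a reasonably recent bs4 release (one that forwards extra keyword arguments to the tree builder) and uses the built-in "html.parser" backend, which is my choice and not something the question specifies. The variable names my_html and pretty are placeholders.

```python
from bs4 import BeautifulSoup

# The sample document from the question.
my_html = """<html>
<body>
<p>Some post</p>
<p>Another post</p>
</body>
</html>"""

# Extra keyword arguments are passed on to the tree builder, so <p> is
# treated like <pre>: prettify() leaves its contents untouched.
# "html.parser" is an assumption; any installed parser should behave alike.
soup = BeautifulSoup(my_html, "html.parser", preserve_whitespace_tags=["p"])

pretty = soup.prettify()
print(pretty)
```

With this, each <p> element should stay on a single line, while the surrounding <html> and <body> tags are still indented by prettify() using its usual one space per nesting level. It does not reproduce the four-space layout from the question exactly, and every tag that should keep its text inline has to be listed explicitly.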

Anton Yang-Wälder answered Dec 16 '25 14:12


