
BeautifulSoup innerhtml?

Tags: python, html, beautifulsoup, innerhtml

Let's say I have a page with a div. I can easily get that div with soup.find().

Now that I have the result, I'd like to print the WHOLE innerhtml of that div: I mean, I need a string with ALL the HTML tags and text together, exactly like the string I'd get in JavaScript with obj.innerHTML. Is this possible?

287

asked Nov 13 '11 16:11

Matteo Monti

2 Answers

TL;DR

With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML property might look something like this:

def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
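As a quick check of how the two methods behave, here is a minimal sketch. The markup and the `box` id are made up for illustration, and it assumes the built-in `html.parser` backend:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for illustration.
html = '<div id="box"><p>Hello <b>world</b></p><span>!</span></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="box")

inner_text = div.decode_contents()   # Python str
inner_bytes = div.encode_contents()  # bytes, UTF-8 unless told otherwise

print(inner_text)
print(inner_bytes)
```

The outer `<div>` itself is not part of either result, which is what makes this behave like innerHTML rather than outerHTML.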

These functions aren't currently in the online documentation, so I'll quote the current function definitions and docstrings from the code.

`encode_contents` - since 4.0.4

def encode_contents(
    self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
    formatter="minimal"):
    """Renders the contents of this tag as a bytestring.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param encoding: The bytestring will be in this encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """

See also the documentation on formatters; you'll most likely use either formatter="minimal" (the default) or formatter="html" (for HTML entities), unless you want to process the text manually in some way.
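To make the difference between the two formatters concrete, here is a small sketch on an invented snippet (assuming the built-in `html.parser`). It uses `decode_contents` so the results are easy to read, but `encode_contents` accepts the same `formatter` argument:

```python
from bs4 import BeautifulSoup

# Invented input containing non-ASCII characters and an ampersand.
soup = BeautifulSoup("<div><p>caf\u00e9 &amp; cr\u00e8me</p></div>", "html.parser")
div = soup.find("div")

# "minimal" escapes only what is needed for valid markup (such as &).
minimal = div.decode_contents(formatter="minimal")

# "html" additionally turns characters into named HTML entities where it can.
as_html = div.decode_contents(formatter="html")

print(minimal)
print(as_html)
```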

encode_contents returns an encoded bytestring. If you want a Python Unicode string then use decode_contents instead.
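A short sketch of the `encoding` parameter, again on made-up markup parsed with `html.parser`: the default output is UTF-8, and passing another codec name changes the bytes you get back.

```python
from bs4 import BeautifulSoup

# Made-up input with one non-ASCII character.
soup = BeautifulSoup("<div><p>caf\u00e9</p></div>", "html.parser")
div = soup.find("div")

utf8_bytes = div.encode_contents()                     # default: UTF-8
latin1_bytes = div.encode_contents(encoding="latin-1")

# Decoding the bytes gives back the same text decode_contents() returns.
round_trip = utf8_bytes.decode("utf-8")

print(utf8_bytes)
print(latin1_bytes)
print(round_trip == div.decode_contents())
```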

`decode_contents` - since 4.0.1

decode_contents does the same thing as encode_contents but returns a Python Unicode string instead of an encoded bytestring.

def decode_contents(self, indent_level=None,
                    eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                    formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param eventual_encoding: The tag is destined to be
       encoded into this encoding. This method is _not_
       responsible for performing that encoding. This information
       is passed in so that it can be substituted in if the
       document contains a <META> tag that mentions the document's
       encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """

BeautifulSoup 3

BeautifulSoup 3 doesn't have the above functions; instead it has renderContents:

def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string.."""

This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.
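If you are porting BS3 code, a rough sketch like the following can confirm that the compatibility name gives the same bytes as `encode_contents` under BeautifulSoup 4. The markup is invented, and the warning filter is there on the assumption that newer releases may flag the old name as deprecated:

```python
import warnings

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hello <b>world</b></p></div>", "html.parser")
div = soup.find("div")

with warnings.catch_warnings():
    # Silence a possible deprecation warning for the BS3-style name.
    warnings.simplefilter("ignore")
    legacy = div.renderContents()

modern = div.encode_contents()
print(legacy == modern)
```

For new code, the BS4 names are probably the better choice.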

145

answered Sep 28 '22 08:09

ChrisD

One option is to use something like this:

innerhtml = "".join([str(x) for x in div_element.contents])
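Here is a minimal sketch of that approach next to `decode_contents()`, using an invented snippet and the built-in `html.parser`. The name `div_element` matches the line above; everything else is made up for the example:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>one</p> and <i>two</i></div>", "html.parser")
div_element = soup.find("div")

# Serialize each direct child and glue the pieces together.
joined = "".join(str(x) for x in div_element.contents)

print(joined)
print(joined == div_element.decode_contents())
```

This only shows that the two agree on this particular input. Text nodes and comments sitting directly under the element may be serialized differently by `str()`, so `decode_contents()` is likely the safer choice in general.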

answered Sep 28 '22 07:09

peewhy
