Should I use .text or .content when parsing a Requests response?

Tags:

python

python-requests

lxml

I occasionally use res.content or res.text to parse a response from Requests. In the use cases I have had, it didn't seem to matter which option I used.

What is the main difference in parsing HTML with .content or .text? For example:

import requests 
from lxml import html
res = requests.get(...)
node = html.fromstring(res.content)

In the above situation, should I be using res.content or res.text? What is a good rule of thumb for when to use each?

236

asked Oct 20 '16 19:10

David542

1 Answer

From the documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property:

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text. You might want to do this in any situation where you can apply special logic to work out what the encoding of the content will be. For example, HTML and XML have the ability to specify their encoding in their body. In situations like this, you should use r.content to find the encoding, and then set r.encoding. This will let you use r.text with the correct encoding.

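So r.text is the body decoded to a string using r.encoding, while r.content is the raw bytes. Use r.content when the server returns binary data, or when the encoding headers are missing or bogus and you need to find the correct encoding in the body itself, for example in an HTML meta tag.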
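
As a rough illustration of that workflow, here is a minimal sketch that uses only the standard library. The bytes in raw stand in for r.content, and sniff_meta_charset is a hypothetical helper written for this example, not part of Requests. It assumes the page declares its charset in a meta tag near the top of the document.

```python
import re

# Hypothetical helper (not part of Requests): look for a charset declaration
# such as <meta charset="utf-8"> near the start of an HTML body.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(body: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a meta tag, or `default` if none is found."""
    match = _META_CHARSET.search(body[:2048])
    return match.group(1).decode("ascii") if match else default


# Stand-in for r.content: a UTF-8 encoded page.
raw = '<html><head><meta charset="utf-8"></head><body>café</body></html>'.encode("utf-8")

# Roughly what r.text looks like when r.encoding is wrongly 'ISO-8859-1'.
wrong_text = raw.decode("ISO-8859-1")

# Roughly what r.text looks like after r.encoding = sniff_meta_charset(r.content).
right_text = raw.decode(sniff_meta_charset(raw))

print("café" in wrong_text)
print("café" in right_text)
```

With a real response the same idea would be r.encoding = sniff_meta_charset(r.content), followed by reading r.text. For the lxml case in the question, passing res.content hands the parser the raw bytes, so it can apply its own encoding detection instead of relying on the guess made from the headers.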

136

answered Nov 02 '22 18:11

Francisco
