How to parse the value of Content-Type from an HTTP Header Response?

Tags:

python

mime-types

content-type

python-requests

My application makes numerous HTTP requests. Without writing a regular expression, how do I parse Content-Type header values? For example:

text/html; charset=UTF-8

For context, here is the code I use to fetch pages:

from requests import head

foo = head("http://www.example.com")

I am looking for something similar to the methods in mimetools. For example:

x = magic("text/html; charset=UTF-8")

would then give:

x.getparam('charset')  # UTF-8
x.getmaintype()  # text
x.getsubtype()  # html

214

asked Sep 01 '15 11:09

A. K. Tolentino

1 Answer

requests doesn't give you an interface to parse the content type, unfortunately, and the standard library on this stuff is a bit of a mess. So I see two options:

Option 1: Go use the python-mimeparse third-party library.

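Option 2: To separate the MIME type from options like charset, you can use the same technique that requests uses internally to parse the type and encoding: cgi.parse_header. Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so this only works on older interpreters.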

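import cgi
import requests

response = requests.head('http://example.com')
# parse_header returns the bare MIME type and a dict of parameters such as charset
mimetype, options = cgi.parse_header(response.headers['Content-Type'])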

The rest should be simple enough to handle with a split:

maintype, subtype = mimetype.split('/')
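
If cgi is not available in your Python version, the standard library's email package can parse the same header and offers accessors close to the mimetools-style ones asked about. The following is only a minimal sketch: parse_content_type is a hypothetical helper name (it is not part of requests or the standard library), and it assumes the header value is a plain string such as the one in the question.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (maintype, subtype, params).

    parse_content_type is a hypothetical helper, named here for illustration.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # media type itself, so skip it to keep only the real parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


maintype, subtype, params = parse_content_type("text/html; charset=UTF-8")
print(maintype, subtype, params.get("charset"))  # text html UTF-8
```

With requests, the value to pass in would be response.headers['Content-Type'], as in the snippet above. The Message object also has get_param('charset') if you would rather keep it around than unpack a tuple.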

172

answered Oct 22 '22 02:10

Owen S.
