
How to parse the value of Content-Type from an HTTP Header Response?

My application makes numerous HTTP requests. Without writing a regular expression, how do I parse Content-Type header values? For example:

text/html; charset=UTF-8

For context, here is the code I use to make the requests:

from requests import head

foo = head("http://www.example.com")

The result I'm after is similar to what the methods in mimetools provide. For example:

x = magic("text/html; charset=UTF-8")

I would then expect:

x.getparam('charset')  # UTF-8
x.getmaintype()  # text
x.getsubtype()  # html
A. K. Tolentino asked Sep 01 '15 11:09




1 Answer

requests doesn't give you an interface to parse the content type, unfortunately, and the standard library on this stuff is a bit of a mess. So I see two options:

Option 1: Go use the python-mimeparse third-party library.

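Option 2: To separate the mime type from options like charset, you can use the same technique that requests uses to parse type/encoding internally: use cgi.parse_header. Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so this only works on older interpreters.

import cgi
import requests
response = requests.head('http://example.com')
mimetype, options = cgi.parse_header(response.headers['Content-Type'])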

The rest should be simple enough to handle with a split:

maintype, subtype = mimetype.split('/')
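
If cgi is not available in your interpreter, the standard library's email.message.Message can parse the same header and exposes accessors close to the mimetools ones in the question. Here is a minimal sketch, assuming that approach is acceptable for your use case. parse_content_type is a hypothetical helper name, not part of requests or the standard library:

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type value into (maintype, subtype, params)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # media type itself, so skip it and keep only the real parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


maintype, subtype, params = parse_content_type("text/html; charset=UTF-8")
print(maintype, subtype, params.get("charset"))
```

With requests, you would pass response.headers['Content-Type'] as the value. Parameter names come back lowercased, so look up 'charset' rather than 'Charset'.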
Owen S. answered Oct 22 '22 02:10