
Get file size using python-requests, while only getting the header

I have looked at the requests documentation, but I can't find anything on this. How do I request only the headers, so I can check the file size?

scandinavian_ asked Jan 11 '13 02:01



2 Answers

Send a HEAD request:

    >>> import requests
    >>> response = requests.head('http://example.com')
    >>> response.headers
    {'connection': 'close',
     'content-encoding': 'gzip',
     'content-length': '606',
     'content-type': 'text/html; charset=UTF-8',
     'date': 'Fri, 11 Jan 2013 02:32:34 GMT',
     'last-modified': 'Fri, 04 Jan 2013 01:17:22 GMT',
     'server': 'Apache/2.2.3 (CentOS)',
     'vary': 'Accept-Encoding'}

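A HEAD request is like a GET request, except that the server returns only the headers and no body. Note that it's up to the server to actually honor your HEAD request. Some servers respond only to GET requests, so you'll have to send a GET request and close the connection instead of downloading the body. Other times, the server never specifies the total size of the file, in which case there is no Content-Length header to read. Also keep in mind that when the response is compressed (content-encoding: gzip in the output above), Content-Length is the size of the compressed body, not of the original file.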
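The fallback described above could be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper names content_length and remote_file_size and the 10-second timeout are assumptions made for the example. It tries HEAD first and, if that yields no usable size, sends a GET with stream=True and closes the connection without reading the body.

```python
import requests


def content_length(headers):
    """Return Content-Length as an int, or None if it is missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def remote_file_size(url, timeout=10):
    """Hypothetical helper: try HEAD, then fall back to a streamed GET."""
    try:
        # requests.head() does not follow redirects unless asked to.
        head = requests.head(url, allow_redirects=True, timeout=timeout)
        if head.ok:
            size = content_length(head.headers)
            if size is not None:
                return size
    except requests.RequestException:
        pass  # HEAD failed outright; fall through to GET

    # stream=True defers the body download; leaving the with-block
    # closes the connection without the body ever being read.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return content_length(response.headers)
```

A call such as remote_file_size('http://example.com') returns an int when the server reports a size and None when it does not, so the caller has to handle the None case.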

Blender answered Sep 22 '22 14:09


Use requests.get(url, stream=True).headers['Content-length'].

stream=True means that when the function returns, only the response headers have been downloaded; the response body has not.

Both requests.get and requests.head can get you the headers, but there are advantages to using get:

  1. get is more flexible: if you want to download the response body after inspecting the length, you can simply access the content property, or use an iterator that downloads the content in chunks.
  2. According to the HTTP specification, the headers returned for a "HEAD request SHOULD be identical to the information sent in response to a GET request.", but that is not always the case.
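As a rough sketch of the first point, the snippet below inspects the declared length of a streamed response and downloads the body in chunks only if it is under a limit. The names save_if_small and fetch, the max_bytes parameter, and the chunk size are made up for this illustration; only stream=True, headers, and iter_content come from requests itself.

```python
import requests


def save_if_small(response, path, max_bytes, chunk_size=8192):
    """Write a streamed response body to path unless its declared size is too large.

    Returns the number of bytes written, or None if the download was skipped.
    """
    declared = response.headers.get("Content-Length")
    if declared is not None and declared.isdigit() and int(declared) > max_bytes:
        return None  # over the limit; the body is never fetched

    written = 0
    with open(path, "wb") as out:
        # iter_content pulls the body from the connection chunk by chunk.
        for chunk in response.iter_content(chunk_size=chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written


def fetch(url, path, max_bytes, timeout=10):
    """Hypothetical wrapper: open a streamed GET and hand it to save_if_small."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return save_if_small(response, path, max_bytes)
```

Note that this sketch trusts the header: if the server sends no Content-Length, the body is downloaded regardless of its size.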

Here is an example of getting the length of an MIT open course video:

    import requests

    MitOpenCourseUrl = "http://www.archive.org/download/MIT6.006F11/MIT6_006F11_lec01_300k.mp4"
    resHead = requests.head(MitOpenCourseUrl)
    resGet = requests.get(MitOpenCourseUrl, stream=True)
    resHead.headers['Content-length']  # output 169
    resGet.headers['Content-length']   # output 121291539
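One possible explanation for the mismatch, which the example above does not confirm, is redirects: requests.head does not follow redirects by default, while requests.get does, so the HEAD call may be reporting the length of a short redirect response instead of the file. A small sketch for checking this (the name head_lengths and the timeout are assumptions for illustration):

```python
import requests


def head_lengths(url, timeout=10):
    """Report status and Content-Length from HEAD, without and with redirects followed."""
    results = {}
    for follow in (False, True):
        response = requests.head(url, allow_redirects=follow, timeout=timeout)
        results[follow] = (response.status_code, response.headers.get("Content-Length"))
    return results
```

If the entry for False shows a 3xx status and a small length while the entry for True matches the GET result, the difference comes from the redirect and not from the server treating HEAD and GET differently.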
watashiSHUN answered Sep 23 '22 14:09