How to get pdf filename with Python requests?

I'm using the Python requests lib to get a PDF file from the web. This works fine, but I now also want the original filename. If I go to a PDF file in Firefox and click download it already has a filename defined to save the pdf. How do I get this filename?

For example:

    import requests

    r = requests.get('http://www.researchgate.net/profile/M_Gotic/publication/260197848_Mater_Sci_Eng_B47_%281997%29_33/links/0c9605301e48beda0f000000.pdf')
    print(r.headers['content-type'])  # prints 'application/pdf'

I checked r.headers for anything interesting, but there's no filename in there. I was actually hoping for something like r.filename.

Does anybody know how I can get the filename of a downloaded PDF file with the requests library?

The filename is specified in the HTTP Content-Disposition header. So to extract the name you would do:

    import re

    # r is the Response object returned by requests.get()
    d = r.headers['content-disposition']
    fname = re.findall("filename=(.+)", d)[0]

The name is extracted from the header value with a regular expression (the re module).
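Note that the regular expression keeps any surrounding quotes, so a header such as attachment; filename="paper.pdf" yields "paper.pdf" with the quotes included, and it raises an IndexError when the header carries no filename at all. A minimal sketch of one alternative, assuming you are happy to let the standard library's email.message.Message parse the header value; the helper name filename_from_header is hypothetical and not part of requests:

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not content_disposition:
        return None
    msg = Message()
    msg['Content-Disposition'] = content_disposition
    # get_filename() removes surrounding quotes from the parameter value and
    # returns None when the header has no filename parameter.
    return msg.get_filename()


print(filename_from_header('attachment; filename="paper.pdf"'))  # paper.pdf
print(filename_from_header('inline'))                            # None
```

With a response r from requests.get(), this could be called as filename_from_header(r.headers.get('content-disposition')), which also covers the case where the header is missing.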

Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I parse it from the download URL:

    import re
    import requests
    from requests.exceptions import RequestException

    url = 'http://www.example.com/downloads/sample.pdf'

    try:
        with requests.get(url) as r:
            fname = ''
            if "Content-Disposition" in r.headers:
                fname = re.findall("filename=(.+)", r.headers["Content-Disposition"])[0]
            else:
                fname = url.split("/")[-1]

            print(fname)
    except RequestException as e:
        print(e)

There are arguably better ways of parsing the URL string, but for simplicity I didn't want to involve any more libraries.
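For the URL fallback, one of those alternative ways is a short sketch using only the standard library's urllib.parse and posixpath, so no third-party package is involved. The helper name filename_from_url is hypothetical. Unlike url.split("/")[-1], it drops any query string or fragment and decodes percent-escapes:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded."""
    # urlsplit() separates the path from the query string and fragment.
    path = urlsplit(url).path
    # Take the basename first, then decode, so an escaped slash (%2F)
    # inside the name cannot change which segment is picked.
    return unquote(posixpath.basename(path))


print(filename_from_url('http://www.example.com/downloads/sample.pdf?token=abc'))  # sample.pdf
print(filename_from_url('http://www.example.com/files/my%20paper.pdf'))            # my paper.pdf
```

If the URL ends in a slash the function returns an empty string, so the caller still needs a default name for that case.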

Tags: python, filenames, pdf, python-requests

kramer65

2 Answers

Eugene V

Nilpo
