I was looking for a way to download PDF files in Python, and answers to other questions recommended the urllib module. I used it to download a PDF, but when I try to open the downloaded file, a message says that the file cannot be opened.
This is the code I used:
import urllib
urllib.urlretrieve("http://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/9709_s11_qp_42.pdf", "9709_s11_qp_42.pdf")
What am I doing wrong? Also, the file is automatically saved to the directory my Python file is in. How do I change the location it is saved to?
Edit: I tried again with a link to a sample PDF, http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf
The code works with this link, so why doesn't it work with the other one?
Try this. It works.
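import requests

url = 'https://pdfs.semanticscholar.org/c029/baf196f33050ceea9ecbf90f054fd5654277.pdf'
r = requests.get(url)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page
# The path passed to open() decides where the file is saved
with open('C:/Users/MICRO HARD/myfile.pdf', 'wb') as f:
    f.write(r.content)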
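If the saved file still will not open, one possible cause is that the server sent back something other than a PDF (for example an HTML error page), which then got written to disk with a .pdf name. That is an assumption, not something confirmed for the URL in the question. Here is a minimal sketch using only the Python 3 standard library that checks for the `%PDF-` marker before saving and lets you choose the target directory. The helper names (`filename_from_url`, `looks_like_pdf`, `download_pdf`) and the browser-like User-Agent header are my own choices for illustration.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of the URL, percent-decoded."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


def looks_like_pdf(data):
    """Return True if the %PDF- marker appears near the start of the data."""
    # PDF files normally begin with %PDF-; checking the first 1024 bytes
    # tolerates a little leading junk. An HTML error page will not match.
    return b"%PDF-" in data[:1024]


def download_pdf(url, dest_dir="."):
    """Download url into dest_dir and return the path of the saved file.

    Raises ValueError if the response body does not look like a PDF.
    """
    # Assumption: some servers reject urllib's default User-Agent,
    # so a browser-like one is sent instead.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        data = response.read()

    if not looks_like_pdf(data):
        raise ValueError("server did not return a PDF; first bytes: %r" % data[:20])

    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    dest_path = os.path.join(dest_dir, filename_from_url(url))
    with open(dest_path, "wb") as f:
        f.write(data)
    return dest_path
```

Calling `download_pdf(url, "downloads")` would save the file under a `downloads` folder (a placeholder name) next to your script. If it raises ValueError, the printed first bytes usually show what the server actually sent.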
You can also use the wget package (pip install wget) to download PDFs from a link:
import wget
wget.download(link)  # link is the PDF's URL as a string
Here's a guide on how to find and download all PDF files from a webpage in one go: https://medium.com/the-innovation/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48