
Python/Java script to download all .pdf files from a website

I was wondering whether it is possible to write a script that programmatically goes through a webpage and downloads all linked .pdf files automatically. Before I attempt it on my own, I want to know whether this is possible.

Regards

sudobangbang asked Feb 15 '14 13:02



1 Answer

Yes, it's possible in Python. You can fetch the page's HTML source, parse it with BeautifulSoup, and find all the <a> (link) tags. Next, keep only the links whose URL ends with the .pdf extension. Once you have a list of all the PDF links, you can download them using

wget.download(link)

or with the requests library (fetch each link with requests.get and write response.content to a file).

A detailed explanation and full source code can be found here:

https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48
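
The steps above can be sketched roughly as follows. This is only an illustration, not the code from the linked article: to keep it free of third-party installs it swaps in the standard library's html.parser and urllib in place of BeautifulSoup, wget and requests, and the names LinkCollector, find_pdf_links, download_pdfs and the "pdfs" output folder are made up for the example. With BeautifulSoup, the link-collecting part would instead be something like soup.find_all("a", href=True).

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Return absolute URLs whose path ends in .pdf, in page order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        # Check only the path, so query strings like ?v=2 do not hide the extension.
        if urlparse(url).path.lower().endswith(".pdf") and url not in links:
            links.append(url)
    return links


def download_pdfs(page_url, out_dir="pdfs"):
    """Fetch page_url and save every linked PDF into out_dir (assumed folder name)."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for link in find_pdf_links(html, page_url):
        target = Path(out_dir) / Path(urlparse(link).path).name
        with urlopen(link) as pdf, open(target, "wb") as f:
            f.write(pdf.read())
```

Calling download_pdfs("https://example.com/notes/") (a placeholder URL) would then save each linked PDF under pdfs/. Note that this only follows links on the one page you give it, and it will miss PDFs whose URLs do not end in .pdf.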

x89 answered Oct 05 '22 14:10