I was wondering whether it is possible to write a script that programmatically goes through a webpage and automatically downloads every file linked with a .pdf extension. Before I start attempting this on my own, I want to know whether it is possible.
Regards
dispatchEvent(new MouseEvent('click')); }
var fileURL = "link/to/pdf";
var fileName = "test.pdf";
download(fileURL, fileName);
The code above only tests downloading one file from a hardcoded URL. If it works as intended, it should download the PDF from the provided URL when the page loads.
Yes, it's possible in Python. You can fetch the page's HTML source, parse it with BeautifulSoup, and find all the <a> (link) tags. Next, keep the links whose target ends with the .pdf extension. Once you have a list of all the PDF links, you can download them using
wget.download(link)
or with the requests library.
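The steps above (fetch the HTML, collect the link tags, keep the ones ending in .pdf, download each) can be sketched roughly as follows. This is only an illustration under some assumptions: it swaps in the standard library's html.parser and urllib for BeautifulSoup and wget/requests so that it runs without extra installs, and the names LinkCollector, find_pdf_links, download_all, and the "pdfs" output folder are hypothetical, not taken from the answer or the linked article.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_pdf_links(html, base_url):
    """Return absolute, de-duplicated URLs of links whose path ends in .pdf."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page
        if urlparse(url).path.lower().endswith(".pdf") and url not in links:
            links.append(url)
    return links


def download_all(page_url, out_dir="pdfs"):
    """Fetch page_url and save every linked PDF into out_dir (hypothetical helper)."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    Path(out_dir).mkdir(exist_ok=True)
    for url in find_pdf_links(html, page_url):
        # Name the local file after the last segment of the URL path.
        target = Path(out_dir) / Path(urlparse(url).path).name
        with urlopen(url) as response:
            target.write_bytes(response.read())
```

Calling download_all with the address of the page you want to scan would save the files into a local pdfs folder. Pages that build their links with JavaScript, or that require a login, would likely need a different approach.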
A detailed explanation and full source code can be found here:
https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48