I need to download all the files under this links where only the suburb name keep changing in each link
Just a reference https://www.data.vic.gov.au/data/dataset/2014-town-and-community-profile-for-thornbury-suburb
All the files under this search link: https://www.data.vic.gov.au/data/dataset?q=2014+town+and+community+profile
Any possibilities?
Thanks :)
You can download file like this
import urllib2
response = urllib2.urlopen('http://www.example.com/file_to_download')
html = response.read()
To get all the links in a page
from bs4 import BeautifulSoup
import requests
r = requests.get("http://site-to.crawl")
data = r.text
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
print(link.get('href'))
You should first read the html, parse it using Beautiful Soup and then find links according to the file type you want to download. For instance, if you want to download all pdf files, you can check if the links end with the .pdf extension or not.
There's a good explanation and code available here:
https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With