Download all the files in the website

Question

I need to download all the files from a set of links where only the suburb name changes from one link to the next.

For reference, one such link: https://www.data.vic.gov.au/data/dataset/2014-town-and-community-profile-for-thornbury-suburb

All the files are under this search link: https://www.data.vic.gov.au/data/dataset?q=2014+town+and+community+profile

Any possibilities?

Thanks :)

naren · Accepted Answer

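You can download a file like this:

# Python 3: urllib2 was split into urllib.request and urllib.error
import urllib.request
response = urllib.request.urlopen('http://www.example.com/file_to_download')
content = response.read()  # raw bytes of the downloaded file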
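Reading the response only holds the bytes in memory, so to actually keep the file it still has to be written to disk. A minimal sketch of that step, assuming Python 3 (where `urllib2` became `urllib.request`); the helper name `download_file` and the destination path are made up for illustration:

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_path):
    """Stream the response body at `url` into the file `dest_path`."""
    dest = Path(dest_path)
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so large files are not loaded into memory at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Usage would look something like `download_file('http://www.example.com/file_to_download', 'downloads/file_to_download')`.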

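To get all the links in a page:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://site-to.crawl")
data = r.text
# Name the parser explicitly; otherwise bs4 warns and picks whichever one is installed
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag (None if the tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))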
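The `href` values on a page are often relative (for example `/data/dataset/...`), so they usually need to be resolved against the page URL before they can be downloaded. A rough sketch of that, using only the standard library so it runs without installing anything (BeautifulSoup's `find_all('a')` would do the same job); `LinkCollector` and `extract_links` are illustrative names, not part of any library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs alone and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in `html`, resolved against `base_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

With the page text from above, `extract_links(data, "http://site-to.crawl")` would give a list of absolute URLs.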

x89 · Answer

You should first fetch the HTML, parse it with Beautiful Soup, and then pick out the links that match the file type you want to download. For instance, if you want to download all PDF files, check whether each link ends with the .pdf extension.

There's a good explanation and code available here:

https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48
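The extension check described in this answer could look roughly like the sketch below. It assumes you already have a list of absolute URLs; `filter_by_extension` is a made-up helper name. It compares against the URL path only, so a query string such as `?download=1` does not hide the extension, and it ignores case.

```python
from urllib.parse import urlparse


def filter_by_extension(urls, extensions=(".pdf",)):
    """Keep only the URLs whose path ends with one of `extensions`."""
    wanted = tuple(ext.lower() for ext in extensions)
    # str.endswith accepts a tuple, so one call checks every extension.
    return [u for u in urls if urlparse(u).path.lower().endswith(wanted)]
```

For other file types, pass them explicitly, e.g. `filter_by_extension(links, (".pdf", ".xlsx"))`.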

Tags:

python

r

download

webclient

Bharath
