
Download all the links(related documents) on a webpage using Python

Tags:

python

I have to download a lot of documents from a webpage. They are wmv files, PDFs, BMPs, etc. All of them have links on the page, so each time I have to right-click a file, select 'Save Link As', and then save it as type All Files. Is it possible to do this in Python? I searched the SO database and folks have answered the question of how to get the links from a webpage, but I want to download the actual files. Thanks in advance. (This is not a homework question :)).

Sumod asked May 12 '11 07:05

People also ask

How do I download multiple files from a website using Python?

Download multiple files in parallel with Python: to start, create a function (download_parallel) to handle the parallel download. The function (download_parallel) takes one argument, an iterable containing URLs and their associated filenames (the inputs variable created earlier).
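The `download_parallel` helper described above isn't shown on this page; a minimal sketch of one, assuming `inputs` is an iterable of `(url, filename)` pairs, could use a thread pool from the standard library:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def download_parallel(inputs, max_workers=4):
    """Download (url, filename) pairs concurrently; returns the saved filenames."""
    def fetch(pair):
        url, filename = pair
        urlretrieve(url, filename)  # blocking download of one file
        return filename

    # Threads are fine here: downloads are I/O-bound, not CPU-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, inputs))
```

The function and parameter names are placeholders matching the answer's description, not an actual published API.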


1 Answer

  • Follow the Python code in this link: wget-vs-urlretrieve-of-python.
  • You can also do this very easily with Wget. Try the --accept, --recursive and --level options. For example: wget --accept wmv,doc --recursive --level 2 http://www.example.com/files/
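For the pure-Python route the answer points at, a minimal sketch is below: fetch the page, collect every href with the standard-library HTML parser, and save any link whose name ends in one of the wanted extensions with urlretrieve (the programmatic equivalent of 'Save Link As'). The URL, extension list, and destination folder are placeholders.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def download_linked_files(page_url, extensions=(".wmv", ".pdf", ".bmp"), dest="downloads"):
    """Download every link on page_url whose name ends with one of extensions."""
    os.makedirs(dest, exist_ok=True)
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    parser = LinkParser()
    parser.feed(html)
    saved = []
    for href in parser.links:
        if href.lower().endswith(extensions):
            url = urljoin(page_url, href)              # resolve relative links
            filename = os.path.join(dest, os.path.basename(href))
            urlretrieve(url, filename)                 # save the actual file
            saved.append(filename)
    return saved
```

Usage: `download_linked_files("http://www.example.com/files/")`. For anything beyond a quick script, a real HTML parser like BeautifulSoup and the requests library are the more common choices.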
gsbabil answered Sep 21 '22 15:09