 

Run multiple wget -r on a site simultaneously?

Is there a command, or wget with some options, that can do this?

That is, to download a site recursively with multiple simultaneous downloads?

asked Jan 20 '11 by c2h2


People also ask

How do you wget multiple files?

To download multiple files at once, use the -i option with the location of a file that contains the list of URLs to be downloaded. Each URL needs to be added on a separate line.

How do I download multiple links using wget?

While you could invoke wget multiple times manually, there are several ways to download multiple files with wget in one shot. If you know the list of URLs to fetch, you can simply supply wget with an input file that contains that list. The -i option is for that purpose.

Is wget multi thread?

wget itself does not download a single file in multiple threads. The options used here are -r (recursive), -np (--no-parent, don't ascend to the parent directory), and -N (--timestamping, don't re-retrieve files unless they are newer than the local copy). Running several wget processes does work well if you're downloading a mirror of a site.

How do I download all files using wget?

In order to download multiple files using wget, you can create a .txt file and insert the URLs of the files you wish to download. After inserting the URLs into the file, use the wget command with the -i option followed by the name of the .txt file containing the URLs.
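
For reference, here is a minimal sketch of the -i approach described above, assuming a hypothetical urls.txt file with one URL per line:

# urls.txt contains one URL per line, e.g.:
#   https://example.com/file1.iso
#   https://example.com/file2.iso
wget -i urls.txt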


2 Answers

I found a decent solution.

Read the original at http://www.linuxquestions.org/questions/linux-networking-3/wget-multi-threaded-downloading-457375/

wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &

Copy the line as many times as you deem fitting to get as many processes downloading as you want. This isn't as elegant as a properly multithreaded app, but it gets the job done with only a slight amount of overhead. The key here is the "-N" switch, which means transfer the file only if it is newer than what's on disk. This (mostly) prevents each process from re-downloading a file that a different process has already downloaded; it skips that file and fetches whatever some other process hasn't downloaded yet. It uses the timestamp as a means of doing this, hence the slight overhead.
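
For illustration, the same idea can be written as a small shell loop; this is only a sketch with a placeholder URL and process count, not part of the original answer:

# start 4 background wget processes against the same site;
# -N lets each process skip files another process has already fetched
for i in 1 2 3 4; do
    wget -r -np -N http://example.com/ &
done
wait  # block until all background downloads have finished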

It works great for me and saves a lot of time. Don't run too many processes, as this may saturate the web site's connection and tick off the owner. Keep it to a max of around 4 or so; beyond politeness, the number is only limited by CPU and network bandwidth on both ends.

answered Oct 18 '22 by Julian


Using xargs to run wget in parallel, this solution seems much better:

https://stackoverflow.com/a/11850469/1647809
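
A minimal sketch of that xargs approach, assuming a urls.txt file with one URL per line (the exact flags in the linked answer may differ):

# run up to 4 wget processes at a time, one URL per process
xargs -n 1 -P 4 wget -q < urls.txt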

answered Oct 18 '22 by sandyp