How to resume wget mirroring a website?

I use wget to download an entire website.
I used the following command (on Windows 7):

wget ^
 --recursive ^
 -A "*thread*, *label*" ^
 --no-clobber ^
 --page-requisites ^
 --html-extension ^
 --domains example.com ^
 --random-wait ^
 --no-parent ^
 --background ^
 --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" ^
     http://example.com/

After 2 days, my little brother restarted the PC,
so I tried to resume the stopped process.
I added the following option to the command:

--continue ^

so the command looks like this:

wget ^
     --recursive ^
     -A "*thread*, *label*" ^
     --no-clobber ^
     --page-requisites ^
     --html-extension ^
     --domains example.com ^
     --random-wait ^
     --no-parent ^
     --background ^
     --continue ^
     --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" ^
         http://example.com/

Unfortunately, it started a new job: it downloads the same files again and writes a new log file named

wget-log.1

Is there any way to resume mirroring a site with wget, or do I have to start the whole thing over again?

Asked May 04 '15 by Abdalla Mohamed Aly Ibrahim

People also ask

How do I continue wget?

Resuming a wget download is straightforward: open a terminal in the directory you were downloading the file to and run wget with the -c flag to resume the download.
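
For example, to resume a partially downloaded file (the URL here is only illustrative):

wget -c http://example.com/big-file.iso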

How do I mirror a website using wget?

Use the following command:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://site-to-download.com

Replace the https://site-to-download.com portion with the URL of the site you want to mirror. You are done!

What is recursive downloading?

This means that Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on. In other words, Wget first downloads the documents at depth 1, then those at depth 2, and so on until the specified maximum depth.
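
As a quick illustration (the URL and depth are placeholders), the maximum recursion depth can be capped with the -l option; wget's default maximum depth is 5:

wget -r -l 2 http://example.com/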


1 Answer

Try the -nc option. It checks everything once again, but doesn't download files that are already there.

I'm using this command to download one website: wget -r -t1 domain.com -o log

I stopped the process and wanted to resume it, so I changed the command to: wget -nc -r -t1 domain.com -o log

In the log there are entries like this: File .... already there; not retrieving. and so on.

I checked the logs before this, and it seems that after maybe 5 minutes of this kind of checking it began downloading new files.

I'm using this manual for wget: http://www.linux.net.pl/~wkotwica/doc/wget/wget_8.html
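
Applied to the command from the question, a re-run could look like the sketch below. Note that --no-clobber (the long form of -nc) was already in the original command, so the idea is simply to drop --continue and let wget skip the files that are already on disk:

wget ^
     --recursive ^
     -A "*thread*,*label*" ^
     --no-clobber ^
     --page-requisites ^
     --html-extension ^
     --domains example.com ^
     --random-wait ^
     --no-parent ^
     --background ^
     --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" ^
         http://example.com/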

Answered Oct 12 '22 by jack daniels