I'm using wget to download an entire website. I used the following command (on Windows 7):
wget ^
--recursive ^
-A "*thread*, *label*" ^
--no-clobber ^
--page-requisites ^
--html-extension ^
--domains example.com ^
--random-wait ^
--no-parent ^
--background ^
--header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" ^
http://example.com/
After two days, my little brother restarted the PC, so I tried to resume the stopped process. I added the following option to the command:
--continue ^
so the command now looks like this:
wget ^
--recursive ^
-A "*thread*, *label*" ^
--no-clobber ^
--page-requisites ^
--html-extension ^
--domains example.com ^
--random-wait ^
--no-parent ^
--background ^
--continue ^
--header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" ^
http://example.com/
Unfortunately, it started a new job: it downloads the same files again and writes a new log file named
wget-log.1
Is there any way to resume mirroring a site with wget, or do I have to start the whole thing over again?
Resuming a wget download is straightforward. Open a terminal in the directory where you were downloading the file and run wget with the -c flag to resume the download.
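For a single file, that looks something like this (the URL is just a placeholder):

wget -c http://example.com/archive.zip

The -c (--continue) flag tells wget to pick up a partially downloaded file where it left off, provided the server supports resuming.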
Now, add the following arguments to get this command: wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://site-to-download.com. Replace the https://site-to-download.com portion with the actual URL of the site you want to mirror, and you are done!
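To resume an interrupted mirror, one option (a sketch, reusing the placeholder URL above) is simply to re-run the same command. The -N timestamping that --mirror turns on makes wget skip files whose remote copies are no newer than the local ones:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://site-to-download.com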
With recursive retrieval, Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on. In other words, Wget first downloads the documents at depth 1, then those at depth 2, and so on, up to the specified maximum depth.
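The maximum depth is controlled with -l (--level). For example (placeholder URL again), this limits the recursion to two levels; without -l, wget's default maximum depth is 5:

wget --recursive --level=2 http://example.com/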
Try the -nc option. It checks everything once again, but doesn't download files that are already there.
I'm using this command to download one website:
wget -r -t1 domain.com -o log
I stopped the process and wanted to resume it, so I changed the command:
wget -nc -r -t1 domain.com -o log
In the log there are lines like this:
File .... already there; not retrieving. etc.
I checked the log before this, and it seems that after maybe five minutes of this kind of checking, it begins to download new files.
I'm using this manual for wget: http://www.linux.net.pl/~wkotwica/doc/wget/wget_8.html