
Resuming rsync partial (-P/--partial) on an interrupted transfer

I am trying to back up my file server to a remote file server using rsync. Rsync does not resume successfully when a transfer is interrupted. I used the --partial option, but rsync doesn't find the file it already started: it writes to a temporary file name, and when resumed it creates a new temporary file and starts from the beginning.

Here is my command:

rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"

When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something like .OldDisk.dmg.SjDndj23.

Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.

How do I fix this so I don't have to manually intervene each time?

Glitches asked May 15 '13 18:05



2 Answers

Sorry but the other answers here are too complicated :-7. A simpler answer working for me: (using rsync over -e ssh)

# optionally move rsync temp file, then resume using rsync 
dst$ mv .<filename>.6FuChr <filename>
src$ rsync -avhzP --bwlimit=1000 -e ssh <fromfiles> <user@somewhere>:<destdir>/

This also works when resuming a transfer that was started with scp and interrupted.

Rsync writes to a temporary file. On resume, the new temporary file quickly grows to the size of the data already transferred, and then the transfer continues.

Scp writes directly to the final destination file. If the transfer is interrupted, that file is left truncated.

Explanation of args:

-avhz: a=archive, v=verbose, h=human-readable, z=compression. Archive mode preserves modification times (along with permissions, ownership and symlinks), so rsync can compare each file's real timestamp even if the clocks on the two machines disagree.

-P is short for --partial --progress. --partial tells rsync to keep partially transferred files (and upon resume rsync will use partially transferred files always after checksumming safely)

From man pages: http://ss64.com/bash/rsync_options.html

--partial
By default, rsync will delete any partially transferred file if the transfer
is interrupted. In some circumstances it is more desirable to keep partially
transferred files. Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of the file much faster.

--progress
This option tells rsync to print information showing the progress of the transfer.
This gives a bored user something to watch.
This option is normally combined with -v. Using this option without the -v option
will produce weird results on your display.

-P
The -P option is equivalent to --partial --progress.
I found myself typing that combination quite often so I created an option to make
it easier.

NOTE, for a connection that is interrupted multiple times: if you need to resume again after an rsync run has been interrupted, it is best to first rename the temporary file on the destination to the final file name. scp behaves differently: it creates a file on the destination with the same name as the final file, and if scp is interrupted that file is simply a truncated version of the original. An rsync (-avzhP) run will resume from that file but will write to a temporary file with a name like .<filename>.Yhg7al.

Procedure when starting with scp:

scp; *interrupt*; rsync; [REPEAT_as_needed: *interrupt*; mv .destfile.tmpzhX destfile; rsync;]. 

Procedure when starting with rsync:

rsync; [REPEAT_as_needed: *interrupt*; mv .destfile.tmpzhX destfile; rsync;].
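
The rename step in the procedures above can be scripted on the destination. The following is only a sketch: promote_partial is a hypothetical helper name, and it assumes rsync's default temporary naming of ".<filename>." followed by six random characters (a different --temp-dir or naming scheme would need a different pattern). If several leftovers exist it picks the largest one, on the assumption that it holds the most transferred data. Run it only when no rsync process is still writing to the file.

```shell
#!/bin/sh
# promote_partial DIR NAME
# Hypothetical helper: rename the largest leftover rsync temp file
# (DIR/.NAME.XXXXXX) to DIR/NAME so the next rsync run can reuse its data.
promote_partial() {
    dir=$1
    name=$2
    best=""
    best_size=-1
    # Assumed default temp name: ".<name>.<6 random characters>".
    for candidate in "$dir/.$name".??????; do
        [ -f "$candidate" ] || continue   # an unmatched glob stays literal; skip it
        size=$(($(wc -c < "$candidate")))
        if [ "$size" -gt "$best_size" ]; then
            best=$candidate
            best_size=$size
        fi
    done
    if [ -z "$best" ]; then
        echo "no temp file found for $name in $dir" >&2
        return 1
    fi
    mv -- "$best" "$dir/$name"
}
```

Usage on the destination host would look like promote_partial /home/myaccount/backup OldDisk.dmg, followed by re-running the same rsync command on the source.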
gaoithe answered Oct 05 '22 03:10


TL;DR: Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace.

The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.

If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (i.e., without the temporary suffix). You'll then be able to resume.

If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).
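
As a rough sketch of that polite termination: polite_stop below is a hypothetical helper, not part of rsync. It sends only SIGTERM, then waits up to about ten seconds per process so rsync has a chance to move its partial file into place. Which PIDs to pass is up to you; the pgrep line in the comment is just one assumed way to select your own server processes on the receiver.

```shell
#!/bin/sh
# polite_stop PID...
# Hypothetical helper: send SIGTERM (never SIGKILL) to each PID and wait
# up to about 10 seconds for it to exit.
polite_stop() {
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || continue   # already gone or not ours
        waited=0
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt 10 ]; do
            sleep 1
            waited=$((waited + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            echo "pid $pid is still running after SIGTERM" >&2
        fi
    done
}

# Example on the receiver (not executed here). Check the PID list first:
# it would also match the server processes of a transfer that is still active.
# polite_stop $(pgrep -u "$USER" -f 'rsync --server')
```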

Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.

For example, if you specify rsync ... --timeout=15 ..., both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
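
Combined with a retry loop, this removes the manual step entirely. The sketch below is an assumption-laden illustration rather than a recommended setup: retry_until_ok is a hypothetical helper, and the try count, the 30-second pause (chosen to be longer than the 15-second timeout so the server side can finish cleaning up) and the account name in the example call are all illustrative.

```shell
#!/bin/sh
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it exits 0 or MAX_TRIES is reached.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts (last exit status $status)" >&2
            return "$status"
        fi
        try=$((try + 1))
        sleep "$delay"   # give the remote rsync time to time out and clean up
    done
}

# Example call (not executed here); host, port and paths follow the question.
# retry_until_ok 20 30 rsync -avzP --timeout=15 -e "ssh -p 2222" \
#     --exclude "@spool" --exclude "@tmp" \
#     /volume1/ myaccount@backup-server-1:/home/myaccount/backup/
```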

I'm not sure how long the various rsync processes will, by default, keep trying to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. This means you could try to "ride out" a bad internet connection of 15-20 seconds.

If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).

Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new, properly named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!). You can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.

Finally, a few short remarks:

  • Don't use --inplace to work around this. You will undoubtedly have other problems as a result; see man rsync for the details.
  • It's trivial, but -t in your rsync options is redundant, it is implied by -a.
  • An already compressed disk image sent over rsync without compression might result in shorter transfer time (by avoiding double compression). However, I'm unsure of the compression techniques in both cases. I'd test it.
  • As far as I understand --checksum / -c, it won't help you in this case. It affects how rsync decides if it should transfer a file. Though, after a first rsync completes, you could run a second rsync with -c to insist on checksums, to prevent the strange case that file size and modtime are the same on both sides, but bad data was written.
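
Putting these remarks together, the original command might be reshaped roughly as follows. This is a sketch under assumptions: the function names and the RSYNC override variable are hypothetical, -z is dropped only on the guess that the data is already compressed, and the account name is assumed to be myaccount. The verification pass leaves out --timeout because checksumming large files could plausibly keep the connection quiet for a while.

```shell
#!/bin/sh
# Sketch only. RSYNC can be pointed at another command to preview the arguments.
RSYNC=${RSYNC:-rsync}
SRC=/volume1/
DEST=myaccount@backup-server-1:/home/myaccount/backup/

# Main pass: -a already implies -t, so -t is omitted; -z is omitted on the
# assumption that the data (e.g. a compressed disk image) would not shrink.
transfer() {
    "$RSYNC" -avP --timeout=15 -e "ssh -p 2222" \
        --exclude "@spool" --exclude "@tmp" "$SRC" "$DEST"
}

# Optional second pass once the main pass has completed: -c compares
# checksums instead of relying on file size and modification time.
verify() {
    "$RSYNC" -avc -e "ssh -p 2222" \
        --exclude "@spool" --exclude "@tmp" "$SRC" "$DEST"
}
```

Calling transfer and then verify runs the two passes in order; timing the main pass with and without -z on a sample of your own data is the only reliable way to decide about compression.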
Richard Michael answered Oct 05 '22 04:10