
rsync --sparse transfers the whole file

I have some VM images that need to be synced every day. The VM files are sparse.

To save network traffic, I only want to transfer the real data of the images. I tried the --sparse option of rsync, but the network traffic shows that the whole file size gets transferred, not only the data actually in use.

If I use rsync -zv --sparse, only the real size gets transmitted over the network and everything is fine. But I don't want to compress the files because of the CPU usage.

Shouldn't the --sparse option transfer only the real data, with the null data created locally to save network traffic?

Is there a workaround without compression?

Thanks!

user2933212 asked Nov 06 '13 19:11


1 Answer

Take a look at this discussion, specifically this answer.

It seems that the solution is to run rsync --sparse followed by rsync --inplace.

On the first call (--sparse), also use --ignore-existing to prevent already transferred sparse files from being overwritten, and -z to save network resources.

The second call (--inplace) should update only the modified chunks. Here, compression is optional.
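As a rough illustration of the two calls described above, here is a minimal Python sketch that only builds the two command lines. The function name and the paths are hypothetical; the flags are the ones named in this answer.

```python
import shlex


def two_pass_commands(src, dest, compress_second=False):
    """Build the two rsync invocations described above as argument lists."""
    # Pass 1: create files that do not exist yet on the destination as sparse files.
    first = ["rsync", "-z", "--sparse", "--ignore-existing", src, dest]
    # Pass 2: update files that already exist by writing into them directly.
    second = ["rsync", "--inplace"]
    if compress_second:
        second.append("-z")  # compression is optional on the second pass
    second += [src, dest]
    return first, second


if __name__ == "__main__":
    # Hypothetical source and destination, for illustration only.
    for cmd in two_pass_commands("/var/lib/vm/disk.img", "backup:/srv/vm/"):
        print(shlex.join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) to actually run the transfer.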

Also see this post.

Update

I believe the suggestions above won't solve your problem. I also believe that rsync is not the right tool for the task. You should search for other tools which will give you a good balance between network and disk I/O efficiency.

Rsync was designed for efficient usage of a single resource, the network. It assumes reading and writing to the network is much more expensive than reading and writing the source and destination files.

"We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link." (The rsync algorithm, abstract)

The algorithm, summarized in four steps.

  1. The receiving side β sends checksums of blocks of size S of the destination file B.
  2. The sending side α identifies blocks that match in the source file A, at any offset.
  3. α sends β a list of instructions consisting of either verbatim (non-matching) data or references to matching blocks.
  4. β reconstructs the whole file from those instructions.
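The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not rsync's implementation: it uses a single MD5 digest per block and recomputes it at every offset, whereas rsync pairs a cheap rolling checksum with a strong one. The function names and the instruction format are assumptions made for this example.

```python
import hashlib


def block_checksums(dest: bytes, size: int) -> dict:
    """Step 1 (receiver): checksum each full block of the destination file B."""
    sums = {}
    for off in range(0, len(dest) - size + 1, size):
        # Identical blocks (e.g. runs of zeros) share one entry: the first offset.
        sums.setdefault(hashlib.md5(dest[off:off + size]).digest(), off)
    return sums


def make_instructions(src: bytes, sums: dict, size: int) -> list:
    """Steps 2-3 (sender): scan A and emit ("ref", offset_in_B) or ("data", bytes)."""
    out, literal, i = [], bytearray(), 0
    while i < len(src):
        block = src[i:i + size]
        off = sums.get(hashlib.md5(block).digest()) if len(block) == size else None
        if off is None:
            literal.append(src[i])  # no match here: slide the window by one byte
            i += 1
        else:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("ref", off))
            i += size
    if literal:
        out.append(("data", bytes(literal)))
    return out


def reconstruct(dest: bytes, instructions: list, size: int) -> bytes:
    """Step 4 (receiver): rebuild the whole file from the instructions."""
    parts = []
    for kind, value in instructions:
        parts.append(dest[value:value + size] if kind == "ref" else value)
    return b"".join(parts)


if __name__ == "__main__":
    B = b"AAAA" + b"\0" * 8 + b"BBBB"            # old copy on the receiver
    A = b"AAAA" + b"\0" * 8 + b"CCCC" + b"\0" * 4  # new copy on the sender
    steps = make_instructions(A, block_checksums(B, 4), 4)
    print(steps)
    print(reconstruct(B, steps, 4) == A)
```

Note that every null block of A is sent as a reference to the same null block of B, so little data crosses the network, but the receiver still assembles the complete file.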

Notice that rsync normally reconstructs the file B as a temporary file T, then replaces B with T. In this case it must write the whole file.

The --inplace option does not relieve rsync from writing the blocks matched by α, as one might imagine. They can match at different offsets. Scanning B a second time to take new checksums would be prohibitive in terms of performance. A block that matches at the same offset at which it was read in step one could be skipped, but rsync does not do that. In the case of a sparse file, a null block of B would match every null block of A, and would have to be rewritten.

The --inplace option just causes rsync to write directly to B instead of T. It will rewrite the whole file.
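To make the point concrete, here is a toy Python model of the behaviour described in this answer. It is not rsync's receiver code, and the function name and the ("ref", offset) / ("data", bytes) instruction format are assumptions for illustration. It counts how many bytes arrive as literal data versus how many bytes are written to the destination.

```python
def apply_inplace(dest: bytearray, instructions: list, size: int) -> tuple:
    """Apply ("ref", offset) / ("data", bytes) instructions directly onto dest.

    Returns (bytes received as literal data, bytes written to the file).
    """
    old = bytes(dest)  # simplification: read matched blocks from a snapshot of B
    pos = received = written = 0
    for kind, value in instructions:
        if kind == "ref":
            chunk = old[value:value + size]
        else:
            chunk = value
            received += len(chunk)
        # Written unconditionally, even if the same bytes are already in place.
        dest[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
        written += len(chunk)
    del dest[pos:]  # truncate if the new file is shorter than the old one
    return received, written


if __name__ == "__main__":
    image = bytearray(b"HEAD" + b"\0" * 12)  # a mostly empty "disk image"
    steps = [("data", b"NEWH"), ("ref", 4), ("ref", 4), ("ref", 4)]
    print(apply_inplace(image, steps, 4))
```

In this model only the 4 changed bytes travel over the network, yet all 16 bytes of the file are written, including the null blocks.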

Rafa answered Sep 25 '22 02:09