We've been using gsutil -m rsync -r
to keep dev and deploy boxes in sync with a GCS bucket for nearly 2 years without any problem. There are about 85k objects in the bucket.
Until recently, this worked perfectly: we'd run a deploy-box -> GCS rsync every 15 mins or so, to keep all new uploaded resource backed up, and then a GCS -> dev box rsync whenever we wanted to refresh the local dev data (running on OSX El Capitan).
Within the last couple of months, though, the GCS->dev rsync has started to bloat, downloading more and more images.
Initially I just thought "great, we're getting more resources uploaded", but it's been growing way faster than the data, until today when it seems to be downloading the whole 85k images.
I've double-checked I'm in the right place, the command is correct, the paths are correct, etc. For all that the gsutil
output is scrolling by with reams and reams of "Copying..." and "Downloading..." messages, making good parallel use of our 100mbps connection, when I go to another terminal and run find . -type f | wc -l
on the destination directory every 10 seconds, it shows that barely 2 or 3 new files are being added a minute. I look at modification times on files that gsutil says it's downloading right now and in the large majority they're old, plenty haven't changed in a year or more. Meaning: it's downloading all the data, using tons of time and bandwidth, all for the sake of a few hundred files.
Has something changed in recent OSX gsutil
versions? Is there possibly a bug? How would I even start to go about tracking this down? Or reporting it? The newsgroups gsutil-discuss
and gs-discussion
have been archived, and the talk in gce-discussion
is all about using gsutil
from GCE instances.
Thanks!
I had a similar issue where the same files were synced over and over. I don't have that many files so you might need to check for performance but I decided to use the -c
option to force using the checksum instead of mtime which was modified locally in my build process.
I think (and hope) the documentation is slightly wrong stating that
compare checksums for files if the size of source and destination as well as mtime match
as it seems to use checksum even if mtime does not match
gsutil 4.20 (released 2016-07-20) modified the change detection algorithm for rsync. Instead of comparing only the size of the local file with its cloud counterpart, it now compares both the size and file modification time of local files. The file modification time is stored in the custom user metadata for the file when it is uploaded with rsync. If that doesn't exist the object creation time is used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With