
gsutil rsync with gzip compression

I'm hosting publicly available static resources in a Google Storage bucket, and I want to use the gsutil rsync command to sync our local version to the bucket, saving bandwidth and time. Part of our build process is to pre-gzip these resources, but gsutil rsync has no way to set the Content-Encoding header. This means we must run gsutil rsync, then immediately run gsutil setmeta to set headers on all of the gzipped file types. This leaves the bucket in a BAD state until that header is set. Another option is to use gsutil cp, passing the -z option, but this requires us to re-upload the entire directory structure every time, and that includes a LOT of image files and other non-gzipped resources, which wastes time and bandwidth.
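For reference, the two-step process described above looks roughly like this (a sketch only; the bucket name, local directory, and extension list are placeholders):

```shell
# Step 1: sync the pre-gzipped build output to the bucket.
# Until step 2 completes, the gzipped objects are served WITHOUT
# a Content-Encoding header -- this is the "BAD state" window.
gsutil -m rsync -r ./build gs://my-bucket

# Step 2: patch the Content-Encoding header onto the gzipped file types.
gsutil -m setmeta -h "Content-Encoding:gzip" \
  "gs://my-bucket/**.html" "gs://my-bucket/**.css" "gs://my-bucket/**.js"
```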

Is there an atomic way to accomplish the rsync and set proper Content-Encoding headers?

regretoverflow asked Jul 01 '15 19:07



People also ask

Why does gsutil rsync not copy the entire file?

The gsutil rsync command copies changed files in their entirety and does not employ the rsync delta-transfer algorithm to transfer portions of a changed file. This is because cloud objects are immutable and no facility exists to read partial cloud object checksums or perform partial overwrites.

What would be a good addition to gsutil rsync?

A good addition to gsutil rsync would be the ability to pass a list of file extensions to which a header should be applied during the rsync. For example, the only files that are normally gzip encoded are html, css, js, json, xml, svg, and txt.

Why is gsutil rsync comparing file sizes with checksums not available?

If neither mtime nor checksums are available, gsutil rsync will resort to comparing file sizes. Checksums will not be available when comparing composite Cloud Storage objects with objects at a cloud provider that does not support CRC32C (which is the only checksum available for composite objects).

Does Google Cloud Storage (GCS) support gzip compression?

Google Cloud Storage (GCS) by default serves files uncompressed. There is, however, an option to enable gzip compression for selected files. There is a catch, though: it's only possible to use static compression, which means the files need to be uploaded to the storage already compressed.
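As a sketch of that static-compression approach: gsutil cp can gzip selected file extensions during upload with the -z flag, which both compresses the transfer and sets Content-Encoding: gzip on the resulting objects (bucket name and source directory are placeholders):

```shell
# Upload everything; files matching the listed extensions are gzipped
# in transit and stored with Content-Encoding: gzip, while images and
# other binaries are uploaded as-is.
gsutil -m cp -r -z html,css,js,json,xml,svg,txt ./build gs://my-bucket
```

The trade-off, as the question notes, is that cp re-uploads the whole tree every time, unlike rsync.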


1 Answer

Assuming you're starting with gzipped source files in source-dir you can do:

gsutil -h content-encoding:gzip rsync -r source-dir gs://your-bucket

Note: If you do this and then run rsync in the reverse direction it will decompress and copy all the objects back down:

gsutil rsync -r gs://your-bucket source-dir 

which may not be what you want to happen. Basically, the safest way to use rsync is to simply synchronize objects as-is between source and destination, and not try to set content encodings on the objects.

Mike Schwartz answered Sep 20 '22 10:09
