I'm hosting publicly available static resources in a Google Cloud Storage bucket, and I want to use the gsutil rsync command to sync our local version to the bucket, saving bandwidth and time. Part of our build process is to pre-gzip these resources, but gsutil rsync has no way to set the Content-Encoding header. This means we must run gsutil rsync, then immediately run gsutil setmeta to set the header on all of the gzipped file types. This leaves the bucket in a bad state until that header is set. Another option is to use gsutil cp with the -z option, but this requires us to re-upload the entire directory structure every time, and that includes a lot of image files and other non-gzipped resources, which wastes time and bandwidth.
Is there an atomic way to accomplish the rsync and set proper Content-Encoding headers?
The gsutil rsync command copies changed files in their entirety and does not employ the rsync delta-transfer algorithm to transfer portions of a changed file. This is because cloud objects are immutable and no facility exists to read partial cloud object checksums or perform partial overwrites.
A good addition to gsutil rsync would be the ability to pass a list of file extensions to which a header is applied during the sync. For example, the only files that are normally gzip-encoded are html, css, js, json, xml, svg, and txt.
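Until something like that exists, an extension list can be approximated with two rsync passes and the -x exclude option, which takes a Python regular expression matched against each relative path. The following is only a sketch under some assumptions: the files are already gzipped in place under their original names, the function and variable names are hypothetical, and the local directory and bucket URL are supplied by the caller. Each object is uploaded with its header already attached, but the sync as a whole is still two separate commands, not one atomic operation.

```shell
# Extensions whose files the build has already gzipped (list from the text above).
GZIP_EXTS="html css js json xml svg txt"

# Turn "html css js" into the regex alternation "html|css|js".
ext_alternation() {
  printf '%s' "$GZIP_EXTS" | tr ' ' '|'
}

# Regex matching every path that does NOT end in a gzipped extension.
# Used with -x in the first pass so only the gzipped types are uploaded.
exclude_non_gzipped() {
  printf '^(?!.*\\.(%s)$).*' "$(ext_alternation)"
}

# Regex matching every path that DOES end in a gzipped extension.
# Used with -x in the second pass so only the remaining files are uploaded.
exclude_gzipped() {
  printf '.*\\.(%s)$' "$(ext_alternation)"
}

# Hypothetical wrapper: $1 = local directory, $2 = bucket URL.
sync_site() {
  # Pass 1: gzipped types only, uploaded with Content-Encoding set.
  gsutil -m -h "Content-Encoding:gzip" rsync -r -x "$(exclude_non_gzipped)" "$1" "$2" &&
  # Pass 2: everything else, uploaded as-is.
  gsutil -m rsync -r -x "$(exclude_gzipped)" "$1" "$2"
}
```

It would be called as sync_site source-dir gs://your-bucket. Adding -n to both rsync commands first gives a dry run, which is a cheap way to confirm that each pass picks up the files you expect.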
If neither mtime nor checksums are available, gsutil rsync will resort to comparing file sizes. Checksums will not be available when comparing composite Cloud Storage objects with objects at a cloud provider that does not support CRC32C (which is the only checksum available for composite objects).
Google Cloud Storage (GCS) serves files uncompressed by default. There is, however, an option to enable gzip compression for selected files. There is a catch, though: only static compression is supported, which means the files must already be compressed when they are uploaded to the bucket.
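As a rough illustration of that pre-compression step, a build script could gzip the selected file types in place before uploading. This is a minimal sketch, not a prescribed method: gzip_in_place is a hypothetical helper name, and it assumes the compressed files should keep their original names so that URLs stay unchanged.

```shell
# Hypothetical helper: gzip every file under "$1" whose extension is listed in
# the remaining arguments, keeping the original file names.
gzip_in_place() {
  dir=$1
  shift
  for ext in "$@"; do
    find "$dir" -type f -name "*.$ext" | while IFS= read -r f; do
      # Skip files that are already valid gzip streams, so re-running is safe.
      if gzip -t < "$f" 2>/dev/null; then
        continue
      fi
      # -9: best compression; -n: omit name and timestamp from the gzip header.
      gzip -9 -n -c "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
  done
}
```

For example, gzip_in_place source-dir html css js compresses only those three types and leaves images untouched. The files then need to be uploaded with Content-Encoding set to gzip, otherwise clients receive the raw compressed bytes.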
Assuming you're starting with gzipped source files in source-dir, you can do:
gsutil -h content-encoding:gzip rsync -r source-dir gs://your-bucket
Note: If you do this and then run rsync in the reverse direction it will decompress and copy all the objects back down:
gsutil rsync -r gs://your-bucket source-dir
which may not be what you want to happen. Basically, the safest way to use rsync is to simply synchronize objects as-is between source and destination, and not try to set content encodings on the objects.