Is there a limit to the length of metadata values in Google Cloud Storage?

When uploading a file to Google Cloud Storage, there is a custom data field, metadata.

Google's example is fairly short:

var metadata = {
  contentType: 'application/x-font-ttf',
  metadata: {
    my: 'custom',
    properties: 'go here'
  }
};

file.setMetadata(metadata, function(err, apiResponse) {});
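
(For context, a self-contained sketch of the same call using the @google-cloud/storage Node.js client might look like the following; the bucket and object names are hypothetical.)

// Minimal sketch, assuming the @google-cloud/storage client library;
// 'my-bucket' and 'my-font.ttf' are hypothetical names.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();
const file = storage.bucket('my-bucket').file('my-font.ttf');

const metadata = {
  contentType: 'application/x-font-ttf',
  metadata: {
    my: 'custom',
    properties: 'go here'
  }
};

// PATCHes the object's metadata resource; fixed-key fields like contentType
// are validated, while keys under `metadata` are free-form strings.
file.setMetadata(metadata, function (err, apiResponse) {
  if (err) throw err;
  console.log('custom metadata set');
});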

Is there a maximum size GCS will allow for the metadata object, should I wish to store manifests of tar and zip files, or a few hundred KB, in there?

asked Apr 05 '17 by Paul

People also ask

Is there a limit to Google Cloud Storage?

There is a maximum size limit for individual objects stored in Cloud Storage. This limit is 5 TiB. The maximum size of a single upload request is also 5 TiB. For uploads that would take a long time over your connection, consider using resumable uploads in order to recover from intermediate failures.
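For illustration, a resumable upload with the Node.js client could look like this sketch (the bucket, object, and local file names are hypothetical; createWriteStream opens a resumable upload session):

// Sketch, assuming the @google-cloud/storage client library;
// all names are hypothetical.
const fs = require('fs');
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();
const file = storage.bucket('my-bucket').file('big-archive.tar');

fs.createReadStream('./big-archive.tar')
  .pipe(file.createWriteStream({resumable: true}))  // resumable upload session
  .on('error', (err) => console.error('upload failed:', err))
  .on('finish', () => console.log('upload complete'));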

Which is the best storage type in GCP for long-term archives?

Coldline storage is a better choice than Standard storage or Nearline storage in scenarios where slightly lower availability, a 90-day minimum storage duration, and higher costs for data access are acceptable trade-offs for lowered at-rest storage costs.
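As a sketch, a Coldline bucket could be created with the Node.js client like this (the bucket name and location are hypothetical):

// Sketch, assuming the @google-cloud/storage client library;
// 'my-archive-bucket' and the location are hypothetical.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();

storage.createBucket('my-archive-bucket', {
  location: 'US',
  storageClass: 'COLDLINE',  // 90-day minimum storage duration applies
}, (err, bucket) => {
  if (err) throw err;
  console.log(`created ${bucket.name}`);
});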

What is metadata in storage?

In a storage device or system, metadata is the data in a file or sector header that describes the stored content. Metadata is also the information used for indexing and searching content in a search engine, and it is an important component in big data analytics. Fast access to metadata is key to faster response in a storage system.

How long does log information remain in cloud logging?

Cloud Logging retains logs according to the retention rules that apply to the log bucket where the logs are held. You can configure Cloud Logging to retain logs for between 1 day and 3650 days. Currently, charges for retaining logs past 30 days are not enforced.


1 Answer

I used the following commands to set custom metadata of a chosen size on an object in GCS:

$ echo '{"metadata": {"large": "' > body ; tr -dC '[:print:]' < /dev/urandom | tr -d '\\"' | head -c SIZE_OF_METADATA_IN_BYTES >> body ;  echo '"}}' >> body; curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -X PATCH -H "Content-type: application/json" -d @body -o return_body https://www.googleapis.com/storage/v1/b/manifest-geode-357/o/empty

I find that above 2097 KB of metadata the service returns "HTTP 413 Request Too Large" and the metadata is not set; below that level it is set as expected. If I use more compressible input (e.g. the output of yes), I can fit more data in, but the cut-off is at the same content-length value (which is measured post-compression). As 2097 KB is almost exactly 2 MiB, I expect the true limitation is that the entire HTTP request must fit in 2 MiB.


However, Brandon's comment is correct: this is not a great idea, for a whole catalog of reasons:

  1. It will cause you to consume more bandwidth (with the associated performance and cost penalty).
  2. You won't save any money on storage costs (metadata is still charged for).
  3. It relies on undocumented behaviour, which Google might change without notice.
  4. Unlike real object data, there is no resumable behaviour on upload, so errors hit you harder.
  5. There is no checksum to verify integrity during the upload process.
  6. Many client libraries are likely to keep metadata in memory rather than on disk, or to keep multiple copies, so you are more likely to see memory pressure in your application.

Simply storing the manifest in a separate object solves all of these issues. You could store the location of the manifest in the object's custom metadata and get the benefit of both options.
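
For example, a sketch of that pattern with the Node.js client (all bucket and object names, and the uploadWithManifest helper, are hypothetical):

// Sketch, assuming the @google-cloud/storage client library;
// the bucket name, object layout, and helper are hypothetical.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('my-bucket');

async function uploadWithManifest(archiveName, manifestJson) {
  // 1. Store the (potentially large) manifest as its own object.
  const manifestName = `manifests/${archiveName}.json`;
  await bucket.file(manifestName).save(JSON.stringify(manifestJson), {
    contentType: 'application/json',
  });

  // 2. Keep only a small pointer to it in the archive's custom metadata.
  await bucket.file(archiveName).setMetadata({
    metadata: {manifest: `gs://my-bucket/${manifestName}`},
  });
}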

answered Jan 01 '23 by David