I'm looking for a way to include small pieces of data (from my server) with objects during the upload process (e.g. User ID, File ID, etc). After looking at S3 documentation, I'm not sure whether it's more appropriate to include this data as object tags or object metadata.
Is the purpose of tags for categorization? And metadata for per-object data?
What are the differences? What do you think would be more appropriate for this situation?
Metadata are mainly used for defining extra information for entities while tags are used for organizing entities. You can also create tags based on metadata. In general, although tags and metadata are closely related, they are different concepts and are created and used in a different way.
If you want to add your own custom metadata to an object in S3, you can add metadata instead of tags. Tags are not the same thing as object metadata in S3. Metadata applies only to that object in S3 and cannot be searched on, as you can search with tags.
Metadata identifies properties of the object, as well as specifies how the object should be handled when it's accessed. Metadata exists as key:value pairs. For example, the storage class of an object is represented by the metadata entry storageClass:STANDARD.
Object tagging gives you a way to categorize storage. Each tag is a key-value pair that adheres to the following rules: You can associate up to 10 tags with an object. Tags associated with an object must have unique tag keys.
Both metadata and tags are essentially "metadata" but there are important differences in how they can (or can't) be used to modify the behavior of the service and how their values can (or can't) be accessed.
An object in S3, including its metadata, is -- strictly speaking -- immutable. The console gives you the ability to "edit" metadata, but that's not a precise description of what's happening. When you edit an object's metadata, you are actually overwriting the object with a copy of itself, with its metadata modified. If the bucket is versioned, you now have two copies of the object with two different dates and modified metadata.
Tags are a "subresource" -- in a sense, "off to the side" of an object -- they are managed separately and can be modified without modifying the object itself.
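To make the difference concrete, here is a minimal sketch of the two operations. It assumes `client` is a boto3 S3 client (or anything with the same `copy_object` and `put_object_tagging` methods); the function names `replace_metadata` and `replace_tags` are made up for illustration.

```python
def replace_metadata(client, bucket, key, new_metadata):
    """'Edit' user metadata by copying the object onto itself.

    MetadataDirective="REPLACE" tells S3 to use the metadata supplied here
    instead of copying the old metadata. Other headers such as Content-Type
    may need to be re-supplied as well; that is left out of this sketch.
    """
    return client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=new_metadata,
        MetadataDirective="REPLACE",
    )


def replace_tags(client, bucket, key, tags):
    """Replace the object's tag set without rewriting the object itself."""
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    return client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": tag_set},
    )
```

The first call produces a new copy of the object (and a new version in a versioned bucket); the second touches only the tagging subresource.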
Metadata is included in the PUT request as HTTP headers when the object is created. Tags are stored by sending a second request. Full support for tags, up to the count and size limits below, requires sending a second request to the ?tagging subresource on the API endpoint, but the PUT (Object) REST call also has limited support for tags, allowing up to 2K of url-encoded, query parameter-style tag keys and values to be submitted in a single x-amz-tagging HTTP PUT request header. For example, x-amz-tagging: hipaa_restrict=false&pci_restrict=true&owner=Accounting%20and%20Payroll. The documentation is unclear about whether the 2K includes the byte length of the header name itself, or whether this 2K is the same 2K as the limit on x-amz-meta-* user metadata headers. Presumably these are two different 2K limits, but the 2K tag limit likely includes the url-encoded form of the keys and values, as well as the length of the header.
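As a rough sketch of what those headers look like for a single PUT, the snippet below builds the x-amz-meta-* headers and the url-encoded x-amz-tagging value using only the standard library. The helper name `build_put_headers` and the `user-id` / `file-id` keys are assumptions taken from the question, and a real request would still need to be signed and sent by an SDK or HTTP client.

```python
from urllib.parse import quote, urlencode


def build_put_headers(metadata, tags):
    """Return the metadata and tagging headers for one PUT Object request."""
    # Each user metadata entry becomes its own x-amz-meta-<name> header.
    headers = {
        f"x-amz-meta-{name.lower()}": str(value)
        for name, value in metadata.items()
    }
    if tags:
        # Tags travel in one header, encoded like a URL query string.
        # quote_via=quote encodes spaces as %20 rather than "+".
        headers["x-amz-tagging"] = urlencode(tags, quote_via=quote)
    return headers


headers = build_put_headers(
    {"user-id": "1234", "file-id": "abcd"},
    {
        "hipaa_restrict": "false",
        "pci_restrict": "true",
        "owner": "Accounting and Payroll",
    },
)
```

With these inputs, `headers["x-amz-tagging"]` is the same string as the example header value above.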
You can control, separately via policy, whether an IAM user can read or write objects and their metadata, or tags. Objects and metadata are handled together in permissions (if you can do one, you can always do the other) but tags have separate permissions.
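For illustration, a policy document along these lines separates the two sets of permissions. The action names (`s3:GetObject`, `s3:PutObject`, `s3:GetObjectTagging`, `s3:PutObjectTagging`) are the S3 IAM actions; the helper `object_policy` and the bucket name are hypothetical, and this is only a sketch of the shape, not a reviewed policy.

```python
import json


def object_policy(bucket, allow_tag_access):
    """Build an IAM policy that grants object access, optionally with tags."""
    # Reading or writing the object always includes its metadata.
    actions = ["s3:GetObject", "s3:PutObject"]
    if allow_tag_access:
        # Tag access is granted by separate actions.
        actions += ["s3:GetObjectTagging", "s3:PutObjectTagging"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical bucket name, printed as the JSON you would attach to a user or role.
print(json.dumps(object_policy("example-bucket", allow_tag_access=False), indent=2))
```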
When you GET an object, the actual metadata is returned in the HTTP response headers. This means a user downloading an object can see the metadata if they know how to inspect the HTTP headers.
Conversely, tags are not returned in the headers in response to a GET request; instead, only the x-amz-tagging-count: header is returned, reporting the number of tags on the object if it is non-zero. Note, however, that while tags are more appropriate for storing proprietary data, they are not appropriate for storing unencrypted sensitive data.
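Here is a small sketch of what a downloader can recover from the response headers of a GET. The function name `split_response_headers` is made up, and the input is assumed to be a plain dict of header names to values, as most HTTP clients can provide.

```python
def split_response_headers(headers):
    """Return (user_metadata, tag_count) from S3 GET response headers."""
    prefix = "x-amz-meta-"
    user_metadata = {}
    tag_count = 0  # the header is absent when the object has no tags
    for name, value in headers.items():
        lower = name.lower()  # header names are case-insensitive
        if lower.startswith(prefix):
            user_metadata[lower[len(prefix):]] = value
        elif lower == "x-amz-tagging-count":
            tag_count = int(value)
    return user_metadata, tag_count


# Illustrative headers, not captured from a real response.
metadata, tag_count = split_response_headers(
    {
        "Content-Type": "image/png",
        "x-amz-meta-user-id": "1234",
        "x-amz-tagging-count": "3",
    }
)
```

The metadata values are fully visible here, while the tags show up only as a count.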
The total of all metadata keys and values for each object is limited to 2KB. Note that the limit is expressed in bytes, so multibyte characters consume more than one byte per character toward the limit. There is no limit on the number of metadata keys -- only the 2KB total limit for user metadata. Only US-ASCII characters are fully supported in object metadata keys and values and metadata must be comprised of characters that are valid as HTTP headers, since that's how object metadata is sent.
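A quick way to check a set of user metadata against these limits before uploading is sketched below. It counts the UTF-8 bytes of each key and value; whether the x-amz-meta- prefix counts toward the total is not something I'm sure of, so this sketch assumes it does not. The names are hypothetical.

```python
USER_METADATA_LIMIT_BYTES = 2 * 1024  # the 2KB total described above


def user_metadata_size(metadata):
    """Sum of UTF-8 byte lengths of all keys and values (prefix excluded)."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in metadata.items()
    )


def check_user_metadata(metadata):
    """Return a list of problems; an empty list means the metadata looks fine."""
    problems = []
    for key, value in metadata.items():
        # Only US-ASCII is fully supported in metadata keys and values.
        if not (key.isascii() and value.isascii()):
            problems.append(f"non-ASCII characters in entry {key!r}")
    size = user_metadata_size(metadata)
    if size > USER_METADATA_LIMIT_BYTES:
        problems.append(f"total size {size} bytes exceeds {USER_METADATA_LIMIT_BYTES}")
    return problems
```

Because the limit is in bytes, a value like "é" contributes two bytes rather than one.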
The limits on tags are different. Each object can have up to 10 tags, each tag key is limited to 128 characters (not bytes), and each tag value is limited to 256 characters (not bytes), although the limits are lower, as noted above, when the tags ride along with the PUT request. Unlike metadata, tags support UTF-8.
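The tag limits can be checked the same way. This sketch uses Python's `len()`, which counts Unicode code points, as an approximation of "characters"; the function name `check_tags` is hypothetical, and passing tags as a dict already guarantees unique keys.

```python
MAX_TAGS_PER_OBJECT = 10
MAX_TAG_KEY_CHARS = 128
MAX_TAG_VALUE_CHARS = 256


def check_tags(tags):
    """Return a list of problems; an empty list means the tags fit the limits."""
    problems = []
    if len(tags) > MAX_TAGS_PER_OBJECT:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS_PER_OBJECT}")
    for key, value in tags.items():
        # Limits are in characters, not bytes, so multibyte text is fine.
        if len(key) > MAX_TAG_KEY_CHARS:
            problems.append(f"tag key {key[:20]!r}... is longer than {MAX_TAG_KEY_CHARS} characters")
        if len(value) > MAX_TAG_VALUE_CHARS:
            problems.append(f"value of tag {key[:20]!r} is longer than {MAX_TAG_VALUE_CHARS} characters")
    return problems
```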
Metadata keys and values are counted as billable bytes contributing to the billed size of object storage. Tags are billed separately with a different formula.
Neither tags nor metadata can be used for "scanning" objects. It is not possible to ask the S3 service for a list of objects with specific tags or with specific metadata.
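The practical consequence is that any filtering has to happen on the client side, with one extra request per object. A sketch of that, assuming `client` is a boto3 S3 client (the function name `keys_with_tag` is made up):

```python
def keys_with_tag(client, bucket, tag_key, tag_value, prefix=""):
    """List keys whose tag set contains tag_key=tag_value.

    S3 does the listing; the tag comparison happens here, after fetching
    each object's tags individually. This is slow and costs one request
    per object, which is why it is not a real substitute for a search.
    """
    matches = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            tag_set = client.get_object_tagging(Bucket=bucket, Key=obj["Key"])["TagSet"]
            if any(t["Key"] == tag_key and t["Value"] == tag_value for t in tag_set):
                matches.append(obj["Key"])
    return matches
```

If you need this kind of lookup regularly, keeping your own index (for example in a database, keyed by object key) is likely the better design.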
Tags can be used to modify the behavior of the service in at least two important ways that metadata cannot (and, in fact, there may be others that I'm not thinking of at the moment):
IAM policies on buckets/users/roles can test tag values for access control purposes, but cannot test metadata values.
There are IAM policy condition keys that allow access control on objects based on tags. There are no similar access control features based on metadata.
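As an illustration, a statement like the following allows reads only for objects carrying a particular tag. `s3:ExistingObjectTag/<key>` is the condition key for tags already on the object; the helper name, bucket, and tag values are hypothetical.

```python
def tag_conditioned_read_policy(bucket, tag_key, tag_value):
    """Allow s3:GetObject only on objects tagged tag_key=tag_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    # Compared against the tags currently on the object.
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


policy = tag_conditioned_read_policy("example-bucket", "pci_restrict", "false")
```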
Bucket lifecycle policies can test tag values but not metadata values.
Lifecycle policies can be used to modify an object's storage class (to standard/infrequent-access or glacier) or purge objects or versions after a configurable time interval. Before the introduction of object tags, these rules applied either to the entire bucket or to a certain prefix, such as images/. Now, tags allow lifecycle policies to be applied based on object tags, so (for example) transient data can be mixed with permanent data while applying lifecycle policies differently without the need to store the objects in different key hierarchies for prefix matching.
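A tag-filtered lifecycle rule might look like the sketch below, written as the dict you would pass to boto3's `put_bucket_lifecycle_configuration` as `LifecycleConfiguration`. The rule ID, the `retention=transient` tag, and the 30-day figure are all made-up values for illustration.

```python
def expire_tagged_objects_rule(tag_key, tag_value, days):
    """Lifecycle configuration that expires only objects with the given tag."""
    return {
        "Rules": [
            {
                "ID": f"expire-{tag_key}-{tag_value}",
                "Status": "Enabled",
                # The filter matches on a tag instead of a key prefix.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle = expire_tagged_objects_rule("retention", "transient", 30)
```

Objects without that tag are left alone, even if they share a prefix with the tagged ones.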
In the situation described in the question, I would be inclined to store these values in metadata unless the fact that they are visible in HTTP response headers is something you see as a security concern.
If you are using S3 in conjunction with CloudFront, you can use a Lambda@Edge Origin Response trigger to redact or delete the object metadata from responses in-flight so they are not visible to the browser. An Origin Response trigger is a Lambda function written in Node.js that can programmatically modify responses before they are stored in the CloudFront cache, which means it only needs to run on cache misses. Similar functionality can also be accomplished by routing requests to the bucket through a proxy server in EC2 such as HAProxy or Nginx, but not if the bucket is accessed directly. The S3 service will always return the metadata in the HTTP response headers, but it only returns a count of tags (if the object has tags) and not the tags themselves, when an object is downloaded.
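For reference, an Origin Response trigger that strips user metadata could be sketched roughly like this. It is written in Python on the assumption that a Python runtime is used for the Lambda@Edge function (the same logic ports directly to Node.js), and it relies on the CloudFront event layout in which response headers are keyed by lowercase header name.

```python
def lambda_handler(event, context):
    """Remove x-amz-meta-* headers from the origin response before caching."""
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]
    # Copy the keys first, since the dict is modified while looping.
    for name in list(headers):
        if name.startswith("x-amz-meta-"):
            del headers[name]
    return response
```

Because it runs on the origin response, the redacted version is what gets cached, so viewers never see the metadata headers.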