
How does an ETag work in Express.js?

Express.js sends ETags automatically. I would like to know how the ETag is generated. Is it based on the content that the GET handler generates dynamically? Or is there a way I can manipulate it, so that I can skip generating the content (dynamic content from the DB) and just pass back the same ETag?

Maybe a middleware that only checks whether the session id is valid and then passes back the same ETag the client sent, or one derived from the URL + session id, so that it is unique. The request would end there rather than going through the whole DB call and everything else. In that case I would need to know that the client is making a conditional (304) request.

I could go with the Expires header, but once the session is over, nobody opening the URL should be allowed in. So I am thinking the ETag should be based on the session id as well. How can If-Modified-Since work in this dynamic content scenario? Can it be used?

asked Jul 03 '14 00:07 by coool



2 Answers

At the time of writing (8th July 2014), weak ETags are generated using CRC32 (source) and strong ETags are generated using MD5 (source).

Based on what one of the contributors to Express says, you can specify whether to use the strong or weak ETags by:

    app.enable('etag')         // same as app.set('etag', true); the Express 4.x docs list `true` as weak ETags (the default)
    app.set('etag', 'strong')  // strong ETags
    app.set('etag', 'weak')    // weak ETags

It looks like you can also specify your own custom function to do the ETags like so:

app.set('etag', function(body, encoding){ /* return valid etag */ }); 
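To make the custom-function option concrete, here is a minimal sketch of such a generator. The names sha256Etag and useSha256Etag are made up for illustration, and the choice of SHA-256 is an assumption, not what Express does by default. It relies only on the (body, encoding) arguments mentioned above.

```javascript
const crypto = require('node:crypto');

// Hypothetical custom generator: a strong ETag built from a SHA-256 digest of the body.
// The body may arrive as a string or a Buffer, so both are handled.
function sha256Etag(body, encoding) {
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(String(body), encoding || 'utf8');
  const digest = crypto.createHash('sha256').update(buf).digest('hex');
  return `"${digest}"`; // ETag values are quoted strings
}

// Register the generator on an existing Express app (passed in by the caller).
function useSha256Etag(app) {
  app.set('etag', sha256Etag);
  return app;
}
```

In an application this would be called once at startup, e.g. useSha256Etag(app), before any routes send responses.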

The NPM package fresh is also worth looking at, as it's used in Express for freshness checking (source1, source2).

As for your application, remember that you can override any response headers e.g. res.set('etag', 'my-awesome-etag-value') before invoking res.send() (or similar function). Further discussion (including advantages and disadvantages) can be found here: https://github.com/visionmedia/express/issues/2129#issue-34053148
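For the scenario in the question (an ETag derived from the URL and the session id, answered before any DB call), a middleware could look roughly like the sketch below. It assumes a session middleware such as express-session has populated req.sessionID; sessionEtag and sessionEtagMiddleware are hypothetical names. Note the trade-off: the tag no longer depends on the content, so a client can keep a stale body for as long as its session lasts.

```javascript
const crypto = require('node:crypto');

// Derive a validator from the URL and session id only (no database access).
function sessionEtag(url, sessionId) {
  const digest = crypto.createHash('sha1').update(`${sessionId}:${url}`).digest('hex');
  return `"${digest}"`;
}

// Express-style middleware. Assumes req.sessionID was set by a session middleware.
function sessionEtagMiddleware(req, res, next) {
  if (!req.sessionID) {
    // No session: reject before any caching logic runs.
    res.status(401).end();
    return;
  }
  const tag = sessionEtag(req.originalUrl, req.sessionID);
  res.set('ETag', tag);
  if (req.headers['if-none-match'] === tag) {
    // The client already holds a response for this URL and session.
    res.status(304).end();
    return;
  }
  next(); // fall through to the route that queries the database
}
```

Because the ETag header is already set when the route later calls res.send(), Express should leave it alone rather than generating its own.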

answered Sep 21 '22 19:09 by stellarchariot


Let me explain it in 2021, with updated information and links to the code.

It's a relatively straightforward and simple (no rocket science) concept, but, at the same time, a very tricky thing that as a developer you should really know before it comes back to bite you!

What is Etag?

So, ETag (per Wikipedia/ETag) is an HTTP header.

It can be seen in the "Response Headers" section of some GET calls in the browser DevTools.

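In Express, the value either starts with W/ (weak, the default) or does not (strong), followed by "<LEN>-<VALUE>" in double quotes, where VALUE is the first 27 characters of the base64-encoded SHA-1 hash of the body, and LEN is the byte length of the body in hex. (Source code in June 2021)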
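As a rough illustration of that layout, the sketch below rebuilds a tag of the same shape with Node's crypto module. It is modelled on my reading of the etag package, where LEN is the byte length of the body in hex and VALUE is a truncated base64 SHA-1 digest; sketchEtag is a made-up name and this is not the package's actual code.

```javascript
const crypto = require('node:crypto');

// Sketch of the tag layout: "<body length in hex>-<first 27 chars of base64 SHA-1>".
function sketchEtag(body, { weak = true } = {}) {
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(body, 'utf8');
  // A SHA-1 digest is 28 base64 characters including one '=' of padding;
  // keeping 27 characters drops the padding.
  const hash = crypto.createHash('sha1').update(buf).digest('base64').substring(0, 27);
  const tag = `"${buf.length.toString(16)}-${hash}"`;
  return weak ? `W/${tag}` : tag;
}

console.log(sketchEtag('{"name":"Ada"}'));                 // weak form, starts with W/
console.log(sketchEtag('{"name":"Ada"}', { weak: false })); // strong form
```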

What's the purpose of Etag?

Ah, good question. The answer is: Caching!

(PS. And only caching of the Network Traffic between the client and the server. That's the transmission of the response data, being sent over HTTP(S) to the client; not any sort of internal caching of Server to DB or what not.)

Caching, how?

The mechanism is relatively simple.

Let's say a client (a browser, like Chrome) makes a call to the https://myserver.com/user/profile/get endpoint and gets a big JSON response with all the profile data of the current user (say, 30 fields of name, phone, photo URL, blah, blah). Besides handing the response, as a JSON object, to your application, the client, in its own private internal network layer, will store this data in a client-side cache of {'https://myserver.com/user/profile/get': <this-json-response-object> }.

Now, the next time (even days and sessions later) the client is about to make a call to the same endpoint of .../user/profile/get, it can tell the server that "Hey, I have this <previous_json_from_the_cache> in my cache, so don't send it over if what you are going to send is exactly this."

Cool, but isn't that inefficient?

It is!

The problem is if the client sends the entire JSON object from the cache, in the request to the server, it's both a security risk, and quite inefficient -- the same 30-field JSON object is sent over the network, even maybe twice!

What happens here is, the client (i.e. Chrome browser) can compute a hash (say MD5, which is both non-reversible and shorter) and in the second request say "Hey, if MD5 hash of the JSON you are going to send me back is this <computed_hash>, I already have it! so don't send it over."

Now, what happens is, the server is going to compute the response (pull from DB and everything), exactly as before. But, ONLY RIGHT BEFORE SENDING THE RESPONSE DATA, it computes the hash value of the response (in server side) to see if it matches what client has said it already has. If so, it sends a 304 HTTP status response code, instead of 200, which means "nothing is changed."

Nice! Is it exactly this?

Well, in the example above, if you pay close attention, the hash computation happens on both the client side and the server side. That would make it hard to change the algorithm, at the very least. So, in reality, the "hash of the response" is computed only on the server side, starting with the very first response, and is sent back to the client.

This computed hash of "the current response," which comes back with the response, is in ETag header of the response.

With that, whenever client receives a response, it will store: { ".../profile/get": [<ETag>, <JSON-Response-Data>] } in its internal cache.

Then, in any future request, the client sends this ETag value to the server (in the if-none-match request header), to signal that it can accept a 304 if the new response is going to have this same ETag.

So, to recap:

  • ETag value is nothing crazy, but a non-reversible, short, and fast hashed value of the Response Data (body).
  • Server sends ETag header in Response to Client.
  • Client sends if-none-match header (with its value being previously received ETag values from server) in Request to Server.
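The recap above can be sketched as a tiny, framework-free function. This is a simplified illustration, not Express's code: makeEtag and conditionalResponse are hypothetical names, and the comparison is a plain string match on a single tag.

```javascript
const crypto = require('node:crypto');

// Hash the body into a quoted ETag value.
function makeEtag(body) {
  return `"${crypto.createHash('sha1').update(body).digest('hex')}"`;
}

// Decide between 200 and 304 for a body that has already been generated.
function conditionalResponse(requestHeaders, body) {
  const etag = makeEtag(body);
  if (requestHeaders['if-none-match'] === etag) {
    // The client's cached copy is still current: no body is sent.
    return { status: 304, headers: { ETag: etag }, body: '' };
  }
  return { status: 200, headers: { ETag: etag }, body };
}

// First visit: no validator yet, so the full body comes back with an ETag.
const first = conditionalResponse({}, '{"name":"Ada"}');
// Second visit: the client echoes the ETag in If-None-Match.
const second = conditionalResponse({ 'if-none-match': first.headers.ETag }, '{"name":"Ada"}');
console.log(first.status, second.status); // 200 304
```

Note that the body is still generated on the second visit; only the transfer is saved.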

Great! How can I use it?

By default, it's happening in Express.js. So, sit back and enjoy!

It's very unlikely that you will need to mess with its settings.

When should I ever NOT use Etag?

Ah! Welcome to my life. :D That's how I got here and did all this research.

The Express package uses the etag package (it's just one file, maintained by the same people) to generate the ETag value. Internally, the etag package computes a SHA-1 hash of the body (hashing, not encryption), and nothing crazy, to keep the performance at its best. (If you think about it, this function will be called a lot! At least once or twice on average for any GET call the server receives and processes.)

To decide whether it should send a 304 or a 200 when the client has said "I have these values in my cache already", Express uses the fresh package (again only one file, in fact just one function returning a boolean, maintained by the same people). Internally, the fresh package reads the if-none-match header of the request (reqHeaders['if-none-match']) and compares it with the etag of the response (resHeaders['etag']) that it's about to send out.
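A simplified version of that comparison might look like the sketch below. It is an approximation written for illustration (isFresh is a made-up name), and it deliberately ignores the other inputs a full freshness check would consider, such as if-modified-since and a no-cache request directive.

```javascript
// Simplified freshness check: true means "a 304 is acceptable".
function isFresh(reqHeaders, resHeaders) {
  const noneMatch = reqHeaders['if-none-match'];
  const etag = resHeaders['etag'];
  if (!noneMatch || !etag) return false;
  if (noneMatch.trim() === '*') return true;
  // Weak comparison: ignore a leading W/ on either side.
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  // The request header may carry several comma-separated tags.
  return noneMatch.split(',').some((candidate) => strip(candidate) === strip(etag));
}

console.log(isFresh({ 'if-none-match': 'W/"abc"' }, { etag: 'W/"abc"' })); // true
console.log(isFresh({ 'if-none-match': '"old"' }, { etag: '"new"' }));     // false
```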

Cool, what's the problem then?

The problem arises when your architecture and the communications between client and server rely on custom headers!

For instance, you want to renew the auth or session token on any request, and refresh it in the background and send a new one, as a RESPONSE HEADER on some requests.

THE CURRENT ETag IMPLEMENTATION IN EXPRESS RELIES ONLY ON THE RESPONSE BODY, NOT ON RESPONSE HEADERS. Even the custom function that they allow you to plug in (doc, code) receives only the body content (and its encoding), not the response headers.

So, what happens is, when the response (e.g. profile data) has not changed, your client might reuse an outdated auth token and kick the user out due to an invalid auth/session token!

How can I disable it?

You can do app.set("etag", false); so Express stops sending it. Per this answer, you can/should also use nocache via app.use(nocache()) to also send "Hey Client, don't ever bother yourself caching it!" headers to the client, from the server.
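As a minimal sketch of that setup without the extra dependency, the helper below turns off ETag generation and adds a single Cache-Control header by hand. disableHttpCaching is a hypothetical name, and using only no-store is an assumption on my part; the nocache package sets a broader group of headers.

```javascript
// Turn off ETag generation and ask clients not to store responses.
function disableHttpCaching(app) {
  app.set('etag', false); // Express stops generating ETag headers
  app.use((req, res, next) => {
    // Assumption: no-store alone is strict enough for this application.
    res.set('Cache-Control', 'no-store');
    next();
  });
  return app;
}
```

It takes an existing Express app, so it would be called once at startup, before the routes are registered.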

Cheers!

PS. Final Notes:

  • If you think about it, ETags are very valuable for assets (when the size of the response data is 100KB or more), but not for typical API endpoint data. So, disabling them for your small-response endpoints might not be a bad idea; it may be worth not paying the overhead.

answered Sep 23 '22 19:09 by Aidin