
What's an effective and processing-cheap algorithm for generating ETags?

I have a REST API (built in Nancy, running on ASP.NET) that can return a JSON object like this:

{
   id: "1",
   name: "Fred",
   reviews: [
     {
        id: "10",
        content: "I love Stack Overflow"
     }
   ]
}

Note that this object is not a single entity; it's a representation.

Usually, I would use the entity's last-modified timestamp from the DB as the ETag; when the entity gets updated, the ETag gets updated. Simple.

But in this case, what if the user doesn't change, but the content of the first review changes? Using the aforementioned ETag logic, the ETag wouldn't change. The representation here includes multiple entities, and I'm trying to find a way to uniquely identify it.

So I need to somehow identify that representation (which is a simple C# POCO, stored in a Redis cache).

Here are my initial thoughts:

  • Object.GetHashCode(). Won't work, because the memory reference will always be different.
  • Memory stream the object, SHA1 hash it. Costly to do every time.
  • Before I add/update the cache, create a GUID to be used for the ETag and store that in the cache too. Then when the cache gets flushed (which it would be in the previous example), a new GUID is generated and the ETag is updated. The problem with this approach is that I'm tying my ETag mechanism to my caching implementation (so it's not loosely coupled).

Can anyone think of a cheap and effective way to do this, ideally at a global level (e.g. on Object or a base object, instead of specific ETag generation logic for each entity/resource)?

Many thanks!

RPM1984 asked Nov 09 '22 04:11



1 Answer

I think the hashing approach is not so bad. There are extremely efficient hash algorithms, such as MurmurHash3 (128-bit version) and xxHash (64-bit version), that I would consider. This is an effective way to do it globally, but unfortunately it's not the cheapest. C# implementations of both are available.
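The hashing approach above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python rather than C#; the function name `representation_etag` is hypothetical. Python's standard library ships neither MurmurHash3 nor xxHash, so `hashlib.blake2b` with a 128-bit digest stands in for them here. The same steps (serialize deterministically, hash the bytes, quote the digest) should carry over to a C# POCO with whichever hash you pick.

```python
import hashlib
import json


def representation_etag(representation: dict) -> str:
    """Return a quoted ETag derived from the serialized representation."""
    # Serialize deterministically: sorted keys and fixed separators mean
    # equal content always produces identical bytes.
    payload = json.dumps(
        representation, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    # Assumption: blake2b is a stand-in for MurmurHash3/xxHash, which are
    # not part of the Python standard library.
    digest = hashlib.blake2b(payload, digest_size=16).hexdigest()
    # ETag values are sent as quoted strings.
    return f'"{digest}"'


user = {
    "id": "1",
    "name": "Fred",
    "reviews": [{"id": "10", "content": "I love Stack Overflow"}],
}
etag_before = representation_etag(user)

# Only the nested review changes; the user fields stay the same.
user["reviews"][0]["content"] = "I really love Stack Overflow"
etag_after = representation_etag(user)
```

Because the hash covers the whole serialized representation, a change to any nested entity changes the ETag, which is what makes this work at a global level without per-resource code.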

You said that each entity in the database has a modified timestamp. If the model is composed of several entities, the model's ETag could be derived from the entities' timestamps: it would be the concatenation of those timestamps. This approach is more efficient, but you cannot do it globally; you would need to write specific code for each model.
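A minimal sketch of the timestamp-concatenation idea, again in Python for illustration. The function name `composite_etag`, the `|` separator, and the sample timestamps are all assumptions made for this example, not something taken from the question.

```python
from datetime import datetime, timezone


def composite_etag(timestamps: list[datetime]) -> str:
    """Build a quoted ETag by concatenating entity last-modified timestamps."""
    # The order must be stable (e.g. user first, then reviews sorted by id),
    # otherwise the same data could yield different ETags.
    joined = "|".join(ts.isoformat() for ts in timestamps)
    return f'"{joined}"'


# Hypothetical last-modified values for the user and its single review.
user_modified = datetime(2022, 11, 1, 12, 0, tzinfo=timezone.utc)
review_modified = datetime(2022, 11, 5, 8, 30, tzinfo=timezone.utc)
etag_before = composite_etag([user_modified, review_modified])

# The review is edited later; the user row is untouched.
review_modified = datetime(2022, 11, 9, 4, 11, tzinfo=timezone.utc)
etag_after = composite_etag([user_modified, review_modified])
```

With many child entities the concatenated value can get long; hashing the joined string is one way to keep the header short while still avoiding serialization of the full representation.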

Jesús López answered Nov 15 '22 06:11
