I have some product data that I need to store multiple versions of in a Redis cache. The data is made up of JSON-serialised objects. The process of obtaining the plain (basic) data is expensive, and the process of customising it into different versions is also expensive, so I'd like to cache all versions to optimise wherever possible. The data structure looks something like this:
BaseProduct
/\
/ \
/ \
/ \
/ \
CustomisedProductA CustomisedProductB
/ \ / \
CustomisedProductA1 CustomisedProductA2 CustomisedProductB1 CustomisedProductB2
The general idea here is:
The data is stored in this way because each step of the data retrieval/calculation process is expensive. The first time a particular product is retrieved for a region, there will be one set of customisations performed to make it into a region-specific product. The first time a particular product is retrieved for a store, I need to perform customisations based on the regional product to generate the store-specific product.
The problem comes in due to the fact that I may need to invalidate data in a few ways:
HDEL key field
(e.g. HDEL product CustomisedProductA:CustomisedProductA1
).My question is therefore: is there a way of representing this type of multi-level data structure, to allow for the performance of storing the data in multiple levels while enabling invalidation of only parts of the tree? Or, am I limited to expiring the entire tree (DEL key
) or specific nodes (HDEL key field
) but nothing in between?
There are at least 3 different ways for doing that, each has its own pros and cons.
The first approach is to use non-atomic ad-hoc scanning of the tree to identify and invalidate (delete) the tree's 2nd level (1st set of customizations). To do that, use a hierarichal naming scheme for your Hash's fields and iterate through them using HSCAN
. For example, assuming that your Hash's key name is the product's ID (e.g. ProductA), you'd use something like '0001:0001' as the field name for the first customization's first version, '0001:0002' for its second version and so forth. Similarly, '0002:0001' would be the 2nd customization 1st version, etc... Then, do find all of customization 42's versions, use HSCAN ProductA 0 MATCH 0042:*
, HDEL
the fields in the reply, and repeat until the cursor zeros.
The opposite approach is to proactively "index" each customization's versions so you can fetch them efficiently instead of performing the Hash's full scan. The way to go about that is using Redis' Sets - you keep a Set with all the field names for a given product's version. Versions can either be sequential (as in my example) or anything else as long as they are unique. The cost is maintaining these indices - whenever you add or remove a product's customization and/or version, you'll need to maintain consistency with these Sets. For example, the creation of a version would be something like:
HSET ProductA 0001:0001 "<customization 1 version 1 JSON payload"
SADD ProductA:0001 0001
Note that these two operations should be in a single transaction (i.e. use a MULTI\EXEC
block or EVAL
a Lua script). When you have this set up, invalidating a customization is just a matter of calling SMEMBERS
on the relevant Set and deleting the versions in it from the Hash (and the Set itself as well). It is important to note, however, that reading all members from a large Set could be time consuming - 1K members isn't that bad, but for larger Sets there's SSCAN
.
Lastly, you could consider using a Sorted Set instead of a Hash. While perhaps less intuitive in this use case, the Sorted Set will let you perform all the operations you need. The price for using it, however, is the increased complexity of O(logN) for adding/removing/reading compared to the Hash's O(1), but given the numbers the difference isn't significant.
To unleash the Sorted Set's power, you'll use lexicographical ordering so all of the Sorted Set's members should have the same score (e.g. use 0). Each product will be represented by a Sorted Set, just like with the Hash. The members of the Set are the equivalents of the Hash's field, namely customizations' versions. The "trick" is constructing the members in a way that allows you to perform range searches (or level-2 invalidations if you will). Here's an example of how it should look like (note that here the key ProductA isn't a Hash but a Sorted Set):
ZADD ProductA 0 0001:0001:<JSON>
To read a customization version, use ZRANGEBYLEX ProductA [0001:0001: [0001:0001:\xff
and split the JSON from the reply and to remove an entire customization, use ZREMRANGEBYLEX
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With