 

Caching strategy for personalized feeds

Let's say a user can subscribe to other users' posts, tags, or any other similar criteria he wants.

In his feed, the app returns the 'main feed', which is the same for all users, plus feed items that match his subscription criteria (the feed is served through an API).

The feed data is a single kind of entity (posts). The feed is infinite-scrolled (paginated), which adds extra complexity.

If the feed were the same for all users, caching would be trivial, but for a personalized feed I can't work out what the best way to do it would be.

Each 'page' is offset by a date range (a certain day).

One approach I can think of is:

The 'same feed' part is cached under a date key (a key representing the date range).
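A minimal sketch of caching the shared feed under one key per day. It assumes a hypothetical key format `mainfeed:YYYY-MM-DD`, a plain dict standing in for Redis or Memcached, and a hypothetical `load_from_db` callback that returns the post IDs for that day; none of these come from the question.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for Redis/Memcached.
_cache: Dict[str, List[int]] = {}

def main_feed_key(day: date) -> str:
    # One cache key per day-sized page, e.g. "mainfeed:2022-10-29" (format is an assumption).
    return f"mainfeed:{day.isoformat()}"

def get_main_feed_page(day: date, load_from_db: Callable[[date], List[int]]) -> List[int]:
    """Return the shared feed page for a day, hitting the database only on a cache miss."""
    key = main_feed_key(day)
    if key not in _cache:
        _cache[key] = load_from_db(day)
    return _cache[key]
```

Because every user sees the same page for a given day, one cached entry per day serves everyone; only the personalized part needs per-user work.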

Personalized feed items (posts) are cached individually. I then keep arrays of post IDs per criterion, e.g. per authoring user or per assigned tag (User#1: [10,15,23,64 ...], Tag#FOO: [1,2,5,10 ...]), partition those arrays by date range (according to the pagination page they fall into), fetch the posts by ID with mget/getMulti from Redis or Memcached, and return the combined result.
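The approach above can be sketched roughly as follows. This is an illustration under assumptions: the index is a dict keyed by a hypothetical `(criterion, day)` pair, posts are cached individually in another dict, `mget` only mimics the one-value-per-key, `None`-on-miss behaviour of Redis MGET / Memcached getMulti, and newer posts are assumed to have higher IDs.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical index: (criterion, day) -> post IDs, e.g. ("user:1", "2022-10-29") -> [10, 15, 23].
Index = Dict[Tuple[str, str], List[int]]

def mget(cache: Dict[int, str], ids: List[int]) -> List[Optional[str]]:
    """Mimic MGET/getMulti: one value (or None on a miss) per requested ID, in the same order."""
    return [cache.get(i) for i in ids]

def personalized_page(index: Index, post_cache: Dict[int, str],
                      subscriptions: Iterable[str], day: str) -> List[str]:
    """Union the ID lists of every subscribed criterion for one day, then batch-fetch the posts."""
    ids: Set[int] = set()
    for criterion in subscriptions:
        ids.update(index.get((criterion, day), []))
    ordered = sorted(ids, reverse=True)  # assumption: higher ID means newer post
    # Misses are simply dropped here; a real service would load them from the database.
    return [p for p in mget(post_cache, ordered) if p is not None]
```

The set union removes duplicates, e.g. a post by a followed user that also carries a followed tag appears only once.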

But that approach feels somehow wrong to me because it is so complex. Alternatively, is a fine-tuned database (say, one running in RAM or fully buffered in memory) without any caching viable in this situation? Rendering/serializing time is unimportant, since I pass the data almost raw to the client.

I'm looking for general strategy advice that is agnostic of platform and caching layer.

ClassyPimp asked Oct 29 '22 14:10


1 Answer

The following design might be a better way of doing it.

Query processor layer: usually a REST API that takes the query and returns the post feed (paginated by date, post count, etc.). It searches the post storage (a database, or an indexed store such as Solr) and gets a list of post IDs only. Note: don't load the posts themselves at this stage, just their IDs.

Posts service layer: the query processor layer uses this service to get the posts given their IDs. First, it asks the cache service layer for the posts with those IDs. Any post not found there is loaded from storage and returned to the query processor; the loaded post is also sent to the cache service layer so it is cached for future use.

Cache service layer: given a post ID, it returns the post only if it is present in the cache.
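The three layers above amount to the cache-aside pattern. Below is a minimal sketch of the posts service and cache service, assuming a dict-backed cache and a hypothetical `load_from_storage` callback that takes a list of IDs and returns `{id: post}`; the class and method names are illustrative only.

```python
from typing import Callable, Dict, Iterable, List

class CacheService:
    """Returns posts only if they are already cached (hypothetical in-memory stand-in)."""
    def __init__(self) -> None:
        self._store: Dict[int, dict] = {}

    def get_many(self, ids: Iterable[int]) -> Dict[int, dict]:
        return {i: self._store[i] for i in ids if i in self._store}

    def put_many(self, posts: Dict[int, dict]) -> None:
        self._store.update(posts)

class PostsService:
    """Cache-aside lookup: ask the cache first, load misses from storage, then cache them."""
    def __init__(self, cache: CacheService,
                 load_from_storage: Callable[[List[int]], Dict[int, dict]]) -> None:
        self.cache = cache
        self.load = load_from_storage

    def get_posts(self, ids: List[int]) -> List[dict]:
        found = self.cache.get_many(ids)
        missing = [i for i in ids if i not in found]
        if missing:
            loaded = self.load(missing)  # one batched storage read for all misses
            self.cache.put_many(loaded)  # populate the cache for future requests
            found.update(loaded)
        # Keep the order chosen by the query processor; skip IDs that no longer exist.
        return [found[i] for i in ids if i in found]
```

The query processor would pass the ordered ID list it got from its search straight to `get_posts`, so ordering and pagination stay in one place while the cache only ever holds individual posts.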

Well-designed cache keys for posts also help speed up post retrieval.

For example, Redis supports pattern matching on keys. Using keys of the form postId:date:userId:tag1,tag2, you can fetch a single post, or all posts within a date range, with a given tag, by a given userId, etc., very easily.
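A rough illustration of the key layout above, with Python's `fnmatch.fnmatchcase` standing in for Redis glob matching (for simple `*` patterns the two behave alike). The helper names are hypothetical. One caveat worth knowing: in Redis, `KEYS` and `SCAN ... MATCH` both walk the whole keyspace (MATCH filters after elements are retrieved), so on a large cache these lookups are O(N), and tag patterns such as `*foo*` would also match `food`. Explicit ID lists per criterion, as in the question, avoid both issues.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

def make_key(post_id: int, day: str, user_id: int, tags: Iterable[str]) -> str:
    # Key layout from the answer: postId:date:userId:tag1,tag2
    return f"{post_id}:{day}:{user_id}:{','.join(tags)}"

def match_keys(keys: Iterable[str], pattern: str) -> List[str]:
    # Stand-in for Redis SCAN ... MATCH <pattern>; scans every key, just like Redis does.
    return [k for k in keys if fnmatchcase(k, pattern)]
```

With this layout, `"*:2022-10-29:*"` selects one day, `"*:*:1:*"` selects user 1, and `"10:*"` selects post 10.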

code answered Nov 12 '22 22:11