
Implementing news feed on GAE - Should I use Prospective Search?

I have been struggling with an issue for some time now. I'm trying to implement a news feed feature in my app using GAE Cloud Endpoints and Java. The common concept is followers and followees, where a followee's actions can be seen by his followers. A new follower should also see his followees' past actions, not only those from the time he started following.

I made a few tries with the following components. Each try worked great but was lacking something:

  1. On each user action I added a 'log' entity to the datastore with the user id included. When a user displayed his news feed, I just queried for all those entities by user id, according to the user's followees list. Everything was fine until I realized that an 'IN' query cannot be cursored. So this option was gone.
  2. On this try, which is also the current state of the application, I'm using the Search API. On every user action I no longer store a 'log' entity in the datastore but a document in a search index. Complex queries can be cursored here and the world is smiling again. But... I'm not too sure that, billing-wise, this is a smart decision. The costs of searching/adding/deleting documents, alongside the documented daily limits, seem to make the whole thing a bit too sketchy.
  3. The next try should be the Prospective Search API. From what I'm reading in the documentation, it seems like the right component for that purpose. Unfortunately, the documentation is really poor and gives very few examples. The billing information is also unclear.

So I'm asking for the advice of the Stack Overflow community. Can you please advise me on this matter? And if Prospective Search is the right option to choose, can you please provide some clear sample Java code that uses Cloud Endpoints?

EDIT: Just to emphasize the main design requirement here: the news feed feature needs to be able to fetch sorted followee actions using a cursor (in order to avoid querying the whole batch).

AsafK asked Dec 13 '13 01:12



1 Answer

Use a pull-aggregate-per-follower model: periodically (or on demand) query all followees' actions once and then cache them inside a dedicated per-follower entity. Remember the time of the last query, so next time you just query from that point on (assuming actions cannot be added or changed at past times).
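A minimal sketch of that pull-and-cache step, in plain Java with an in-memory map standing in for the datastore. The names `Action`, `FeedCache`, and `refresh` are hypothetical and not part of any App Engine API; in a real app the inner loop would be a datastore query filtered on the action timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FeedCacheSketch {
    /** One user action; stands in for the 'log' entity from the question. */
    static final class Action {
        final String userId;
        final long timestamp;
        final String text;

        Action(String userId, long timestamp, String text) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    /** Stands in for the dedicated per-follower entity. */
    static final class FeedCache {
        final List<Action> actions = new ArrayList<>();
        long lastQueried = 0L; // time of the last pull; 0 means "never pulled"
    }

    /**
     * Pulls only actions newer than the last pull, appends them to the cache,
     * and remembers the pull time. Returns the number of actions added.
     */
    static int refresh(FeedCache cache, List<String> followees,
                       Map<String, List<Action>> actionLog, long now) {
        int added = 0;
        for (String followee : followees) {
            for (Action a : actionLog.getOrDefault(followee, List.of())) {
                if (a.timestamp > cache.lastQueried && a.timestamp <= now) {
                    cache.actions.add(a);
                    added++;
                }
            }
        }
        // Keep the cache sorted newest first so reads need no extra work.
        cache.actions.sort(Comparator.comparingLong((Action a) -> a.timestamp).reversed());
        cache.lastQueried = now;
        return added;
    }

    public static void main(String[] args) {
        Map<String, List<Action>> log = new HashMap<>();
        log.put("bob", List.of(new Action("bob", 10L, "posted a photo")));
        FeedCache cache = new FeedCache();
        int added = refresh(cache, List.of("bob"), log, 100L);
        System.out.println(added + " new action(s) cached");
    }
}
```

Because a brand-new cache starts with `lastQueried = 0`, the first pull also picks up the followees' past actions, which covers the "new follower" requirement from the question.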

This will give you the following features (and limitations):

  1. If the query is on-demand, then you will not need to query for users that are inactive.
  2. Since the query is "new-only" (looks for new actions only), it would cost you nothing if it returned zero results.
  3. You will only query each followee's actions once per follower. After that, all recent actions are cached inside one entity and loaded into memory with one get. This should be a substantial saving in cost and time.
  4. You can sort/filter actions in memory any way you wish.
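Once the actions sit in one cached entity, the cursor requirement from the question can be met in memory. The sketch below is one possible approach, assuming a simple offset used as the cursor; `Action`, `Page`, and `page` are hypothetical names, not an App Engine API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FeedPagingSketch {
    /** One cached action (hypothetical shape). */
    static final class Action {
        final String userId;
        final long timestamp;

        Action(String userId, long timestamp) {
            this.userId = userId;
            this.timestamp = timestamp;
        }
    }

    /** One page of the feed plus the cursor for the next page (-1 when done). */
    static final class Page {
        final List<Action> items;
        final int nextCursor;

        Page(List<Action> items, int nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /** Sorts the cached actions newest first and returns the slice at the cursor. */
    static Page page(List<Action> cached, int cursor, int pageSize) {
        List<Action> sorted = new ArrayList<>(cached);
        sorted.sort(Comparator.comparingLong((Action a) -> a.timestamp).reversed());
        int from = Math.min(Math.max(cursor, 0), sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        int next = to < sorted.size() ? to : -1;
        return new Page(new ArrayList<>(sorted.subList(from, to)), next);
    }

    public static void main(String[] args) {
        List<Action> cached = new ArrayList<>();
        cached.add(new Action("bob", 10L));
        cached.add(new Action("carol", 30L));
        cached.add(new Action("bob", 20L));
        Page first = page(cached, 0, 2);
        System.out.println("page size: " + first.items.size()
                + ", next cursor: " + first.nextCursor);
    }
}
```

An offset cursor shifts if new actions are added to the cache between two page requests; using the timestamp of the last returned action as the cursor would avoid that, at the price of handling equal timestamps.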

Limitations:

  1. Entities have a 1MB limit, so there is a maximum number of actions that you can cache in one entity. You will either need to limit caching of recent actions per user or spread action caching over multiple entities.
  2. You will need to use IN queries over followees (max 30 values per query) and also use parallel threads to achieve decent performance. This could easily take 3-5 seconds when querying over 1000-2000 followees. Also, you could easily hit the RPC limit (aka max concurrent API calls) per instance when serving multiple users at the same time.
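The batching in the second limitation could look roughly like the sketch below. It splits the followee list into groups of at most 30 and runs one IN-style query per group on a bounded thread pool. The `inQuery` function is a hypothetical stand-in for the real datastore call, and the plain `ExecutorService` is an assumption for illustration: on App Engine you would need the platform's own request-scoped threading or async query facilities instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class InQueryBatchingSketch {
    /** Maximum number of values per IN query, as stated in the answer. */
    static final int MAX_IN_VALUES = 30;

    /** Splits a list into consecutive batches of at most {@code size} items. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return batches;
    }

    /**
     * Runs one query per batch of followees, at most maxParallel at a time,
     * and concatenates the results in batch order.
     */
    static <R> List<R> queryAll(List<String> followees,
                                Function<List<String>, List<R>> inQuery,
                                int maxParallel) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Callable<List<R>>> tasks = new ArrayList<>();
            for (List<String> batch : partition(followees, MAX_IN_VALUES)) {
                tasks.add(() -> inQuery.apply(batch));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> f : pool.invokeAll(tasks)) {
                results.addAll(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> followees = new ArrayList<>();
        for (int i = 0; i < 65; i++) {
            followees.add("user" + i);
        }
        // Fake query: echoes the ids it was asked about.
        List<String> found = InQueryBatchingSketch.<String>queryAll(followees, batch -> batch, 4);
        System.out.println("batches: " + partition(followees, MAX_IN_VALUES).size()
                + ", results: " + found.size());
    }
}
```

Keeping `maxParallel` small is one way to stay under the per-instance limit on concurrent API calls when several users refresh their feeds at once.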
Peter Knego answered Oct 06 '22 14:10
