 

Designing "social-feed" in DynamoDB

This question might be relevant to any document-based NoSQL database.

I'm building an interest-specific social network and decided to go with DynamoDB because of its scalability and pain-free administration. There are only two main entities in the database: users and posts.

The requirements for the common queries are very simple:

  • Home feed (feed of the people I'm following)
  • My/User feed (my own feed, or a specific user's feed)
  • List of users I/a user follows
  • List of followers

Here is the database schema I have come up with so far (legend: __thisIsHashKey and _thisIsRangeKey):

timeline = { // post
    __username:"totocaster",
    _date:"1245678901345",
    record_type:"collection",
    items: ["2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594"],
    number_of_likes:123,
    description:"Hello, this is cool"
}

timeline = { // new follower
    __username:"totocaster",
    _date:"1245678901345",
    record_type:"follow",
    follower:"tamuna123"
}

timeline = { // new like
    __username:"totocaster",
    _date:"1245678901345",
    record_type:"like",
    liker:"tamuna123",
    like_date:"123255634567456"
}

users = {
    __username:"totocaster",
    avatar_url:"2d931510-d99f-494a-8c67-87feb05e1594",
    followers:["don_gio","tamuna123","barbie","mikecsharp","bassman"],
    following:["tamuna123","barbie","mikecsharp"],
    likes:[
    {
        username:'barbie',
        date:"123255634567456"
    },
    {
        username:"mikecsharp",
        date:"123255634567456"
    }],
    full_name:"Toto Tvalavadze",
    password:"Hashed Key",
    email:"[email protected]"
}

As you can see, I store all posts directly in the timeline collection. This way I can query for posts by username and date (the hash and range keys). Everything seems fine, but here is the problem:

I cannot query for the User-Timeline in one go. This will be one of the most frequent queries in the system, and I can't find an efficient way to do it. Please help. Thanks.
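To make the problem concrete, here is a minimal in-memory sketch of the schema above. It assumes a plain `Map` standing in for the DynamoDB table (one entry per `__username` hash key) and hypothetical helper names `queryUserFeed` and `homeFeedOnRead`. A single user's feed is one Query on one hash key, but a Home feed needs one Query per followed user plus a merge, which is why it cannot be done "in one go".

```javascript
// Hypothetical stand-in for the "timeline" table: one partition per hash key
// (__username). The _date range keys are equal-length numeric strings, so
// plain string comparison matches chronological order.
const timeline = new Map([
  ["tamuna123", [{ __username: "tamuna123", _date: "1245678901100" },
                 { __username: "tamuna123", _date: "1245678901500" }]],
  ["barbie", [{ __username: "barbie", _date: "1245678901300" }]],
  ["mikecsharp", []],
]);

// Comparator: newest range key first.
const newestFirst = (a, b) => (a._date < b._date ? 1 : a._date > b._date ? -1 : 0);

// My/User feed: a single Query on one hash key, newest first.
function queryUserFeed(username, limit) {
  const items = timeline.get(username) || [];
  return [...items].sort(newestFirst).slice(0, limit);
}

// Home feed (fan-out on read): one Query per followed user, then a merge.
function homeFeedOnRead(following, limit) {
  let queries = 0;
  const merged = [];
  for (const user of following) {
    merged.push(...queryUserFeed(user, limit)); // one round trip per followee
    queries++;
  }
  merged.sort(newestFirst);
  return { feed: merged.slice(0, limit), queries };
}

const { feed, queries } = homeFeedOnRead(["tamuna123", "barbie", "mikecsharp"], 10);
```

The number of queries grows with the number of people a user follows, so the cost of the most common request grows with the social graph.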

totocaster, asked Jan 19 '13 at 11:01


2 Answers

I happen to work with news feeds daily (I'm the author of Stream-Framework and founder of getstream.io).

The most common solutions I see are:

  • Cassandra (Instagram)
  • Redis (expensive, but easy)
  • MongoDB
  • DynamoDB
  • RocksDB (LinkedIn)

Most people use either fanout on write (copy each new post into every follower's feed when it is published) or fanout on read (gather posts from everyone a user follows when the feed is requested). Either approach makes it easy to build a working solution, but it can get expensive quickly. Your best bet is to combine the two: do a fanout on write in most cases, but keep very popular feeds in memory.
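One possible reading of the hybrid approach, sketched in memory: normal authors' posts are copied into followers' feeds on write, while posts from "popular" authors are kept in a per-author list and merged in at read time. The `HybridFeeds` class, the `POPULAR_THRESHOLD` value, and the post shape `{ id, author, date }` are all hypothetical, chosen only for illustration.

```javascript
// Hybrid fan-out sketch. Assumed, not from the answer: POPULAR_THRESHOLD
// (hypothetical) and the in-memory Maps standing in for real storage.
const POPULAR_THRESHOLD = 3;

class HybridFeeds {
  constructor() {
    this.followers = new Map(); // author -> Set of follower usernames
    this.feeds = new Map();     // reader -> posts copied in on write
    this.authored = new Map();  // author -> that author's own posts
  }

  follow(reader, author) {
    if (!this.followers.has(author)) this.followers.set(author, new Set());
    this.followers.get(author).add(reader);
  }

  isPopular(author) {
    const fans = this.followers.get(author);
    return (fans ? fans.size : 0) > POPULAR_THRESHOLD;
  }

  // Returns how many feed copies were written (the fan-out cost).
  post(author, post) {
    if (!this.authored.has(author)) this.authored.set(author, []);
    this.authored.get(author).push(post);
    if (this.isPopular(author)) return 0; // popular: merged at read time instead
    let writes = 0;
    for (const reader of this.followers.get(author) || []) {
      if (!this.feeds.has(reader)) this.feeds.set(reader, []);
      this.feeds.get(reader).push(post);
      writes++;
    }
    return writes;
  }

  homeFeed(reader, limit) {
    const items = [...(this.feeds.get(reader) || [])];
    // Fan-out on read only for the popular authors this reader follows.
    for (const [author, fans] of this.followers) {
      if (fans.has(reader) && this.isPopular(author)) {
        items.push(...(this.authored.get(author) || []));
      }
    }
    // Dedupe by id: an author who became popular later has posts in both places.
    const byId = new Map(items.map((p) => [p.id, p]));
    return [...byId.values()].sort((a, b) => b.date - a.date).slice(0, limit);
  }
}

const hub = new HybridFeeds();
for (const fan of ["a", "b", "c", "d", "e"]) hub.follow(fan, "celebrity");
hub.follow("a", "friend");
const friendWrites = hub.post("friend", { id: 1, author: "friend", date: 100 });
const celebWrites = hub.post("celebrity", { id: 2, author: "celebrity", date: 200 });
const feedOfA = hub.homeFeed("a", 10);
```

The design trade-off: a post by an author with millions of followers would otherwise trigger millions of writes, while reading a handful of popular authors' recent posts at request time is cheap.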

Stream-Framework is open source, written in Python, and supports Cassandra and Redis.

getstream.io is a hosted solution built on top of Go and RocksDB.

If you do end up using DynamoDB, be sure to set up the right partition key: https://shinesolutions.com/2016/06/27/a-deep-dive-into-dynamodb-partitions/
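As an illustration of what a key layout for a fan-out-on-write feed table might look like, here is a sketch of a key builder. The `pk`/`sk` attribute names, the `FEED#` prefix, and `feedItemKeys` itself are hypothetical. The partition key is the feed owner, so reading one Home feed is a single Query and load is spread across many distinct key values. The sort key is a zero-padded timestamp plus the post id, because string sort keys order by bytes and unpadded numbers would sort incorrectly.

```javascript
// Hypothetical key layout for a materialized per-user feed table. The names
// pk/sk and the "FEED#" prefix are illustrative assumptions.
const TS_WIDTH = 15; // wider than today's 13-digit millisecond timestamps

function feedItemKeys(ownerUsername, timestampMs, postId) {
  if (!Number.isSafeInteger(timestampMs) || timestampMs < 0) {
    throw new RangeError("timestampMs must be a non-negative safe integer");
  }
  return {
    // Partition key: the feed owner, so one user's Home feed is one Query and
    // traffic spreads across many distinct partition key values.
    pk: `FEED#${ownerUsername}`,
    // Sort key: zero-padded time keeps string order == time order; the post
    // id suffix keeps two posts in the same millisecond from colliding.
    sk: `${String(timestampMs).padStart(TS_WIDTH, "0")}#${postId}`,
  };
}

const older = feedItemKeys("totocaster", 999, "p1");
const newer = feedItemKeys("totocaster", 1000, "p2");
console.log(older.sk < newer.sk); // true
console.log("999" < "1000");      // false: unpadded strings sort wrongly
```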

Also note that a Redis- or DynamoDB-based solution will get expensive pretty quickly. You'll get the lowest cost per user by leveraging Cassandra or RocksDB.
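Part of why costs climb is that operation counts scale with the social graph, not just with the number of posts. This back-of-the-envelope sketch counts operations only. It does not model any vendor's pricing, and every input value is a hypothetical example, not a measurement.

```javascript
// Operation counts per day for the two pure strategies. All inputs are
// hypothetical; no prices are assumed.
function fanoutCost({ postsPerDay, avgFollowers, feedReadsPerDay, avgFollowing }) {
  return {
    // Fan-out on write: each post is copied into every follower's feed...
    writeFanoutWrites: postsPerDay * avgFollowers,
    // ...but each feed view is a single query.
    writeFanoutReads: feedReadsPerDay,
    // Fan-out on read: one write per post...
    readFanoutWrites: postsPerDay,
    // ...but each feed view queries every followed user.
    readFanoutReads: feedReadsPerDay * avgFollowing,
  };
}

const est = fanoutCost({
  postsPerDay: 10000,
  avgFollowers: 200,
  feedReadsPerDay: 50000,
  avgFollowing: 150,
});
```

With these example inputs, fan-out on write costs 2,000,000 feed writes per day, and fan-out on read costs 7,500,000 queries per day. Either number multiplies by whatever a single write or read costs on the chosen store.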

Thierry, answered Nov 02 '22 at 14:11


I would check out the Titan graph database (http://thinkaurelius.github.com/titan/) and Neo4j (http://www.neo4j.org/).

I know Titan claims to scale pretty well with large data sets.

Ultimately I think your model maps well to a graph. Users and posts would be nodes, and then you can connect them arbitrarily via edges. A user (node) is a friend (edge) of another user (node).

A user (node) has many posts (nodes) in their timeline. Then you can run interesting traversals via the graph.
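To show what such traversals look like, here is a minimal in-memory property graph in plain JavaScript. It is not the Titan or Neo4j API, and the edge labels `FOLLOWS` and `POSTED` are assumptions. The Home feed is a two-hop traversal (user → followed users → their posts), and the follower list is simply the incoming `FOLLOWS` edges.

```javascript
// Minimal in-memory property graph (plain JavaScript, not the Titan or Neo4j
// API). Edge labels FOLLOWS and POSTED are illustrative assumptions.
class Graph {
  constructor() {
    this.props = new Map(); // node id -> properties
    this.out = new Map();   // node id -> Map(label -> [target ids])
    this.in = new Map();    // node id -> Map(label -> [source ids])
  }
  addNode(id, props = {}) {
    this.props.set(id, props);
  }
  addEdge(from, label, to) {
    link(this.out, from, label, to);
    link(this.in, to, label, from);
  }
  outV(id, label) {
    const byLabel = this.out.get(id);
    return (byLabel && byLabel.get(label)) || [];
  }
  inV(id, label) {
    const byLabel = this.in.get(id);
    return (byLabel && byLabel.get(label)) || [];
  }
}

// Adds b to index[a][label], creating the nested containers as needed.
function link(index, a, label, b) {
  if (!index.has(a)) index.set(a, new Map());
  const byLabel = index.get(a);
  if (!byLabel.has(label)) byLabel.set(label, []);
  byLabel.get(label).push(b);
}

// Home feed: user -FOLLOWS-> users -POSTED-> posts, newest first.
function homeFeed(g, user) {
  return g.outV(user, "FOLLOWS")
    .flatMap((u) => g.outV(u, "POSTED"))
    .sort((a, b) => g.props.get(b).date - g.props.get(a).date);
}

// Followers are the incoming FOLLOWS edges; following is outV(user, "FOLLOWS").
const followersOf = (g, user) => g.inV(user, "FOLLOWS");

const g = new Graph();
g.addNode("post1", { date: 100 });
g.addNode("post2", { date: 200 });
g.addEdge("totocaster", "FOLLOWS", "tamuna123");
g.addEdge("barbie", "FOLLOWS", "totocaster");
g.addEdge("tamuna123", "POSTED", "post1");
g.addEdge("tamuna123", "POSTED", "post2");
const home = homeFeed(g, "totocaster");    // ["post2", "post1"]
const fans = followersOf(g, "totocaster"); // ["barbie"]
```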

ryan1234, answered Nov 02 '22 at 12:11