This question might be relevant for any document-based NoSQL database.
I'm building an interest-specific social network and decided to go with DynamoDB because of its scalability and painless administration. There are only two main entities in the database: users and posts.
Requirements for common queries are very simple:
Here is the database schema I have come up with so far (legend: __thisIsHashKey and _thisIsRangeKey):
timeline = { // post
__username:"totocaster",
_date:"1245678901345",
record_type:"collection",
items: ["2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594","2d931510-d99f-494a-8c67-87feb05e1594"],
number_of_likes:123,
description:"Hello, this is cool"
}
timeline = { // new follower
__username:"totocaster",
_date:"1245678901345",
record_type:"follow",
follower:"tamuna123"
}
timeline = { // new like
__username:"totocaster",
_date:"1245678901345",
record_type:"like",
liker:"tamuna123",
like_date:"123255634567456"
}
users = {
__username:"totocaster",
avatar_url:"2d931510-d99f-494a-8c67-87feb05e1594",
followers:["don_gio","tamuna123","barbie","mikecsharp","bassman"],
following:["tamuna123","barbie","mikecsharp"],
likes:[
{
username:"barbie",
date:"123255634567456"
},
{
username:"mikecsharp",
date:"123255634567456"
}],
full_name:"Toto Tvalavadze",
password:"Hashed Key",
email:"user@example.com"
}
As you can see, I ended up storing all my posts directly in the timeline collection. This way I can query for posts using username and date (hash and range keys). Everything seems fine, but here is the problem:
I cannot query for the User-Timeline in one go. This will be one of the most frequently requested queries in the system, and I cannot provide an efficient way to do it. Please help. Thanks.
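To make the problem concrete, here is a minimal sketch. It assumes that the User-Timeline means the merged posts of everyone a user follows, and it uses an in-memory dict as a stand-in for the timeline table (hash key mapped to items sorted by range key). The names `TABLE`, `query_partition`, and `home_timeline` are hypothetical and are not DynamoDB API calls. The point is that a single hash-key query only ever returns one user's items, so the feed needs one query per followed user plus a merge.

```python
import heapq
from itertools import islice

# Hypothetical stand-in for the timeline table:
# hash key (username) -> items sorted ascending by range key (date).
TABLE = {
    "tamuna123": [
        {"date": 100, "record_type": "collection", "description": "first"},
        {"date": 300, "record_type": "collection", "description": "third"},
    ],
    "barbie": [
        {"date": 200, "record_type": "collection", "description": "second"},
        {"date": 250, "record_type": "like", "liker": "don_gio"},
    ],
}

def query_partition(username, limit):
    """Newest-first posts for one hash key; mimics a single key-based query."""
    items = TABLE.get(username, [])
    posts = [i for i in reversed(items) if i["record_type"] == "collection"]
    return posts[:limit]

def home_timeline(following, limit=10):
    """One query per followed user, then a k-way merge by date (newest first)."""
    per_user = [query_partition(u, limit) for u in following]
    merged = heapq.merge(*per_user, key=lambda i: i["date"], reverse=True)
    return list(islice(merged, limit))

feed = home_timeline(["tamuna123", "barbie"])
print([p["description"] for p in feed])  # ['third', 'second', 'first']
```

With N followed users this costs N queries per page view, which is why it cannot be done "in one go" with this key layout.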
I happen to work with news feeds daily (I am the author of Stream-Framework and the founder of getstream.io).
The most common solutions I see are:
Most people use either fanout on write or fanout on read. This makes it easier to build a working solution, but it can get expensive quickly. Your best bet is to combine the two approaches: do a fanout on write in most cases, but keep very popular feeds in memory.
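One possible reading of that hybrid, as a rough in-memory sketch: ordinary authors fan out on write into each follower's precomputed feed, while posts from very popular authors are stored once and merged in at read time. Everything here (`POPULAR_THRESHOLD`, `publish`, `read_feed`, the dict-based storage) is a hypothetical illustration, not how Stream-Framework or getstream.io is implemented.

```python
from collections import defaultdict

POPULAR_THRESHOLD = 2  # hypothetical cutoff; a real system would tune this

followers = defaultdict(set)       # author -> users following them
following = defaultdict(set)       # user -> authors they follow
feeds = defaultdict(list)          # user -> precomputed (date, author, text) entries
popular_posts = defaultdict(list)  # popular author -> own posts, kept in memory

def follow(user, author):
    followers[author].add(user)
    following[user].add(author)

def is_popular(author):
    return len(followers[author]) > POPULAR_THRESHOLD

def publish(author, date, text):
    entry = (date, author, text)
    if is_popular(author):
        # Too many followers to copy to: store once, merge at read time.
        popular_posts[author].append(entry)
    else:
        # Fanout on write: copy the entry into every follower's feed.
        for user in followers[author]:
            feeds[user].append(entry)

def read_feed(user, limit=10):
    # Start from the precomputed feed, then pull in popular authors' posts.
    entries = list(feeds[user])
    for author in following[user]:
        entries.extend(popular_posts.get(author, []))
    return sorted(entries, reverse=True)[:limit]

for u in ["a", "b", "c"]:
    follow(u, "celebrity")
follow("a", "friend")
publish("friend", 1, "hi")
publish("celebrity", 2, "hello fans")
print(read_feed("a"))  # [(2, 'celebrity', 'hello fans'), (1, 'friend', 'hi')]
```

The write cost for "celebrity" stays constant no matter how many followers they have, at the price of a slightly more expensive read for each of those followers.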
Stream-Framework is open source and supports Cassandra/Redis & Python.
getstream.io is a hosted solution built on top of Go & RocksDB.
If you do end up using DynamoDB, be sure to set up the right partition key: https://shinesolutions.com/2016/06/27/a-deep-dive-into-dynamodb-partitions/
Also note that a Redis or DynamoDB based solution will get expensive pretty quickly. You'll get the lowest cost per user by leveraging Cassandra or RocksDB.
I would check out the Titan graph database (http://thinkaurelius.github.com/titan/) and Neo4j (http://www.neo4j.org/).
I know Titan claims to scale pretty well with large data sets.
Ultimately I think your model maps well to a graph. Users and posts would be nodes, and then you can connect them arbitrarily via edges. A user (node) is a friend (edge) of another user (node).
A user (node) has many posts (nodes) in their timeline. Then you can run interesting traversals via the graph.
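As a rough illustration of that model, here is a small in-memory sketch in plain Python. It does not use the Titan or Neo4j APIs; the `Graph` class, the edge labels `follows` and `posted`, and the `timeline` function are all hypothetical names chosen for this example.

```python
from collections import defaultdict

class Graph:
    """Tiny property graph: nodes with properties, labeled directed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # (source id, label) -> target ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        self.edges[(source, label)].append(target)

    def out(self, node_id, label):
        """Targets reachable from node_id over edges with the given label."""
        return list(self.edges.get((node_id, label), []))

g = Graph()
g.add_node("totocaster", kind="user")
g.add_node("tamuna123", kind="user")
g.add_node("post-1", kind="post", date=100, description="Hello, this is cool")
g.add_node("post-2", kind="post", date=200, description="Another one")
g.add_edge("totocaster", "follows", "tamuna123")
g.add_edge("tamuna123", "posted", "post-1")
g.add_edge("tamuna123", "posted", "post-2")

def timeline(graph, user):
    """Two-hop traversal: user -follows-> users -posted-> posts, newest first."""
    posts = [
        post
        for friend in graph.out(user, "follows")
        for post in graph.out(friend, "posted")
    ]
    return sorted(posts, key=lambda p: graph.nodes[p]["date"], reverse=True)

print(timeline(g, "totocaster"))  # ['post-2', 'post-1']
```

In a real graph database the same two-hop traversal would be expressed in its query language rather than in application code.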