I am deciding whether to use MySQL or NoSQL for a forum site. I am very new to the whole NoSQL idea, and reading the documentation I have noticed that an 'item' can't be larger than 64 KB, which includes attribute names and values. From the way I understand it, the only way I can think of implementing this would be something like this:
Posts: [
  { user: 'gates', text: 'first post',
    comments: [
      { user: 'jim', text: 'comment1' },
      { user: 'bobby', text: 'comment2' }
    ]
  },
  { user: 'jim', text: 'second post' }
]
The problem here is that if there are a lot of comments on a post, it would exceed the 64 KB limit for the item. Is there another way of implementing this "post/comment relationship" that would allow it to scale nicely?
You can model it in many ways; the two best I can think of are:
Use one table (let's say "forumapp"):
Post
HASH -> <uuid> RANGE -> 0
Comment
HASH -> <uuid of the post> RANGE -> 1..n (comments will be ordered by RANGE Key by default)
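A rough sketch of what this schema could look like (the attribute names postId and seq are my own placeholders, and I'm using the AWS SDK for JavaScript v3 for illustration):

```typescript
import { CreateTableCommand, DynamoDBClient } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Hypothetical single-table layout: postId (HASH) identifies the post,
// seq (RANGE) is 0 for the post itself and 1..n for its comments.
async function createForumTable() {
  await client.send(new CreateTableCommand({
    TableName: "forumapp",
    AttributeDefinitions: [
      { AttributeName: "postId", AttributeType: "S" },
      { AttributeName: "seq", AttributeType: "N" },
    ],
    KeySchema: [
      { AttributeName: "postId", KeyType: "HASH" },
      { AttributeName: "seq", KeyType: "RANGE" },
    ],
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
  }));
}
```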
Now, by querying with just the hash key you get the post and all of its comments. Because they have different range values they are different items, so you can have as many comments as you want (a query sketch follows below, after the limits note).
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
No practical limit for non-indexed tables.
For a table with local secondary indexes, there is a limit on item collection sizes: For every distinct hash key value, the total sizes of all table and index items cannot exceed 10 GB. Depending on your item sizes, this may constrain the number of range keys per hash value. For more information, see Item Collection Size Limit.
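A minimal sketch of that single Query, assuming the same hypothetical postId/seq key layout as above:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// One Query on the hash key returns the post (seq = 0) and all of its
// comments (seq = 1..n), already ordered by the RANGE key.
async function getPostWithComments(postId: string) {
  const result = await docClient.send(new QueryCommand({
    TableName: "forumapp",
    KeyConditionExpression: "postId = :id",
    ExpressionAttributeValues: { ":id": postId },
  }));
  const items = result.Items ?? [];
  return { post: items[0], comments: items.slice(1) };
}
```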
Use 2 tables
Table Post
HASH -> <uuid>
Table Comments
HASH -> <uuid of the post> RANGE -> comment date (so you get results ordered by date)
The difference between the two approaches is that with the first you issue one Query operation specifying only the hash key, and you get the post together with all of its comments (as long as the whole thing is under 1 MB; beyond that you have to fetch the next 1 MB with another API call, and so on). With the second approach you need one GetItem operation for the post and one Query operation for the comments (see the sketch below). Since you only have 5 secondary indexes available per table, you may want to have 2 tables. But in general it really depends on how you think your application will access the data...
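A minimal sketch of the two-table access pattern, assuming hypothetical tables Post (hash key postId) and Comments (hash key postId, range key createdAt), including the pagination loop you need once the comments grow past 1 MB per Query:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  GetCommand,
  QueryCommand,
} from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function getPost(postId: string) {
  // One GetItem for the post itself...
  const post = await docClient.send(new GetCommand({
    TableName: "Post",
    Key: { postId },
  }));

  // ...and one (paginated) Query for its comments, ordered by date.
  const comments: Record<string, any>[] = [];
  let lastKey: Record<string, any> | undefined;
  do {
    const page = await docClient.send(new QueryCommand({
      TableName: "Comments",
      KeyConditionExpression: "postId = :id",
      ExpressionAttributeValues: { ":id": postId },
      ExclusiveStartKey: lastKey, // resume after the previous 1 MB page
    }));
    comments.push(...(page.Items ?? []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);

  return { post: post.Item, comments };
}
```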
To better understand the differences between SQL and NoSQL, it is worth taking the time to watch this Martin Fowler presentation: https://www.youtube.com/watch?v=qI_g07C_Q5I
If you are used to SQL, it might be tricky to get an app working on a NoSQL database, because you have to think seriously about how you will retrieve your data BEFORE you store it, or you will simply not be able to read it the way you would like (no SQL to the rescue..). Sometimes you will have to build your own indexes in a separate table, slowing writes down a bit to make reads faster. You should be doing this even with SQL databases, but if you fail to model a relational schema you will at best get a slow application, while if you fail to model a NoSQL schema you will get stuck.
For instance, if you have to retrieve all the posts that have been tagged with tag A and tag B in a single Query operation, you will have to maintain some kind of index in another table that, given a hash calculated on A+B, gives you all the range values (post identifiers) that match your query (a sketch of such an index table follows below). You might be thinking of secondary indexes, but you can only use them INSIDE a hash key value: you first give the hash, and only then do you filter with a secondary index...
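A rough sketch of such a hand-rolled index, assuming a hypothetical PostsByTags table whose hash key is the (sorted) tag combination and whose range key is the post identifier; the extra write on the post's write path is what buys the cheap read:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Deterministic key for a tag combination, e.g. ["B", "A"] -> "A#B".
const tagKey = (tags: string[]) => [...tags].sort().join("#");

// On write: also write an index item alongside the post.
async function indexPostTags(postId: string, tags: string[]) {
  await docClient.send(new PutCommand({
    TableName: "PostsByTags",
    Item: { tagKey: tagKey(tags), postId },
  }));
}

// On read: one Query gives all post ids tagged with exactly A and B.
async function findPostsByTags(tags: string[]) {
  const result = await docClient.send(new QueryCommand({
    TableName: "PostsByTags",
    KeyConditionExpression: "tagKey = :k",
    ExpressionAttributeValues: { ":k": tagKey(tags) },
  }));
  return (result.Items ?? []).map((item) => item.postId as string);
}
```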
If you are using Node.js, check out: https://github.com/aaaristo/dyngodb.