For a Facebook type social networking app, a high performing database structure is required, for storing data in Firebase(Cloud Firestore) (NoSQL).
Data to be stored :
- Userinfo (name, email etc)
- Friends
- Posts
- Comments on posts.
I am confused among the following two DB Structures regarding query performance (if the database becomes huge).
(Ref: C_xxx is Collection, D_xxx is document)
Structure 1
C_AllData
- D_UserID-1
name: xxxx,
email: yyy,
friends: [UserID-3, UserID-4]
- C_Posts
- D_PostId-1
Text: hhh
Date: zzz
- C_Comments
- D_CommentId-1
UserID: 3
Text: kkk
- D_CommentId-2
UserID: 4
Text: kkk
- D_PostId-2
Text: hhh
Date: zzz
- C_Comments
- D_CommentId-3
UserID: 3
Text: kkk
- D_CommentId-4
UserID: 4
Text: kkk
- D_UserID-2
name: xxxx,
email: yyy
friends: [UserID-5, UserID-7]
- C_Posts
- D_PostId-3
Text: hhh
Date: zzz
- C_Comments
- D_CommentId-5
UserID: 5
Text: kkk
- D_CommentId-6
UserID: 7
Text: kkk
Structure 2
C_AllUsers
- D_UserID-1
name: xxxx,
email: yyy
friends: [UserID-3, UserID-4]
- D_UserID-2
name: xxxx,
email: yyy
friends: [UserID-5, UserID-7]
C_AllPosts
- D_PostId-1
UserID: 1
Text: hhh
Date: zzz
- C_Comments
- D_CommentId-1
UserID: 3
Text: kkk
- D_CommentId-2
UserID: 4
Text: kkk
- D_PostId-3
UserID: 2
Text: hhh
Date: zzz
- C_Comments
- D_CommentId-5
UserID: 5
Text: kkk
- D_CommentId-6
UserID: 7
Text: kkk
My Question is what are the pros and cons of the two approaches ?
Some points that i could think of are below, please correct me if i am wrong.
Structure 1 :
Is getting all the posts of a given user, Faster in Structure 1 ? Since we are pinpointing to the exact collection ( AllData/{UserID}/Posts/ )
Since entire DB is under one collection, is scalability not good ?
Structure 2 :
Divided DB -> Better Scalability ?
Divided DB -> Better Performance ?
Lesser Nesting -> Better Performance ?
AllPosts under one collection -> Slow querying ?
Or if you can suggest a better model, that would be great too.
MongoDB open source document store model of IOgen [14] is the most widely used NoSQL database store model in SN's.
Graph-based NoSQL databases are widely used for social networks, spatial data, logistics, etc. Examples: Neo4J, Infinite Graph, OrientDB are some of the popular Graph-based databases.
The presented analysis of databases used by the most popular social network sites (Facebook, YouTube, Twitter, Instagram and LinkedIn) showed that most of them use combination of different databases, both relational and NoSQL, in order to resolve diverse needs and requirements of their users.
Facebook uses MySQL. PostgreSQL is ACID compliant. This has its advantages. But MySQL and Postgres would be ideal of back-end database platforms.
In Firebase a rule of thumb is to keep separate entity types in separate branches. This is especially important because:
(Note: Here firebase is firebase realtime database)
For example in your first data structure, to load a list of friends, you will have to load all the posts of all friends, and all comments on all those posts too. That's a lot more data than is strictly needed, if all you wanted to do was show a list of the friends names.
In your second data structure, you are one step closer. As now you can first load the friends names, and then load their posts.
But even in that structure you have the same problem. If you want to display the list of post titles for a friend (or for all friends), you are going to have to load the entire posts and all comments. That is again way more data than is needed to show a list of post titles. So you'll definitely want to store the comments in a separate top-level list too, using the same key of the post to identify and group them.
C_AllPosts
- D_PostId-1
UserID: 1
Text: hhh
Date: zzz
- D_PostId-3
UserID: 2
Text: hhh
Date: zzz
C_AllComments
- D_PostId-1
- D_CommentId-1
UserID: 3
Text: kkk
- D_CommentId-2
UserID: 4
Text: kkk
- D_PostId-3
- D_CommentId-5
UserID: 5
Text: kkk
- D_CommentId-6
UserID: 7
Text: kkk
Now if you want to display a post and its comments, you will have to read two nodes. If you do this for multiple posts, you'll end up with a lot of reads, to essentially perform the NoSQL equivalent of a SQL JOIN
. This is quite normal, it's essentially a client-side join, and it not nearly as slow as you may think, because Firebase pipelines the requests.
For some more introduction on this type of data modeling, I recommend:
And these answers to previous questions:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With