Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NoSQL Database Structure for Facebook type social network

For a Facebook type social networking app, a high performing database structure is required, for storing data in Firebase(Cloud Firestore) (NoSQL).

Data to be stored :

 - Userinfo (name, email etc)
 - Friends
 - Posts
 - Comments on posts.

I am confused among the following two DB Structures regarding query performance (if the database becomes huge).

(Ref: C_xxx is Collection, D_xxx is document)

Structure 1

C_AllData
    - D_UserID-1
        name: xxxx,
        email: yyy,
        friends: [UserID-3, UserID-4]
        - C_Posts
            - D_PostId-1
                Text: hhh
                Date: zzz
                - C_Comments
                    - D_CommentId-1
                        UserID: 3
                        Text: kkk
                    - D_CommentId-2
                        UserID: 4
                        Text: kkk
            - D_PostId-2
                Text: hhh
                Date: zzz
                - C_Comments
                    - D_CommentId-3
                        UserID: 3
                        Text: kkk
                    - D_CommentId-4
                        UserID: 4
                        Text: kkk
    - D_UserID-2
        name: xxxx,
        email: yyy
        friends: [UserID-5, UserID-7]
        - C_Posts
            - D_PostId-3
                Text: hhh
                Date: zzz
                - C_Comments
                    - D_CommentId-5
                        UserID: 5
                        Text: kkk
                    - D_CommentId-6
                        UserID: 7
                        Text: kkk

Structure 2

C_AllUsers 
    - D_UserID-1
        name: xxxx,
        email: yyy
        friends: [UserID-3, UserID-4]
    - D_UserID-2
        name: xxxx,
        email: yyy
        friends: [UserID-5, UserID-7]

C_AllPosts
    - D_PostId-1
        UserID: 1
        Text: hhh
        Date: zzz
        - C_Comments
            - D_CommentId-1
                UserID: 3
                Text: kkk
            - D_CommentId-2
                UserID: 4
                Text: kkk
    - D_PostId-3
        UserID: 2
        Text: hhh
        Date: zzz
        - C_Comments
            - D_CommentId-5
                UserID: 5
                Text: kkk
            - D_CommentId-6
                UserID: 7
                Text: kkk

My Question is what are the pros and cons of the two approaches ?

Some points that i could think of are below, please correct me if i am wrong.

Structure 1 :

Is getting all the posts of a given user, Faster in Structure 1 ? Since we are pinpointing to the exact collection ( AllData/{UserID}/Posts/ )

Since entire DB is under one collection, is scalability not good ?

Structure 2 :

Divided DB -> Better Scalability ?

Divided DB -> Better Performance ?

Lesser Nesting -> Better Performance ?

AllPosts under one collection -> Slow querying ?


Or if you can suggest a better model, that would be great too.

like image 615
kernelman Avatar asked Sep 22 '18 08:09

kernelman


People also ask

Which type of NoSQL database is most suitable for modeling a social media network?

MongoDB open source document store model of IOgen [14] is the most widely used NoSQL database store model in SN's.

Which type of NoSQL database is best for storing data about social networks?

Graph-based NoSQL databases are widely used for social networks, spatial data, logistics, etc. Examples: Neo4J, Infinite Graph, OrientDB are some of the popular Graph-based databases.

Which type of database is used by social network?

The presented analysis of databases used by the most popular social network sites (Facebook, YouTube, Twitter, Instagram and LinkedIn) showed that most of them use combination of different databases, both relational and NoSQL, in order to resolve diverse needs and requirements of their users.

Which DB is best for social network?

Facebook uses MySQL. PostgreSQL is ACID compliant. This has its advantages. But MySQL and Postgres would be ideal of back-end database platforms.


1 Answers

In Firebase a rule of thumb is to keep separate entity types in separate branches. This is especially important because:

(Note: Here firebase is firebase realtime database)

  1. Firebase always loads complete nodes, and
  2. once you grant a user read access to a node, they have access to all data under that node.

For example in your first data structure, to load a list of friends, you will have to load all the posts of all friends, and all comments on all those posts too. That's a lot more data than is strictly needed, if all you wanted to do was show a list of the friends names.

In your second data structure, you are one step closer. As now you can first load the friends names, and then load their posts.

But even in that structure you have the same problem. If you want to display the list of post titles for a friend (or for all friends), you are going to have to load the entire posts and all comments. That is again way more data than is needed to show a list of post titles. So you'll definitely want to store the comments in a separate top-level list too, using the same key of the post to identify and group them.

C_AllPosts
    - D_PostId-1
        UserID: 1
        Text: hhh
        Date: zzz
    - D_PostId-3
        UserID: 2
        Text: hhh
        Date: zzz
C_AllComments
    - D_PostId-1
        - D_CommentId-1
            UserID: 3
            Text: kkk
        - D_CommentId-2
            UserID: 4
            Text: kkk
    - D_PostId-3
        - D_CommentId-5
            UserID: 5
            Text: kkk
        - D_CommentId-6
            UserID: 7
            Text: kkk

Now if you want to display a post and its comments, you will have to read two nodes. If you do this for multiple posts, you'll end up with a lot of reads, to essentially perform the NoSQL equivalent of a SQL JOIN. This is quite normal, it's essentially a client-side join, and it not nearly as slow as you may think, because Firebase pipelines the requests.

For some more introduction on this type of data modeling, I recommend:

  • this article on NoSQL data modeling
  • the Firebase blog post Denormalization is normal
  • this video series Firebase for SQL developers

And these answers to previous questions:

  • Many to Many relationship in Firebase
  • How would you model a collection of users and friends in Firebase?
  • Firebase data structure and url
  • https://stackoverflow.com/questions/16421179/whats-the-best-way-of-structuring-data-on-firebase/16423051?s=2|19.0394#16423051
  • https://stackoverflow.com/questions/30693785/how-to-write-denormalized-data-in-firebase/30699277?s=3|18.5624#30699277
  • https://stackoverflow.com/questions/43830610/how-to-denormalize-normalize-data-structure-for-firebase-realtime-database/43832677?s=5|16.2022#43832677
like image 122
Frank van Puffelen Avatar answered Oct 19 '22 13:10

Frank van Puffelen