
Which is a more optimal Firestore schema for getting a Social Media feed?

I'm toying with several ideas for using Firestore for a social media feed. So far, the ideas I've had haven't panned out, so for this one I'm hoping to get the community's feedback.

The idea is to let users post information or record their activity, and to display it to any user following/subscribed to them. The post information would live in a root collection called posts.

The approaches, as far as I can tell, require roughly the same number of reads and writes.

One idea is to give users/{userId} a field called posts: an array of the document IDs I'm interested in pulling for that user. This would let me pull directly from posts and get the most up-to-date version of the data.

Another approach seems more Firebasey: store documents within users/{userId}/feeds that are copies of the posts themselves, using the same postId as the document in posts. Presumably, if I need to update the data for any post, I can use a collection group query to get all collections called feeds where the docID is equal (or just create a field to do a proper "where", "==", docId).
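To make the trade-off of this copy-based layout concrete, here is a minimal in-memory sketch in TypeScript. It makes no Firestore calls: the `Feeds` object stands in for the users/{userId}/feeds subcollections, and the names `Post`, `fanOut`, and `propagateEdit` are hypothetical, chosen only for illustration. Scanning every feed imitates what a collection group query filtered on a postId field would return.

```typescript
interface Post {
  id: string;
  authorId: string;
  text: string;
}

// feeds[userId][postId] stands in for users/{userId}/feeds/{postId}.
type Feeds = Record<string, Record<string, Post>>;

// Fan-out on write: copy the post into the feed of each follower.
function fanOut(feeds: Feeds, post: Post, followerIds: string[]): void {
  for (const uid of followerIds) {
    if (!feeds[uid]) feeds[uid] = {};
    feeds[uid][post.id] = { ...post };
  }
}

// Overwrite every existing copy of an edited post.
// Returns how many copies were rewritten (one write per follower feed).
function propagateEdit(feeds: Feeds, edited: Post): number {
  let writes = 0;
  for (const uid of Object.keys(feeds)) {
    if (feeds[uid][edited.id]) {
      feeds[uid][edited.id] = { ...edited };
      writes += 1;
    }
  }
  return writes;
}

const feeds: Feeds = {};
const original: Post = { id: "p1", authorId: "alice", text: "hello" };
fanOut(feeds, original, ["bob", "carol"]);
const rewritten = propagateEdit(feeds, { ...original, text: "hello (edited)" });
console.log(rewritten, feeds["bob"]["p1"].text);
```

The point of the sketch is that an edit costs one write per copy, so the cost of keeping copies fresh grows with the follower count.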

Third approach is all about updating the list of people who should view the posts. This seems better as long as the list of posts is shorter than the lists of followers. Instead of maintaining all posts on every follower, you're maintaining all followers on each post. For every new follower, you need to update all posts.

This list would not be a user's own posts. Instead it would be a list of all the posts to show that user.

Three challengers:

  1. users/{userId} with field called feed - an array of doc Ids that point to the global posts. Get that feed, get all docs by ID. Every array would need to be updated for every single follower each time a user has activity.

    users (coll)
        -> uid (doc)
        -> uid.feed: [postId1, postId2, postId3, ...] (field)
    
    posts (coll)
        -> postId (doc)
    

Query (pseudo):

doc(users/{uid}).get(doc)
    feed = doc.feed
    for postId in feed:
        doc(posts/{postId}).get(doc)

  2. users/{userId}/feed, a subcollection holding a copy of every post that you would want this user to see. Every activity/post would need to be added to every relevant feed list.

    users (coll)
        -> uid (doc)
             -> feed: (coll)
                   -> postId1 (doc)
                   -> postId2
                   -> postId3
    
    posts (coll)
        -> postId (doc)
    

Query (pseudo):

collection(users/{uid}/feed).get(docs)
    for post in docs:
        # each feed doc is already a full copy, so no second read from posts is needed
        render(post)

  3. posts/{postId} with a field called followers_array, listing every user who should see that post. Every new follower would need to be added to every post by the author they follow.

    users (coll)
        -> uid (doc)
    
    
    posts (coll)
        -> postId (doc)
        -> postId.followers_array[followerId, followerId2, ...] (field)
    

Query (pseudo):

collection(posts).where('followers_array', 'array-contains', uid).get(docs)
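
As an illustration of the third layout, the following is a small in-memory TypeScript sketch, not Firestore code. `FeedPost`, `addFollower`, and `feedFor` are hypothetical names; `feedFor` plays the role of an array-contains filter on followers_array, and `addFollower` plays the role of the write that happens whenever someone follows an author.

```typescript
interface FeedPost {
  id: string;
  authorId: string;
  followers_array: string[]; // users who should see this post
}

// Write path: when followerId follows authorId, append the follower to
// every post by that author. Returns the number of posts written.
function addFollower(posts: FeedPost[], authorId: string, followerId: string): number {
  let writes = 0;
  for (const post of posts) {
    const alreadyListed = post.followers_array.indexOf(followerId) !== -1;
    if (post.authorId === authorId && !alreadyListed) {
      post.followers_array.push(followerId);
      writes += 1;
    }
  }
  return writes;
}

// Read path: in-memory equivalent of filtering posts whose
// followers_array contains uid.
function feedFor(posts: FeedPost[], uid: string): FeedPost[] {
  return posts.filter((post) => post.followers_array.indexOf(uid) !== -1);
}

const allPosts: FeedPost[] = [
  { id: "p1", authorId: "alice", followers_array: [] },
  { id: "p2", authorId: "alice", followers_array: [] },
  { id: "p3", authorId: "dave", followers_array: [] },
];
const writesForFollow = addFollower(allPosts, "alice", "bob");
console.log(writesForFollow, feedFor(allPosts, "bob").map((p) => p.id));
```

The read is a single query, but every follow touches as many documents as the author has posts.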

Reads/Writes

1. Updating the Data: For every activity, find all users following the author. Currently, the followers are stored as documents in a collection, so this is followerNumber document reads. For each of those users, update their array by prepending the postId; this would be followerNumber document writes.

1. Displaying the Data/Feed: For each fetch of the feed, get the array from the user document (1 doc read). Then, for each postId, fetch posts/{postId}.

This would be numberOfPostsCalled document reads.

2. Updating the Data: For every activity, find all users following the author. Currently, the followers are stored as documents in a collection, so this is followerNumber document reads. For each of those users, add a new document with ID postId to users/{userId}/feed; this would be followerNumber document writes.

2. Displaying the Data/Feed: For each fetch of the feed, get a certain number of posts from users/{userId}/feed.

This would be numberOfPostsCalled document reads.

This second approach requires me to keep all of the documents up to date in the event of an edit. So despite this approach seeming more firebase-esque, the approach of holding a postId and fetching that directly seems slightly more logical.

3. Updating the Data: For every new follower, each post authored by the person being followed needs to be updated. The new follower is appended to an array called followers_array.

3. Displaying the Data: For each fetch of the feed, get a certain number of posts from posts where followers_array contains viewerUid.
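
The accounting above can be summarized in a rough TypeScript sketch. The function names and the `Cost` shape are hypothetical, and the read count for a new follower in approach 3 is an assumption (it supposes the author's posts are queried before being updated, which the description above does not specify).

```typescript
type Approach = 1 | 2 | 3;

interface Cost {
  reads: number;
  writes: number;
}

// Approaches 1 and 2: a new post is fanned out to every follower.
// One read per follower document, one write per follower feed.
function publishFanOutCost(followerNumber: number): Cost {
  return { reads: followerNumber, writes: followerNumber };
}

// Approach 3: a new follower is appended to every post by the author.
// Assumption: the author's posts are queried first (one read each).
function newFollowerCost(authorPostCount: number): Cost {
  return { reads: authorPostCount, writes: authorPostCount };
}

// Reads needed to display one page of the feed.
function fetchFeedCost(approach: Approach, numberOfPostsCalled: number): Cost {
  // Approach 1 reads the user document first to get the array of IDs.
  const userDocRead = approach === 1 ? 1 : 0;
  return { reads: numberOfPostsCalled + userDocRead, writes: 0 };
}

console.log(publishFanOutCost(500));
console.log(newFollowerCost(40));
console.log(fetchFeedCost(1, 20), fetchFeedCost(2, 20), fetchFeedCost(3, 20));
```

Under these assumptions, approaches 1 and 2 pay per follower on every post, while approach 3 pays per existing post on every follow.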

asked Nov 07 '22 16:11 by Thingamajig


1 Answer

Nice. When I talk about what is more optimal, I really need a point or a quality attribute to compare against. I will assume you care about speed (not necessarily performance) and costs.

This is how I would solve the problem. It involves several collections, but my goal is a single query.

users (col)

{
 "abc": {},
 "qwe": {}
}

posts (col)

{
  "123": {},
  "456": {}
}

users_posts (col)

{
  "abc": {
    "posts_ids": ["123"]
  }
}

So far so good. The problem is that I need several queries to get all the post information. This is where Cloud Functions come into play: you can create a 4th collection where you pre-calculate your feed.

users_dashboard

{
  "abc": {
    posts: [
    {
       id: "123", /.../
    }, {
       id: "456", /.../
     }
    ]
  }
}

The cloud function would look like this:

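/* On your front end you can manage adding or deleting ids in users_posts */
import * as functions from 'firebase-functions' // v1-style trigger API, as in the original snippet
import * as admin from 'firebase-admin'

admin.initializeApp()

export const calculateDashboard = functions.firestore.document('users_posts/{doc}').onWrite(async (change, context) => {
   const firestore = admin.firestore()
   const dashboardRef = firestore.collection('users_dashboard')
   const postRef = firestore.collection('posts')

   // change.after has no data when the users_posts doc was deleted
   const user = change.after.exists ? change.after.data() : null
   if (!user) return null

   const payload = []
   for (const postId of user.posts_ids || []) {
      const data = await postRef.doc(postId).get().then((doc) => doc.exists ? { id: doc.id, ...doc.data() } : null)
      // Skip posts that no longer exist
      if (data) payload.push(data)
   }
   // Maybe you want to expose only certain props... you can do that here
   // The doc ID comes from the {doc} wildcard; set() needs an object, not a bare array
   return dashboardRef.doc(context.params.doc).set({ posts: payload })
})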

The max doc size is 1 MiB (1,048,576 bytes), which is plenty of room, so you can keep a lot of posts here. Let's talk about costs: I used to think Firestore preferred many small docs, but I've found in practice it works equally well with large docs as with a large number of small ones. Reads are billed per document, so fetching the whole dashboard counts as one read.
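
For a rough feel of how many posts fit under that limit, here is a back-of-the-envelope TypeScript sketch. The function name `maxPostsPerDoc`, the average post size, and the 20% safety margin are all assumptions for illustration; Firestore computes document size with its own rules (field names and metadata count too), so treat the result as an estimate only.

```typescript
const MAX_DOC_BYTES = 1048576; // 1 MiB, the documented per-document limit

// Estimate how many posts of a given average size fit in one dashboard doc.
// `margin` reserves a fraction of the document for field names and overhead.
function maxPostsPerDoc(avgPostBytes: number, margin: number = 0.2): number {
  if (avgPostBytes <= 0) throw new Error("avgPostBytes must be positive");
  const budget = MAX_DOC_BYTES * (1 - margin);
  return Math.floor(budget / avgPostBytes);
}

// Hypothetical example: posts averaging 2 KiB each.
console.log(maxPostsPerDoc(2048));
```

If the feed can outgrow this estimate, the cloud function would need to cap the array, for example by keeping only the most recent posts.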

Now on your dashboard you only need one query:

// get() returns a Promise, so await it inside an async function
const dashboard = await firestore.collection(`users_dashboard`).doc(userID).get()

This is a very opinionated way to solve this problem. You could avoid using users_posts, but maybe you don't want to trigger this process for anything other than post-related changes.

answered Nov 29 '22 12:11 by andresmijares