
Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?

As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?

What about creating a relational-like table for each unique location where the data can exist, which is then used to reconstruct the paths that require updating?

For example, creating a users_posts_boards collection that is, at the top level, a collection of userIDs, each with a sub-collection of postIDs, which in turn has another sub-collection of boardIDs with a boardOwnerID. You would then use those to reconstruct the paths of the duplicated data for a post (e.g. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?

Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?

Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?

{
  postID: 'someID',
  locations: ( <---- collection
    "path/to/post/location1",
    "path/to/post/location2",
    ...
  )
}

This would mean that you would basically need all writes to Firestore to go through Cloud Functions that can keep track of this data for security reasons... unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.

I'm basically looking for a sane way to track heavily denormalized data.

Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape trying to keep all that up-to-date as it is shared across a platform and then a user updates their profile.

Socceroos asked Jan 18 '19 13:01

1 Answer

I'm answering this question because of your request from here.

When you are duplicating data, there is one thing you need to keep in mind. In the same way you add the data, you also need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place where it exists.

What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?

To keep track of all the operations that we need to perform in order to keep the data consistent, we add all of those operations to a batch. You can add one or more update operations on different references, as well as delete or add operations; a minimal sketch follows the link below. For that please see:

  • How to do a bulk update in Firestore
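As an illustration, here is a minimal sketch (Android/Java SDK) of how an edited post title could be propagated to the original document and to a duplicated copy on a user's board in one atomic batch. The document paths and the "title" field are hypothetical, based on the board example in the question.

FirebaseFirestore db = FirebaseFirestore.getInstance();
WriteBatch batch = db.batch();

// The original post and one duplicated copy of it on a user's board.
DocumentReference original = db.document("posts/postId123");
DocumentReference boardCopy = db.document("users/ownerId456/boards/boardId789/posts/postId123");

batch.update(original, "title", "New title");
batch.update(boardCopy, "title", "New title");

// Either every update in the batch succeeds or none of them are applied.
batch.commit().addOnCompleteListener(task -> {
    if (task.isSuccessful()) {
        // All duplicated copies are now consistent.
    }
});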

What about creating a relational-like table for each unique location where the data can exist, which is then used to reconstruct the paths that require updating?

In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.

Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?

Yes, you need to pass the corresponding document ID to each document() method in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents; you have to identify the documents by their IDs.
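For example, a hypothetical helper (the collection names are taken from the board example in the question) that rebuilds the concrete path of a duplicated post from the stored IDs could look like this:

DocumentReference duplicatedPostRef(FirebaseFirestore db,
                                    String boardOwnerId,
                                    String boardId,
                                    String postId) {
    // Every segment must be a concrete document ID; there are no wildcards.
    return db.document("users/" + boardOwnerId + "/boards/" + boardId + "/posts/" + postId);
}

Each reference built this way can then be added to the same batch as the original post.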

Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?

I don't consider that necessary either, since it requires extra read operations. Since Firestore pricing is all about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.

unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.

Firestore security rules are powerful enough to do that. You can allow or deny reads and writes separately, and even apply rules to each individual CRUD operation you need.
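As a rough sketch (rules version 2, using the collection names from your question), rules that let signed-in clients only add documents to the locations sub-collection, without reading or changing anything, could look something like this:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Clients may only create documents in the locations sub-collection.
    match /posts_denormalization_tracker/{postId}/locations/{locationId} {
      allow create: if request.auth != null;
      allow read, update, delete: if false;
    }
    // No rule is declared for /posts_denormalization_tracker/{postId} itself,
    // so the parent documents are not readable or writable by clients.
  }
}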

I'm basically looking for a sane way to track heavily denormalized data.

The simplest way I can think of is to add the operations to a key-value data structure. Let's assume we have a map that looks like this:

// Map each updated object to the reference of the document where its copy lives.
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
//And so on

Iterate through the map, add all those keys and values to a batch, commit the batch and that's it.
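In code, a hedged sketch of that last step (assuming db is the FirebaseFirestore instance and each custom object is a POJO that should overwrite the duplicated copy at its reference) might look like this:

WriteBatch batch = db.batch();
for (Map.Entry<Object, DocumentReference> entry : map.entrySet()) {
    // Overwrite the duplicated copy with the updated object.
    batch.set(entry.getValue(), entry.getKey());
}
// All writes succeed or fail together, keeping every copy consistent.
batch.commit();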

Alex Mamo answered Nov 13 '22 12:11