Say I have two collections, photos and users. Each has value on its own, but the two are related: there is a one-to-many relationship between users and photos.
An example of denormalized data:
users:
{
"id": "AABC",
"name": "Donna Smith"
}
photos:
{
"id": "FAD4",
"description": "cute dog",
"user_id": "AABC", // This is the relationship
"user_name": "Donna Smith" // This is the denormalized value from the "users" collection
}
How can I ensure consistency with documents in the photos collection when user "AABC" changes name from "Donna Smith" to "Donna Chang"?
Since the store is non-transactional, I understand that consistency is going to be eventual.
A simple (naive) implementation might trigger a background job after the change to user "AABC" to update all photos where user_id = "AABC". In the case of a single update, that would work well. But this is a multi-user environment, and there are going to be updates flying in all directions concurrently. What if, for example, halfway through the background update of photos to change "Donna Smith" to "Donna Chang", the name of user "AABC" is changed back to "Donna Smith"?
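One common way to make that interleaving harmless is to version the user record and make every photo write conditional on the version it carries. The following is only a minimal in-memory sketch of that idea, not a feature of any particular NoSQL product: the "version" and "user_version" fields and the function names are assumptions introduced for illustration, and in a real store the conditional write would need to be an atomic compare-and-set.

```python
def rename_user(users, user_id, new_name):
    """Change the name and bump a version counter (hypothetical field).

    Returns the payload a background job would carry.
    """
    user = users[user_id]
    user["name"] = new_name
    user["version"] += 1
    return (new_name, user["version"])


def apply_name_to_photo(photo, name, version):
    """Conditional write: apply only if this update is newer than the photo's copy."""
    if photo.get("user_version", 0) >= version:
        return False  # stale update, ignore it
    photo["user_name"] = name
    photo["user_version"] = version
    return True


def simulate_interleaving():
    """Replay the scenario from the question with two overlapping jobs."""
    users = {"AABC": {"id": "AABC", "name": "Donna Smith", "version": 1}}
    photos = [
        {"id": "FAD4", "user_id": "AABC", "user_name": "Donna Smith", "user_version": 1},
        {"id": "FAD5", "user_id": "AABC", "user_name": "Donna Smith", "user_version": 1},
    ]

    # Job A starts for the rename to "Donna Chang" and updates only the first photo.
    job_a = rename_user(users, "AABC", "Donna Chang")
    apply_name_to_photo(photos[0], *job_a)

    # The name is changed back; job B runs to completion.
    job_b = rename_user(users, "AABC", "Donna Smith")
    for photo in photos:
        apply_name_to_photo(photo, *job_b)

    # Job A resumes with its stale payload; the version check rejects it.
    apply_name_to_photo(photos[1], *job_a)

    return [photo["user_name"] for photo in photos]
```

Because each write is guarded by the version, the order in which jobs touch a given photo stops mattering: the highest version always wins, so both photos end up matching the user's current name.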
Searching online, I see a lot of discussion of how to model denormalized data. But any discussion about how to maintain it seems to be trivialised as "you'll also need to update all related records". Are there any NoSQL systems that do the heavy lifting for you in this scenario? Any frameworks or utilities?
I've read Thomas Wanschik's excellent blog articles on the topic of "materialized views" and background updates for exactly this scenario. But I'm left concerned that:
My understanding of NoSQL is that it comes down to a real cost analysis of delivering huge amounts of data back to the user or application.
When your application delivers photos, which is likely to happen more often: delivering the photos to the user (and perhaps to friends who are viewing them), or changing the user's name?
Changing the user's name is the less common operation, and denormalization's claim to fame in NoSQL is that you can deliver large volumes of photo data to users at high speed without the expense of the JOINs required in a traditional normalized RDBMS environment.
A few tools that are out there these days (you wrote this a fairly long time ago) can assist with situations like this, but you were essentially correct: you can schedule a job to handle the update. It will be slow and it will be expensive, but it will work, and you'll still have the benefit of delivering photos to the application quickly, which is essentially the main purpose of your app.
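As a rough illustration of the slow-but-workable scheduled approach, here is a minimal sketch of a reconciliation pass over plain in-memory dictionaries. The function name and the data layout are assumptions for the example; a real job would page through the photos collection with whatever client library your database provides.

```python
def reconcile_photos(users, photos):
    """Repair every photo whose denormalized user_name has drifted from its user.

    `users` maps user id -> user document; `photos` is a list of photo documents.
    Returns the ids of the photos that were repaired.
    """
    repaired = []
    for photo in photos:
        user = users.get(photo["user_id"])
        if user is None:
            continue  # orphaned photo; leave it for separate handling
        if photo.get("user_name") != user["name"]:
            photo["user_name"] = user["name"]
            repaired.append(photo["id"])
    return repaired
```

The pass is idempotent: running it again after everything has been repaired changes nothing, so it can safely be scheduled to run repeatedly.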
This question tends to grow into an epic novel, with SQL defenders on one side and the "rabble" of NoSQL followers on the other. Traditional DBAs shudder at the thought of compromising structure for speed, but think of NoSQL as the old "Super Table" concept of long ago, where we thought in terms of what would be returned rather than what needed to be stored. Essentially, this is what gave rise to the NoSQL concept, and it is proving to be very helpful in large-scale applications and big data reporting.
I know this is an old question, but I still hope my answer helps others such as myself demystify the NoSQL benefits when it comes to this type of question.