How to handle changes in duplicated data in NoSQL

We're evaluating NoSQL for an upcoming project. I tend to think of things in an RDBMS way and am having trouble conceptualizing the lack of normalization.

I understand that duplicating data is not considered wrong in NoSQL. What I'm having trouble understanding is how changes get propagated to every copy of the data so that update anomalies are prevented.

Explanation of Question by Example:

You are organizing a series of poker tournaments. You have players, locations, and tournament events. As I understand it, a tournament event might contain a location and a collection of players. It does not need to have all the player data, but if you want to get the names and home addresses of everyone going to the next tournament, that info should be in the tournament collection.

Someone has gotten married and moved, changing their last name and address. Does the application need to update the player collection and the tournament collection? Or is my model of the collections wrong? How do developers "keep track" of where information is duplicated?

justkevin asked Mar 02 '12 16:03



1 Answer

The model that I see being used quite a bit lately is to keep two sets of data. The first is an authoritative "master" collection that is modeled "relationally": in your case, the list of players and the list of tournaments, where each tournament record has a list of player ids. The second is a denormalized list (in your case, a list of tournaments with the fully-populated player data) that the application never writes to directly; it is only ever updated by running a periodic process over the "master" data.

This way the application only needs to update the master data, and the periodic update process will eventually rebuild the denormalized result.
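To make that concrete, here is a minimal sketch in plain Python, using in-memory dicts to stand in for the collections rather than any particular NoSQL client. The names `players`, `tournaments`, `rebuild_tournament_view`, and all of the sample records are hypothetical, and in a real system the rebuild would be triggered by a scheduler instead of being called by hand.

```python
from copy import deepcopy

# Hypothetical "master" data: normalized, and the only data the application writes to.
players = {
    1: {"name": "Alice Smith", "address": "1 Elm St"},
    2: {"name": "Bob Jones", "address": "2 Oak Ave"},
}
tournaments = {
    100: {"location": "Reno", "player_ids": [1, 2]},
}


def rebuild_tournament_view(players, tournaments):
    """Build the denormalized read view from the master data."""
    view = {}
    for tournament_id, tournament in tournaments.items():
        view[tournament_id] = {
            "location": tournament["location"],
            # Copy the full player record into the tournament document.
            "players": [deepcopy(players[pid]) for pid in tournament["player_ids"]],
        }
    return view


# Initial run of the periodic process.
tournament_view = rebuild_tournament_view(players, tournaments)

# A player marries and moves: the application updates only the master record.
players[1].update(name="Alice Brown", address="9 Pine Rd")

# Until the next run, the denormalized view still holds the old copy.
stale_name = tournament_view[100]["players"][0]["name"]

# The next periodic run brings the view back in line with the master data.
tournament_view = rebuild_tournament_view(players, tournaments)
fresh_name = tournament_view[100]["players"][0]["name"]
```

The trade-off is that reads from the denormalized view can be stale between runs, so this fits cases where eventual consistency is acceptable.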

Chris Shain answered Oct 20 '22 12:10
