Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I avoid duplicating data in a document database like RavenDB?

Given that document databases, such as RavenDB, are non-relational, how do you avoid duplicating data that multiple documents have in common? How do you maintain that data if it's okay to duplicate it?

like image 949
John Nelson Avatar asked Jun 03 '10 17:06

John Nelson


2 Answers

With a document database you have to duplicate your data to some degree. What that degree is will depend on your system and use cases.

For example if we have a simple blog and user aggregates we could set them up as:

  public class User 
  {
    public string Id { get; set; }
    public string Name  { get; set; }
    public string Username  { get; set; }
    public string Password  { get; set; }
  }

  public class Blog
  {
     public string Id  { get; set; }
     public string Title  { get; set; }

     public class BlogUser
     {
       public string Id  { get; set; }
       public string Name  { get; set; }
     }
  }

In this example I have nested a BlogUser class inside the Blog class with the Id and Name properties of the User Aggregate associated with the Blog. I have included these fields as they are the only fields the Blog class is interested in, it doesn't need to know the users username or password when the blog is being displayed.

These nested classes are going to dependant on your systems use cases, so you have to design them carefully, but the general idea is to try and design Aggregates which can be loaded from the database with a single read and they will contain all the data required to display or manipulate them.

This then leads to the question of what happens when the User.Name gets updated.

With most document databases you would have to load all the instances of Blog which belong to the updated User and update the Blog.BlogUser.Name field and save them all back to the database.

Raven is slightly different as it support set functions for updates, so you are able to run a single update against RavenDB which will up date the BlogUser.Name property of the users blogs without you have to load them and update them all individually.

The code for doing the update within RavenDB (the manual way) for all the blog's would be:

  public void UpdateBlogUser(User user)
  {
    var blogs = session.Query<Blog>("blogsByUserId")
                  .Where(b.BlogUser.Id == user.Id)
                  .ToList();

    foreach(var blog in blogs)
       blog.BlogUser.Name == user.Name;

    session.SaveChanges()
  }

I've added in the SaveChanges just as an example. The RavenDB Client uses the Unit of Work pattern and so this should really happen somewhere outside of this method.

like image 118
Andy Long Avatar answered Sep 22 '22 15:09

Andy Long


There's no one "right" answer to your question IMHO. It truly depends on how mutable the data you're duplicating is.

Take a look at the RavenDB documentation for lots of answers about document DB design vs. relational, but specifically check out the "Associations Management" section of the Document Structure Design Considerations document. In short, document DBs use the concepts of reference by IDs when they don't want to embed shared data in a document. These IDs are not like FKs, they are entirely up to the application to ensure the integrity of and resolve.

like image 24
Drew Marsh Avatar answered Sep 23 '22 15:09

Drew Marsh