How to deal with referencing of separately stored objects in document DBs like Mongo?

Tags: c#, mongodb

This problem is easily solved in ORMs like Entity Framework or NHibernate, but I don't see any ready-made solution in the C# driver for MongoDB. Let's say I have a collection of objects of type A that reference objects of type B, which I need to store in a separate collection, so that once a specific B is changed, every A referencing it is aware of the change. In other words, I need this relation to be normalized. At the same time, I need A to reference B inside the class not by Id but by an object reference, as shown below:

public class A
{
   public B RefB { get; set; }
}

Do I have to handle all this referencing consistency on my own? If so, which approach is best? Do I have to keep both B's Id and the B reference in the class and somehow keep their values in sync, like this:

using MongoDB.Bson.Serialization.Attributes;

public class A
{
    // Need to implement reference consistency as well
    public int RefBId { get; set; }

    private B _refB;
    [BsonIgnore]
    public B RefB
    {
        get { return _refB; }
        set { _refB = value; RefBId = value.Id; }
    }
}

I know somebody may say a relational database fits this case best, but I really have to use a document DB like MongoDB. It solves many problems, and in most cases I need to store objects denormalized for my project; however, sometimes we might need a mixed design inside a single storage.


asked Sep 26 '13 16:09

YMC

2 Answers

This is mostly an architectural concern, and it probably depends a bit on personal taste. I'll try to examine the pros and cons (actually only the cons; this is quite opinionated):

On the database level, MongoDB offers no tools to enforce referential integrity, so yes, you have to do this yourself. I suggest you use database objects that look like this:

using MongoDB.Bson;

public class DBObject
{
    public ObjectId Id { get; set; }
}

public class Department : DBObject
{
    // ...
}

public class EmployeeDB : DBObject
{
    public ObjectId DepartmentId { get; set; }
}

I suggest using plain DTOs like this at the database level no matter what. If you want additional sugar, put it in a separate layer, even if that means a bit of copying. Logic in the DB objects requires a very good understanding of how the driver hydrates the object and might require relying on implementation details.
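A minimal sketch of the "separate layer" idea, under a few assumptions: the names `DepartmentDB`, `EmployeeView` and `EmployeeMapper` and the `Name` fields are hypothetical, and plain `string` ids stand in for the driver's `ObjectId` so the sketch compiles without the MongoDB packages. The stored classes stay dumb; the richer object is built by copying, and the caller has to hand over an already-loaded department.

```csharp
using System;

// Database-level DTOs: nothing but stored fields.
// (string ids stand in for MongoDB's ObjectId in this sketch.)
public class DepartmentDB
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class EmployeeDB
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string DepartmentId { get; set; }
}

// Hypothetical richer object that lives in a separate layer.
public class EmployeeView
{
    public string Id { get; }
    public string Name { get; }
    public string DepartmentName { get; }

    public EmployeeView(string id, string name, string departmentName)
    {
        Id = id;
        Name = name;
        DepartmentName = departmentName;
    }
}

public static class EmployeeMapper
{
    // Copies data from two already-loaded DTOs; performs no database access.
    public static EmployeeView ToView(EmployeeDB employee, DepartmentDB department)
    {
        if (employee == null) throw new ArgumentNullException(nameof(employee));
        if (department == null) throw new ArgumentNullException(nameof(department));
        if (employee.DepartmentId != department.Id)
            throw new ArgumentException("Department does not match employee.DepartmentId");

        return new EmployeeView(employee.Id, employee.Name, department.Name);
    }
}
```

Because `ToView` never touches the database, every load stays visible at the call site, which is the point of keeping the sugar out of the DB objects.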

Now, it's a matter of preference whether you want to work with more 'intelligent' objects. Indeed, many people like to use strongly-typed auto-activating accessors, e.g.

public class Employee
{
    public Department Department
    {
        // the department object, magically, from the DB
        get { throw new System.NotImplementedException(); }
    }
}

This pattern comes with a number of challenges:

It requires the Employee class, a model class, to be able to hydrate the object from the database. That is tricky, because either the DB must be injected into it or you need a static object for database access, which can also be tricky.
Accessing the Department looks completely cheap, but in fact it triggers a database operation that can be slow and might fail. This is totally hidden from the caller.
In a 1:n relation, things grow a lot more complicated. For instance, would Department also expose a list of Employees? If so, would that really be a list (i.e., once you start reading the first, must all employees be deserialized?), or is it a lazy MongoCursor?
To make matters worse, it is not usually clear what kind of caching should be used. Let's say you get myDepartment.Employees[0].Department.Name. Obviously, this code isn't smart, but imagine there's a call stack with a few specialized methods. They might invoke the code just like that, even if it's more hidden. Now a naive implementation would actually deserialize the referenced Department again. That's ugly. On the other hand, caching aggressively is dangerous because you might actually want to re-fetch the object.
The worst of all: updates. So far, the challenges were largely read-only. Now let's say I call employeeJohn.Department.Name = 'PixelPushers' and employeeJohn.Save(). Does that update the Department, or not? If it does, are John's changes serialized first, or after the changes to dependent objects? What about versioning and locking?
Many semantics are hard to implement: employeeJohn.Department.Employees.Clear() can be tricky.
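A deliberately naive sketch of the auto-activating accessor makes several of these problems visible at once. `FakeDepartmentCollection` is a hypothetical in-memory stand-in for a real collection that only counts reads; it is not part of the MongoDB driver.

```csharp
using System;
using System.Collections.Generic;

public class Department
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for the departments collection that counts reads.
public class FakeDepartmentCollection
{
    private readonly Dictionary<string, string> _names = new Dictionary<string, string>();

    public int Reads { get; private set; }

    public void Insert(string id, string name) => _names[id] = name;

    // Each call stands for one database round trip plus deserialization,
    // so it returns a brand-new object every time.
    public Department Load(string id)
    {
        Reads++;
        return new Department { Id = id, Name = _names[id] };
    }
}

public class Employee
{
    // The model class now needs database access injected.
    private readonly FakeDepartmentCollection _db;

    public Employee(FakeDepartmentCollection db) =>
        _db = db ?? throw new ArgumentNullException(nameof(db));

    public string DepartmentId { get; set; }

    // Looks like a cheap property, but every access hits the "database".
    public Department Department => _db.Load(DepartmentId);
}
```

With this class, reading `john.Department.Name` twice costs two loads and yields two different `Department` instances, and `john.Department.Name = "Renamed"` modifies a throwaway copy that nothing ever saves, which is the update ambiguity described above.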

Many ORMs use a set of complex patterns to allow these operations, so these problems aren't impossible to work around. But ORMs are typically in the range of 100k to well over 1M lines of code(!), and I doubt you have that kind of time. In an RDBMS, the need to activate related objects and use something like an ORM is much more severe, because you can't embed, e.g., the list of line items in an invoice, so every 1:n or m:n relation must be represented using a join. That's called the object-relational impedance mismatch.

The idea of document databases, as I understand it, is that you don't need to break your model apart as unnaturally as you have to in an RDBMS. Still, there are the 'object borders'. If you think of your data model as a network of connected nodes, the challenge is to know which part of the data you are currently working on.

Personally, I prefer to not put an abstraction layer on top of this, because that abstraction is leaky, it hides what is really going on from the caller, and it tries to solve every problem with the same hammer.

Part of the idea of NoSQL is that your query patterns must be carefully matched to the data-model, because you can't simply apply the JOIN hammer to any table in sight.

So, my opinion is: stick to a thin layer and perform most of the database operations in a service layer. Move DTOs around instead of designing a complex domain model that breaks apart as soon as you need to add locking, MVCC, cascaded updates, etc.
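A minimal sketch of such a thin service layer, assuming a hypothetical `IStore<T>` abstraction with an in-memory implementation (it is not a MongoDB driver type; a real implementation would wrap the driver's collection). The service is also the one place where the referential check that MongoDB won't perform for you lives.

```csharp
using System;
using System.Collections.Generic;

public class DepartmentDB { public string Id { get; set; } public string Name { get; set; } }
public class EmployeeDB { public string Id { get; set; } public string DepartmentId { get; set; } }

// Hypothetical minimal storage abstraction.
public interface IStore<T>
{
    T Find(string id);           // returns null when the document is missing
    void Save(string id, T doc);
}

public class InMemoryStore<T> : IStore<T> where T : class
{
    private readonly Dictionary<string, T> _docs = new Dictionary<string, T>();
    public T Find(string id) => _docs.TryGetValue(id, out var doc) ? doc : null;
    public void Save(string id, T doc) => _docs[id] = doc;
}

// Thin service layer: every round trip is explicit, and DTOs go in and out.
public class EmployeeService
{
    private readonly IStore<EmployeeDB> _employees;
    private readonly IStore<DepartmentDB> _departments;

    public EmployeeService(IStore<EmployeeDB> employees, IStore<DepartmentDB> departments)
    {
        _employees = employees;
        _departments = departments;
    }

    public void MoveToDepartment(string employeeId, string departmentId)
    {
        var employee = _employees.Find(employeeId)
            ?? throw new InvalidOperationException($"No employee {employeeId}");

        // Manual referential check: refuse to store an id that points nowhere.
        if (_departments.Find(departmentId) == null)
            throw new InvalidOperationException($"No department {departmentId}");

        employee.DepartmentId = departmentId;
        _employees.Save(employee.Id, employee);
    }
}
```

The existence check and the write are two separate operations, so a department deleted concurrently between them can still leave a dangling `DepartmentId`; that is the kind of locking question raised above, and this sketch does not solve it.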


answered Sep 21 '22 00:09

mnemosyn

In a document database, when you do something like your first example:

public class A
{
   public B RefB { get; set; }
}

You are fully embedding the value of B into the RefB property. In other words, your document looks like this:

[a/1]
{
    AProp: "foo",
    RefB: {
        BProp: "bar"
    }
}

It helps to look at things from a Domain Driven Design (DDD) perspective. This pattern of embedding normally occurs when B is either a "value object" or a "non-aggregate entity" (using DDD terminology).

It can also occur if you are storing a point-in-time snapshot of some other aggregate entity. In that scenario, you don't want to update the values of B if they change, or it would no longer represent that point in time.
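A minimal sketch of the snapshot case, using a hypothetical `BSnapshot` type: when A is created it copies the values it needs from B, and later edits to B leave that copy untouched. Serialized, `A` would produce a document shaped like the `[a/1]` example above, with `RefB` embedded.

```csharp
using System;

// B stored in its own collection and edited independently.
public class B
{
    public string Id { get; set; }
    public string BProp { get; set; }
}

// Point-in-time copy of B's data, embedded in A's document.
// It holds values only, with no link back to the live B.
public class BSnapshot
{
    public string BProp { get; set; }

    public static BSnapshot Of(B b)
    {
        if (b == null) throw new ArgumentNullException(nameof(b));
        return new BSnapshot { BProp = b.BProp };
    }
}

public class A
{
    public string AProp { get; set; }
    public BSnapshot RefB { get; set; } // serialized inside A's document
}
```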

The other pattern would be to treat A and B as separate aggregates. If one needs to refer to the other, you specify that with a reference to its ID only.

public class A
{
   public string BId { get; set; }
}

Your documents would then be stored such as:

[a/1]
{
    AProp: "foo",
    BId: "b/2"
}

[b/2]
{
    BProp: "bar",
}

Note: I believe in MongoDB, you would use an ObjectId type. In RavenDB, you would usually use a string, but an int is possible with a bit of minor adjustment. Other document databases may allow other types.

The part that doesn't work well in document databases is what you showed in your second example: A holding a reference to B without B being part of A's document. This pattern may work in ORMs like Entity Framework or NHibernate, but there it tends to be implemented via virtual properties and proxy classes, which don't hold up well in a document database environment.

So if they are separate documents, instead of loading A and using a.RefB to get to B, you would just load A and B individually. For example, you might load A, and then use the BId to determine how to load B.
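A minimal sketch of loading the two documents individually, assuming a hypothetical in-memory `DocumentSession` in place of a real database session and reusing the `a/1` and `b/2` ids from the documents above. With the MongoDB driver, each `Load` call would be a separate query by id.

```csharp
using System;
using System.Collections.Generic;

public class A { public string Id { get; set; } public string AProp { get; set; } public string BId { get; set; } }
public class B { public string Id { get; set; } public string BProp { get; set; } }

// Hypothetical in-memory stand-in for two collections.
public class DocumentSession
{
    private readonly Dictionary<string, A> _as = new Dictionary<string, A>();
    private readonly Dictionary<string, B> _bs = new Dictionary<string, B>();

    public void Store(A a) => _as[a.Id] = a;
    public void Store(B b) => _bs[b.Id] = b;

    public A LoadA(string id) => _as.TryGetValue(id, out var a) ? a : null;
    public B LoadB(string id) => _bs.TryGetValue(id, out var b) ? b : null;
}

public static class Loading
{
    // Two explicit loads instead of a navigation property: load A first,
    // then use its BId to decide whether and how to load B.
    public static (A a, B b) LoadAWithB(DocumentSession session, string aId)
    {
        var a = session.LoadA(aId);
        if (a == null) return (null, null);

        var b = a.BId == null ? null : session.LoadB(a.BId);
        return (a, b);
    }
}
```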

Of course, the question still comes down to whether to embed or to link. That is something you will have to figure out, as it can often be done either way. Typically one way works better than the other for a particular domain concern. But you typically don't do both.

answered Sep 19 '22 00:09

Matt Johnson-Pint
