
Recommendations for MongoDB schema design

Tags: mongodb, nosql

Let's say that you want to model a certain situation: a company can have one or more branches, and those branches have employees who can work for different companies (or even in two different branches of the same company). This is, of course, just an example.

Let's also assume that most searches/queries will be run against the employees and companies collections.

The first (naive) way to do this would be to embed everything (a company has an array of branches, and each branch has an array of employees):

{
    name: "Company name",
    // other company data
    branches : [
        { 
            name: "Branch name",
            // other branch data
            employees: [
                {
                    // employee1 data
                },
                {
                    // employee2 data
                }
            ]
        }
    ]
}

But this would be very inefficient when retrieving employee information: one would have to retrieve the company and then iterate over every branch to find the required employee.
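To make the cost concrete, here is a minimal sketch in plain JavaScript of what that lookup looks like once the company document has been loaded. The sample data, the lowercase field names and the findEmployee helper are made up for illustration; nothing here talks to MongoDB.

```javascript
// Hypothetical sample data in the fully embedded shape shown above.
const company = {
  name: "Company name",
  branches: [
    { name: "Branch A", employees: [{ first_name: "Ann", last_name: "Lee" }] },
    { name: "Branch B", employees: [{ first_name: "Bob", last_name: "Ray" }] },
  ],
};

// Scan every branch of an already-loaded company document for an employee.
// The whole document has to be fetched first, however large it is.
function findEmployee(companyDoc, lastName) {
  for (const branch of companyDoc.branches) {
    for (const employee of branch.employees) {
      if (employee.last_name === lastName) {
        return { branch: branch.name, employee: employee };
      }
    }
  }
  return null; // not found in any branch
}
```

The nested loop is the point: every employee lookup pays for the whole company document plus a scan over all of its branches.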

At the other extreme, one could use references and mimic an RDBMS (with separate Company, Branch and Employee collections), but that would mean more queries.

A third option (the one I'm leaning towards) would be to keep Employee as a separate collection and store an array of references to employees in each branch. Also, to allow faster queries such as "employees with certain names who work for a certain company and a certain branch", the company's ObjectId could be stored in each Employee document:

{
    company_id: "some id",
    first_name: "First name",
    last_name: "Last name",
    // other employee data
}

So, in this case, to search for all employees with certain names who work for a certain company and a certain branch, one would have to run two queries. The first query would return the companies that satisfy the "company condition" (company name and branch name), and the second query, on the Employee collection, would return all employees who have the specified name and work in companies whose ids were returned by the first query.
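The two queries described above could be sketched roughly as follows. This only builds the filter documents; the helper names companyFilter and employeeFilter, the collection names companies and employees, and the assumption that branches are embedded in the company document under branches are all mine, not part of any fixed schema.

```javascript
// Filter for the first query, run against the companies collection.
// "branches.name" uses dot notation to match a name inside the branches array.
function companyFilter(companyName, branchName) {
  return { name: companyName, "branches.name": branchName };
}

// Filter for the second query, run against the employees collection.
// companyIds is the list of _id values returned by the first query.
function employeeFilter(firstName, lastName, companyIds) {
  return {
    first_name: firstName,
    last_name: lastName,
    company_id: { $in: companyIds },
  };
}

// Hypothetical use in the mongo shell:
//   var ids = db.companies.distinct("_id", companyFilter("Company name", "Branch name"));
//   db.employees.find(employeeFilter("First name", "Last name", ids));
```

Note that the second filter only restricts by company. With this layout the branch condition is checked on the company side, so an employee of a matching company but a different branch would still be returned unless the branch's reference array is also consulted.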

Would you do this some other way? Is there some other "recommended" way to do this? Would you suggest any improvements?

More importantly, what should be done when these two queries return result sets that have only a small intersection? How can performance be improved in that case?

asked Dec 26 '12 at 18:12 by kevin



1 Answer

I think you are mostly heading in the right direction.

There are cases where denormalization in MongoDB is not evil, as it would be in a relational database, but is in fact the right thing to do. Here, however, you have a case where you should use multiple collections. That's because MongoDB documents have a size limit of 16 MB. When you have a very big company with lots of branches, each with lots of employees, and the employee sub-documents grow more complex, you could easily exceed that limit.

Having a reference from the employee to the company is a good idea. But you should consider using not the _id field of the company, but rather the company name and the branch name, as long as you can guarantee that each combination of them is unique in the company collection (for example with a unique compound index on these two fields). The reason is that when you look up an employee, you will usually also want the names of the companies and branches. If you only stored _ids, you would have to run additional queries to get that information.

You said that the relationship between branches and employees is not 1:n but n:m. In that case I would recommend adding an array of "assignments" to each employee, containing objects with two fields, company and branch (you might also add a third field, "position", which says what he or she does there).

Your employee documents would then look like this:

{
    first_name: "First name",
    last_name: "Last name",
    // other employee data
    assignments: [
        { company:"Aperture Science", branch:"R&D", position:"test subject" },
        { company:"Black Mesa", branch:"security", position:"leader of blue shift" }
    ]
}
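With that shape, the "employee with a certain name in a certain company and branch" search becomes a single query. A minimal sketch, assuming an employees collection; the helper name assignmentFilter and the index are my own suggestions and should be checked against your real workload:

```javascript
// Build a filter for employees by last name who hold an assignment
// in the given company AND branch.
function assignmentFilter(lastName, company, branch) {
  return {
    last_name: lastName,
    // $elemMatch requires a single array element to satisfy both conditions,
    // so "Aperture Science" + "security" does not match the document above.
    assignments: { $elemMatch: { company: company, branch: branch } },
  };
}

// A possible compound index over the two assignment fields (an assumption).
const assignmentIndex = { "assignments.company": 1, "assignments.branch": 1 };

// Hypothetical use in the mongo shell:
//   db.employees.createIndex(assignmentIndex);
//   db.employees.find(assignmentFilter("Last name", "Black Mesa", "security"));
```

Using $elemMatch rather than two separate dot-notation conditions matters here, because separate conditions could be satisfied by two different entries of the assignments array.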

Note that you can take advantage of the schemaless nature of the database here: you could easily have some companies which have not just branches but further hierarchy levels (like departments and groups), and others which do not.

But what if I want to rename a company or branch?

In that case you would have to update each employee document which references the renamed company/branch. Yes, it wouldn't be the most efficient schema for that case. But remember that MongoDB schemas should always be optimized for the most common use-cases. What do you think will happen more frequently: a) a company or branch is renamed or b) someone wants to look up an employee?
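For what it's worth, on MongoDB 3.6 or newer such a rename can be done as one multi-document update using the filtered positional operator. This is only a sketch: the helper name renameCompanyUpdate and the employees collection name are assumptions, and the update touches every matching employee document, so it is still the expensive case.

```javascript
// Build the three arguments for an updateMany call that renames a company
// inside every matching entry of each employee's assignments array.
function renameCompanyUpdate(oldName, newName) {
  return {
    // Only touch employees that reference the old company name.
    filter: { "assignments.company": oldName },
    // $[a] addresses each array element selected by the arrayFilters below.
    update: { $set: { "assignments.$[a].company": newName } },
    options: { arrayFilters: [{ "a.company": oldName }] },
  };
}

// Hypothetical use in the mongo shell:
//   var r = renameCompanyUpdate("Black Mesa", "Black Mesa Research Facility");
//   db.employees.updateMany(r.filter, r.update, r.options);
```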

answered Sep 18 '22 at 22:09 by Philipp