Recommendations for MongoDB schema design

Let's say that you want to model a certain situation. A company can have one or more branches, and those branches have employees who can work for different companies (or even in two different branches of the same company). This is of course just an example.

Let's also assume that most searches/queries will be done on the employee and company collections.

The first (naive) way to do this would be to embed everything (a Company has an array of Branches, and each Branch has an array of Employees):

{
    name: "Company name",
    // other company data
    branches: [
        {
            name: "Branch name",
            // other branch data
            employees: [
                {
                    // employee 1 data
                },
                {
                    // employee 2 data
                }
            ]
        }
    ]
}

But this would be very inefficient when one is interested in retrieving employee information (one would have to retrieve the company and then iterate over every branch to find the required employee).
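To make that cost concrete, here is a minimal in-memory sketch of the lookup. The sample data and the function name findEmployeesEmbedded are hypothetical, and plain arrays stand in for a real collection:

```javascript
// Hypothetical sample data in the fully embedded shape described above.
const companies = [
  {
    name: "Company A",
    branches: [
      { name: "North", employees: [{ first_name: "Ann", last_name: "Lee" }] },
      { name: "South", employees: [{ first_name: "Bob", last_name: "Ray" }] },
    ],
  },
];

// Walk every company and every branch to collect employees with a given last name.
function findEmployeesEmbedded(allCompanies, lastName) {
  const hits = [];
  for (const company of allCompanies) {
    for (const branch of company.branches) {
      for (const employee of branch.employees) {
        if (employee.last_name === lastName) {
          hits.push({ company: company.name, branch: branch.name, employee });
        }
      }
    }
  }
  return hits;
}

console.log(findEmployeesEmbedded(companies, "Ray"));
```

Every lookup touches whole company documents, even when only one employee is wanted.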

At the other extreme, one could use references and mimic an RDBMS (with Company, Branch and Employee collections), but that would mean more queries.

The third option (the one I'm leaning towards) would be to have Employee as a separate collection, and then have an array of references to it in Branches. Also, to allow faster queries like "employees with certain names who work for a certain company and a certain branch", the Company ObjectId could be stored in the Employee collection:

{
    company_id: "some id",
    first_name: "First name",
    last_name: "Last name",
    // other employee data
}

So, in this case, to search for all employees with certain names who work for a certain company and a certain branch, one would have to do two queries. The first query would return the companies that satisfy the "company condition" (company name and branch name), and the second query, on the Employee collection, would return all employees who have the specified name and work in one of the companies whose ids were returned by the first query.
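The two steps could be sketched roughly as follows. This is an in-memory illustration only: the sample data, the string ids (standing in for ObjectIds) and the function names are assumptions, not part of any driver API.

```javascript
// Hypothetical sample data; ids are plain strings instead of ObjectIds.
const companies = [
  { _id: "c1", name: "Company A", branches: [{ name: "North" }, { name: "South" }] },
  { _id: "c2", name: "Company B", branches: [{ name: "North" }] },
];
const employees = [
  { company_id: "c1", first_name: "Ann", last_name: "Lee" },
  { company_id: "c2", first_name: "Ann", last_name: "Ray" },
  { company_id: "c1", first_name: "Bob", last_name: "Lee" },
];

// Step 1: ids of companies that satisfy the "company condition".
function companyIds(allCompanies, companyName, branchName) {
  return allCompanies
    .filter((c) => c.name === companyName && c.branches.some((b) => b.name === branchName))
    .map((c) => c._id);
}

// Step 2: employees with the given first name whose company_id is in that id set.
function employeesInCompanies(allEmployees, ids, firstName) {
  const idSet = new Set(ids);
  return allEmployees.filter((e) => e.first_name === firstName && idSet.has(e.company_id));
}

const ids = companyIds(companies, "Company A", "North");
const result = employeesInCompanies(employees, ids, "Ann");
console.log(result);
```

In MongoDB the second step would correspond to a filter using $in on company_id. Note that this sketch only checks that the company has such a branch; it does not check that the employee actually works in that branch, which would additionally need the branch's array of employee references.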

Would you do this some other way? Is there some other "recommended" way to do this? Would you add some improvements?

More importantly, what should be done when these two queries return result sets that have only a small intersection? How could performance be improved in that case?

968

asked Dec 26 '12 18:12

kevin

1 Answer

I think you are mostly heading in the right direction.

Denormalization is not evil in MongoDB the way it is in a relational database, and in some cases it is in fact the right thing to do. Here, however, you have a case where you should use multiple collections, because MongoDB documents have an upper size limit of 16 MB. When you have a very big company with lots of branches, each with lots of employees, and the employee sub-documents become more complex, you could easily exceed that limit.
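As a back-of-the-envelope check, the limit can be turned into a rough employee count. The per-employee sizes below are purely assumed averages, and the helper maxEmbeddedEmployees is hypothetical:

```javascript
// MongoDB's documented maximum BSON document size: 16 MiB.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough upper bound on how many employee sub-documents fit into one company document.
// bytesPerEmployee and overheadBytes are assumed averages, not measured values.
function maxEmbeddedEmployees(bytesPerEmployee, overheadBytes = 0) {
  if (!(bytesPerEmployee > 0)) {
    throw new RangeError("bytesPerEmployee must be positive");
  }
  return Math.floor((MAX_DOCUMENT_BYTES - overheadBytes) / bytesPerEmployee);
}

console.log(maxEmbeddedEmployees(1024)); // 16384
console.log(maxEmbeddedEmployees(2048)); // 8192
```

So at an assumed 1 KiB per employee, a single fully embedded company document tops out at roughly sixteen thousand employees, and every field added to the employee sub-document lowers that ceiling.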

Having a reference from the employee to the company is a good idea. But you should consider referencing not the _id field of the company but rather the company name and the branch name, as long as you can guarantee that each combination of them is unique in the company collection (for example with a unique compound index on these two fields). The reason is that when you look up an employee, you will usually also want the names of the companies and branches. If you only had _ids, you would have to run additional queries to get that information.

You said you don't have a 1:n relationship between branches and employees, but rather an n:m relationship. In that case I would recommend adding an array of "assignments" to each employee, which contains objects with two fields, company and branch (maybe you would like to add a third field, position, which says what he or she is doing there).

Your employee documents would then look like this:

{
    first_name: "First name",
    last_name: "Last name",
    // other employee data
    assignments: [
        { company:"Aperture Science", branch:"R&D", position:"test subject" },
        { company:"Black Mesa", branch:"security", position:"leader of blue shift" }
    ]
}

Note that you can use the strength of schemaless databases here: You could easily have companies which don't just have branches, but even more hierarchy levels (like departments and groups), and others which do not.
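With the assignments stored on each employee, the lookup from the question ("employees with a certain name who work for a certain company and branch") needs only the employee collection. Below is a minimal sketch, assuming hypothetical sample data: the filter object uses MongoDB's $elemMatch operator, and the matches function is an in-memory stand-in that mirrors its meaning for illustration.

```javascript
// Hypothetical sample data in the "assignments" shape.
const employees = [
  {
    first_name: "Ann", last_name: "Lee",
    assignments: [
      { company: "Aperture Science", branch: "R&D", position: "test subject" },
      { company: "Black Mesa", branch: "security", position: "guard" },
    ],
  },
  {
    first_name: "Ann", last_name: "Ray",
    assignments: [
      { company: "Aperture Science", branch: "security", position: "guard" },
      { company: "Black Mesa", branch: "R&D", position: "researcher" },
    ],
  },
];

// Filter in MongoDB query syntax: $elemMatch requires a single array element
// to satisfy both conditions at once.
const filter = {
  first_name: "Ann",
  assignments: { $elemMatch: { company: "Aperture Science", branch: "R&D" } },
};

// In-memory equivalent of that filter, for illustration only.
function matches(employee, firstName, company, branch) {
  return (
    employee.first_name === firstName &&
    employee.assignments.some((a) => a.company === company && a.branch === branch)
  );
}

const found = employees.filter((e) => matches(e, "Ann", "Aperture Science", "R&D"));
console.log(found.map((e) => e.last_name));
```

The second sample employee works for Aperture Science and in an R&D branch, but not in the same assignment, so she is not returned. That is the reason for matching both fields on one array element rather than on the array as a whole.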

But what if I want to rename a company or branch?

In that case you would have to update each employee document which references the renamed company/branch. Yes, it wouldn't be the most efficient schema for that case. But remember that MongoDB schemas should always be optimized for the most common use-cases. What do you think will happen more frequently: a) a company or branch is renamed or b) someone wants to look up an employee?
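A rename then amounts to rewriting the matching assignments in every affected employee document. Here is a minimal in-memory sketch, where the sample data and the function renameCompany are hypothetical; the comment shows how the same update might be expressed with updateMany and arrayFilters, assuming a collection named employees:

```javascript
// Hypothetical sample data in the "assignments" shape.
const employees = [
  { last_name: "Lee", assignments: [{ company: "Black Mesa", branch: "security" }] },
  { last_name: "Ray", assignments: [{ company: "Aperture Science", branch: "R&D" }] },
  {
    last_name: "Kim",
    assignments: [
      { company: "Black Mesa", branch: "R&D" },
      { company: "Black Mesa", branch: "security" },
    ],
  },
];

// Rename a company in every assignment; returns the number of employee documents changed.
function renameCompany(allEmployees, oldName, newName) {
  let modified = 0;
  for (const employee of allEmployees) {
    let touched = false;
    for (const assignment of employee.assignments) {
      if (assignment.company === oldName) {
        assignment.company = newName;
        touched = true;
      }
    }
    if (touched) modified += 1;
  }
  return modified;
}

// Roughly the same update in MongoDB shell syntax (collection name is an assumption):
// db.employees.updateMany(
//   { "assignments.company": "Black Mesa" },
//   { $set: { "assignments.$[a].company": "Black Mesa Research" } },
//   { arrayFilters: [{ "a.company": "Black Mesa" }] }
// )

const changed = renameCompany(employees, "Black Mesa", "Black Mesa Research");
console.log(changed);
```

The cost grows with the number of employees referencing the renamed company, which is the trade-off accepted in exchange for cheaper employee lookups.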

196

answered Sep 18 '22 22:09

Philipp

Tags: mongodb, nosql
