MongoDB Schema Design - Many small documents or fewer large documents?

Tags: mongodb, schema, database-design

Background
I'm prototyping a conversion from our RDBMS to MongoDB. While denormalizing, it seems I have two choices: one leads to many (millions of) smaller documents, the other to fewer (hundreds of thousands of) large documents.

To distill it down to a simple analogy, it is the difference between a collection of fewer Customer documents like this (in Java):

```java
class Customer {
    private String name;
    private Address address;
    // each CreditCard has hundreds of Payment instances
    private Set<CreditCard> creditCards;
}
```

or a collection with many, many Payment documents like this:

```java
class Payment {
    private Customer customer;
    private CreditCard creditCard;
    private Date payDate;
    private float payAmount;
}
```

Question
Is MongoDB designed to prefer many, many small documents or fewer large documents? Does the answer mostly depend on which queries I plan to run (e.g., "How many credit cards does customer X have?" vs. "What was the average amount all customers paid last month?")?

I've looked around a lot, but I haven't come across any MongoDB schema best practices that would help answer my question.

asked Jun 14 '10 15:06

Andre

1 Answer

You'll definitely need to optimize for the queries you're doing.

Here's my best guess based on your description.

You'll probably want to know all the credit cards for each customer, so keep an array of them inside the Customer document. You'll also probably want each Payment to hold a reference to its Customer (for example, the customer's `_id`) rather than an embedded copy. This keeps Payment documents relatively small.
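A minimal sketch of the two document shapes just described, written as plain Java records rather than MongoDB driver code. The names `CustomerDoc`, `CardDoc`, `PaymentDoc` and the `customerId` field are hypothetical, and amounts use `BigDecimal` instead of the question's `float`. The customer embeds its cards; each payment stores only the customer's id.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

// Hypothetical document shapes for illustration; these are plain records, not MongoDB driver types.

// An embedded sub-document: a credit card stored inside its customer.
record CardDoc(String number, String expiry) {}

// The customer document embeds its cards, so one read returns all of them.
record CustomerDoc(String id, String name, String address, List<CardDoc> creditCards) {
    // "How many credit cards does customer X have?" needs only this one document.
    int cardCount() {
        return creditCards.size();
    }
}

// A payment document refers to its customer by id instead of embedding the whole customer.
record PaymentDoc(String id, String customerId, String cardNumber,
                  LocalDate payDate, BigDecimal payAmount) {}

class DocumentShapes {
    public static void main(String[] args) {
        CustomerDoc alice = new CustomerDoc("c1", "Alice", "1 Main St",
                List.of(new CardDoc("card-1", "12/27"), new CardDoc("card-2", "03/28")));
        PaymentDoc payment = new PaymentDoc("p1", alice.id(), "card-1",
                LocalDate.of(2010, 5, 3), new BigDecimal("19.99"));
        System.out.println(alice.cardCount());    // prints 2
        System.out.println(payment.customerId()); // prints c1
    }
}
```

The card list is small and bounded, so embedding it is cheap; payments grow without bound, which is why they live in their own documents.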

Each Payment document automatically gets its own `_id`, which MongoDB indexes by default. You'll probably want to add an index on the Customer reference as well.

That index lets you quickly find a customer's Payments without storing the whole Customer object in every payment.
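To make the effect of an index on the customer reference concrete, here is an in-memory analogy: a hash map from `customerId` to that customer's payments, so a lookup touches only the matching entries instead of scanning every payment. MongoDB's real indexes are B-trees maintained by the server (in the shell, something like `db.payments.createIndex({ customerId: 1 })`, where the `payments` collection and `customerId` field are assumed names). The `PaymentIndex` class and its methods are hypothetical and only illustrate the access pattern.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical payment document that references its customer only by id.
record PaymentDoc(String id, String customerId, String cardNumber,
                  LocalDate payDate, BigDecimal payAmount) {}

// In-memory stand-in for a "payments" collection with a secondary index on customerId.
// Illustration only: MongoDB's real indexes are B-trees kept up to date by the server.
class PaymentIndex {
    private final List<PaymentDoc> collection = new ArrayList<>();
    private final Map<String, List<PaymentDoc>> byCustomerId = new HashMap<>();

    // Store the document and keep the "index" in sync, as the server does on every insert.
    void insert(PaymentDoc p) {
        collection.add(p);
        byCustomerId.computeIfAbsent(p.customerId(), k -> new ArrayList<>()).add(p);
    }

    // Indexed lookup: touches only the matching payments.
    List<PaymentDoc> findByCustomer(String customerId) {
        return List.copyOf(byCustomerId.getOrDefault(customerId, List.of()));
    }

    // Unindexed equivalent: scans every payment in the collection.
    List<PaymentDoc> scanByCustomer(String customerId) {
        List<PaymentDoc> result = new ArrayList<>();
        for (PaymentDoc p : collection) {
            if (p.customerId().equals(customerId)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PaymentIndex payments = new PaymentIndex();
        payments.insert(new PaymentDoc("p1", "c1", "card-1", LocalDate.of(2010, 5, 3), new BigDecimal("10.00")));
        payments.insert(new PaymentDoc("p2", "c2", "card-7", LocalDate.of(2010, 5, 20), new BigDecimal("25.50")));
        payments.insert(new PaymentDoc("p3", "c1", "card-1", LocalDate.of(2010, 6, 1), new BigDecimal("7.25")));
        System.out.println(payments.findByCustomer("c1").size()); // prints 2
    }
}
```

The trade-off is the same one a real index makes: extra storage and extra work on every insert, in exchange for turning a per-customer lookup from a full scan into a direct hit.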

If you want to answer questions like "What was the average amount all customers paid last month?", you'll instead want a map/reduce job for any sizeable dataset, and you won't get that answer in real time. Storing a reference to the Customer is probably good enough for these map/reduce jobs. (In MongoDB 5.0 and later, map-reduce is deprecated and the aggregation pipeline is the recommended replacement.)
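As a rough illustration of what such a batch job computes, the sketch below does the equivalent over an in-memory list of payment documents: keep only one month's payments, then average `payAmount`. In MongoDB this would run server-side as a map/reduce job or an aggregation pipeline (a `$match` on `payDate` followed by a `$group` using `$avg`); the Java code is only an analogy, and `averageForMonth`, `PaymentDoc`, and the sample data are hypothetical. Amounts use `BigDecimal` rather than the question's `float`, because binary floating point cannot represent most decimal currency values exactly.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Optional;

// Hypothetical payment document that references its customer only by id.
record PaymentDoc(String id, String customerId, String cardNumber,
                  LocalDate payDate, BigDecimal payAmount) {}

class MonthlyAverage {

    // Average payAmount across all customers for one calendar month; empty if there are no payments.
    static Optional<BigDecimal> averageForMonth(List<PaymentDoc> payments, YearMonth month) {
        // Analogous to a $match stage: keep only payments dated within the month.
        List<BigDecimal> amounts = payments.stream()
                .filter(p -> YearMonth.from(p.payDate()).equals(month))
                .map(PaymentDoc::payAmount)
                .toList();
        if (amounts.isEmpty()) {
            return Optional.empty();
        }
        // Analogous to a $group stage with $avg: sum, then divide by the count, rounding to cents.
        BigDecimal total = amounts.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        return Optional.of(total.divide(BigDecimal.valueOf(amounts.size()), 2, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        List<PaymentDoc> payments = List.of(
                new PaymentDoc("p1", "c1", "card-1", LocalDate.of(2010, 5, 3), new BigDecimal("10.00")),
                new PaymentDoc("p2", "c2", "card-7", LocalDate.of(2010, 5, 20), new BigDecimal("25.50")),
                new PaymentDoc("p3", "c1", "card-1", LocalDate.of(2010, 6, 1), new BigDecimal("99.99")));
        System.out.println(averageForMonth(payments, YearMonth.of(2010, 5)).orElseThrow()); // prints 17.75
    }
}
```

Note that this query never needs the Customer data itself, only the payment fields, which is why a reference to the customer is enough.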

So to answer your question directly: Is MongoDB designed to prefer many, many small documents or fewer large documents?

MongoDB is designed to find indexed entries very quickly: it is very good at finding a few needles in a large haystack, and not very good at finding most of the needles in the haystack. So build your data model around your most common use cases, and write map/reduce jobs for the rarer ones.

answered Sep 21 '22 10:09

Gates VP
