Mongoose: populate() / DBref or data duplication?

Tags:

node.js

mongodb

nosql

mongoose

I have two collections:

Users
Uploads

Each upload has a User associated with it, and I need to know that user's details when an Upload is viewed. Is it best practice to duplicate this data inside the Uploads record, or to use populate() to pull in these details from the Users collection, referenced by _id?

OPTION 1

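var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var UploadSchema = new Schema({
    // _id is added automatically; declaring it without a default
    // makes Mongoose refuse to save new uploads.
    _user: { type: Schema.ObjectId, ref: 'users' }, // must match the name given to mongoose.model()
    title: { type: String },
});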

OPTION 2

var UploadSchema = new Schema({
    // _id is added automatically by Mongoose.
    user: {                     // copy of the user's details, duplicated in every upload
        name: { type: String },
        email: { type: String },
        avatar: { type: String },
        //...etc
    },
    title: { type: String },
});

With Option 2, if any of the data in the Users collection changes, I will have to update it across all associated Upload records. With Option 1, on the other hand, I can just chill out and let populate() ensure the latest User data is always shown.

Is the overhead of using populate() significant? What is the best practice in this common scenario?
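To make the trade-off concrete, here is a minimal in-memory sketch in plain JavaScript. It does not use Mongoose or MongoDB: the Map-based "collections" and the helpers `populateUser`, `renameUserOption1` and `renameUserOption2` are hypothetical. It mimics what populate() does for Option 1 (a second, batched lookup of the referenced users by `_id` after the uploads are fetched) and what Option 2 forces on every profile change (rewriting the embedded copy in each of that user's uploads). Note that Option 2 needs some way to find those copies, so the sketch assumes the embedded object also keeps the user's `_id`, which the Option 2 schema above does not include.

```javascript
'use strict';

// Hypothetical in-memory stand-in for the Users collection, keyed by _id.
const users = new Map([
  ['u1', { _id: 'u1', name: 'Ann', email: 'ann@example.com', avatar: 'ann.png' }],
]);

// Option 1: uploads store only a reference to the user.
const uploadsRef = [
  { _id: 'p1', _user: 'u1', title: 'Holiday' },
  { _id: 'p2', _user: 'u1', title: 'Birthday' },
];

// Option 2: uploads store a copy of the user fields (plus the user's _id, assumed here).
const uploadsEmbedded = [
  { _id: 'p1', user: { _id: 'u1', name: 'Ann', avatar: 'ann.png' }, title: 'Holiday' },
  { _id: 'p2', user: { _id: 'u1', name: 'Ann', avatar: 'ann.png' }, title: 'Birthday' },
];

// Sketch of populate(): collect the distinct referenced ids, fetch those users in
// one batched lookup (the "second query"), then replace each reference with the
// user document. Missing users become null.
function populateUser(uploads) {
  const ids = [...new Set(uploads.map((u) => u._user))];
  const found = new Map(ids.map((id) => [id, users.get(id)]));
  return uploads.map((u) => ({ ...u, _user: found.get(u._user) ?? null }));
}

// Option 1 write path: a profile change touches exactly one document.
function renameUserOption1(userId, name) {
  users.get(userId).name = name;
  return 1; // documents written
}

// Option 2 write path: the change must be fanned out to every embedded copy.
function renameUserOption2(userId, name) {
  users.get(userId).name = name;
  let written = 1;
  for (const upload of uploadsEmbedded) {
    if (upload.user._id === userId) {
      upload.user.name = name;
      written += 1;
    }
  }
  return written;
}

console.log(renameUserOption1('u1', 'Annie')); // 1
console.log(populateUser(uploadsRef)[1]._user.name); // Annie
console.log(renameUserOption2('u1', 'Anna')); // 3
console.log(uploadsEmbedded[0].user.name); // Anna
```

In real Mongoose code the two paths would look roughly like `Upload.find().populate('_user', 'name email avatar')` for Option 1 and `Upload.updateMany({ 'user._id': userId }, { $set: { 'user.name': newName } })` for Option 2. The read cost of populate() is one extra query per populated path, batched over all the fetched uploads, rather than one query per upload; the write cost of duplication grows with the number of uploads each user owns.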

asked Nov 01 '11 17:11

wilsonpage

1 Answer

If you need to query your users on their own, keep them in their own collection. If you need to query your uploads on their own, keep those in their own collection too.

Another question to ask yourself: every time I need this data, do I also need the embedded objects (and vice versa)? How often will this data be updated? How often will it be read?

Consider a friendship request: if every time you need the request you also need the user who made it, embed the request inside the user document.

You can still create an index on the fields of the embedded object, and a lookup becomes a single query: fast and consistent.
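A minimal sketch of the friendship-request case, again as plain JavaScript objects rather than real Mongoose models. The field names `friendRequests`, `from` and `sentAt`, and the helpers `requestsFor` and `usersWithRequestFrom`, are hypothetical. Reading one user returns its requests with it, and the filter function mirrors what a MongoDB query on the embedded field would match.

```javascript
'use strict';

// Hypothetical user documents with friendship requests embedded as an array.
const userDocs = [
  { _id: 'u1', name: 'Ann', friendRequests: [{ from: 'u2', sentAt: '2011-11-01' }] },
  { _id: 'u2', name: 'Ben', friendRequests: [] },
  {
    _id: 'u3',
    name: 'Cat',
    friendRequests: [
      { from: 'u2', sentAt: '2011-11-03' },
      { from: 'u1', sentAt: '2011-11-04' },
    ],
  },
];

// Reading one user returns its requests too: no second lookup is needed.
function requestsFor(userId) {
  const doc = userDocs.find((u) => u._id === userId);
  return doc ? doc.friendRequests : [];
}

// Mirrors the MongoDB filter { 'friendRequests.from': senderId }, which matches a
// document when any array element matches. An index on that dotted path (for
// example UserSchema.index({ 'friendRequests.from': 1 }) in Mongoose) is a
// multikey index, so the database can serve this query without a full scan.
function usersWithRequestFrom(senderId) {
  return userDocs
    .filter((u) => u.friendRequests.some((r) => r.from === senderId))
    .map((u) => u._id);
}

console.log(requestsFor('u3').length); // 2
console.log(usersWithRequestFrom('u2')); // [ 'u1', 'u3' ]
```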

See also my previous answer to a similar question: Mongo DB relations between objects

I think this page on schema design will help you: http://www.mongodb.org/display/DOCS/Schema+Design

Use Cases

Customer / Order / Order Line-Item

Orders should be a collection. Customers a collection. Line-items should be an array of line-items embedded in the order object.

Blogging system

Posts should be a collection. Post author might be a separate collection, or simply a field within posts if only an email address. Comments should be embedded objects within a post for performance.
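The two use cases above can be sketched as document shapes. This is a plain-JavaScript illustration, not Mongoose code; the field names (`customerId`, `lineItems`, `unitPriceCents`, `authorEmail`, `comments`) and the helpers `orderTotalCents` and `customerOf` are hypothetical. Embedded line items mean one document read is enough to price an order, while the customer, kept in its own collection, needs a lookup by reference (the step populate() automates).

```javascript
'use strict';

// Customers live in their own collection; an order refers to one by _id.
const customers = [{ _id: 'c1', name: 'Acme Ltd' }];

// An order with its line items embedded as an array.
const order = {
  _id: 'o1',
  customerId: 'c1',
  lineItems: [
    { sku: 'A-100', qty: 2, unitPriceCents: 1250 },
    { sku: 'B-200', qty: 1, unitPriceCents: 499 },
  ],
};

// A post whose author is just an email field and whose comments are embedded.
const post = {
  _id: 'p1',
  title: 'Schema design',
  authorEmail: 'author@example.com',
  comments: [
    { by: 'reader1@example.com', text: 'Nice post' },
    { by: 'reader2@example.com', text: 'Thanks!' },
  ],
};

// One document read is enough to price the whole order.
function orderTotalCents(orderDoc) {
  return orderDoc.lineItems.reduce((sum, item) => sum + item.qty * item.unitPriceCents, 0);
}

// The customer, by contrast, needs a separate lookup by reference.
function customerOf(orderDoc) {
  return customers.find((c) => c._id === orderDoc.customerId) ?? null;
}

console.log(orderTotalCents(order)); // 2999
console.log(customerOf(order).name); // Acme Ltd
console.log(post.comments.length); // 2
```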

Schema Design Basics

Kyle Banker, 10gen

http://www.10gen.com/presentation/mongosf2011/schemabasics

Indexing & Query Optimization

Alvin Richards, Senior Director of Enterprise Engineering

http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization

These two videos are the best on MongoDB I have ever seen, IMHO.

answered Oct 20 '22 22:10

kilianc
