MongoDB preferred schema for embedded collections: documents vs. arrays

I believe there are at least two ways to embed data in a MongoDB document. In a simplified case we could have something like this:

{
    'name' : 'bill',
    'lines': {
       'idk73716': {'name': 'Line A'},
       'idk51232': {'name': 'Line B'},
       'idk23321': {'name': 'Line C'}
    }
}

and as an array:

{
    'name' : 'bill',
    'lines': [
       {'id': 'idk73716', 'name': 'Line A'},
       {'id': 'idk51232', 'name': 'Line B'},
       {'id': 'idk23321', 'name': 'Line C'}
    ]
}

As you can see, in this use case it's important to keep the id of each line.

I'm wondering whether there are pros and cons between these two schemas. Especially when it comes to using indexes, I have the feeling that the second may be easier to work with, as one could create an index on 'lines.id' or even 'lines.name' to search for an id or name across all documents. I didn't find any working solution to index the ids ('idk73716' and so on) in the first example.

Is it generally preferred to use the second approach if you have a use case like this?
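If the array form is chosen, documents already stored in the keyed form can be converted on the client. The following is a minimal sketch in plain JavaScript (it runs in Node); the helper names `linesToArray` and `linesToObject` are hypothetical, and the index command in the final comment assumes a collection named `second_collection`.

```javascript
// Hypothetical helper: convert a document whose `lines` field is keyed by id,
//   { name: 'bill', lines: { idk73716: { name: 'Line A' } } }
// into the array form
//   { name: 'bill', lines: [ { id: 'idk73716', name: 'Line A' } ] }
// so that one multikey index on 'lines.id' can cover every line.
function linesToArray(doc) {
  const lines = Object.entries(doc.lines || {}).map(([id, line]) => ({ id, ...line }));
  return { ...doc, lines };
}

// Hypothetical helper for the reverse direction (array form -> keyed form).
function linesToObject(doc) {
  const lines = {};
  for (const { id, ...rest } of doc.lines || []) {
    lines[id] = rest;
  }
  return { ...doc, lines };
}

const keyed = {
  name: 'bill',
  lines: {
    idk73716: { name: 'Line A' },
    idk51232: { name: 'Line B' },
    idk23321: { name: 'Line C' }
  }
};

const asArray = linesToArray(keyed);
console.log(JSON.stringify(asArray.lines));

// In the mongo shell, the index on the array form would then be created with
//   db.second_collection.createIndex({ 'lines.id': 1 })
```

Keeping both directions makes it easy to try either schema against real data before committing to one.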

asked Nov 10 '11 09:11

antons

2 Answers

In your first approach you can't index the id fields, since the ids are used as keys. It acts like a key-value dictionary. This approach is useful if you have a known (and small) set of ids. Assume that in your first example the id is known up front:

> db.your_collection.find()
 { "_id" : ObjectId("4ebbb6f974235464de49c3a5"), "name" : "bill", 
  "lines" : { 
             "idk73716" : { "name" : "Line A" },
             "idk51232" : { "name" : "Line B" } ,
             "idk23321":  { "name" : "Line C" }
            } 
  }

So to fetch the value for the id idk73716, you can do:

> db.your_collection.find({}, {'lines.idk73716': 1})
{ "_id" : ObjectId("4ebbb6f974235464de49c3a5"), "lines" : { "idk73716" : { "name" : "Line A" } } }

The empty {} is the query, and the second part, {'lines.idk73716': 1}, is a field selector (projection).

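Having ids as keys has the advantage that you can pick out that particular line alone. Even though {'lines.idk73716': 1} is only a field selector, here it effectively serves as both a query and a selector (note, though, that with an empty query, documents without that key are still returned, containing only their _id). This cannot be done with your second approach. Assume the second collection looks like this: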
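The selector does not have to be written by hand for each id: with a JavaScript computed property name it can be built from a variable. A minimal sketch; `lineProjection` is a hypothetical helper, and rejecting ids that contain '.' or start with '$' is a conservative assumption about which ids are safe to use in a field path.

```javascript
// Hypothetical helper: build the field selector {'lines.<id>': 1} for any id.
// A '.' in the id would be read as a path separator, and MongoDB restricts
// field names starting with '$', so such ids are rejected here (assumption).
function lineProjection(id) {
  if (id.includes('.') || id.startsWith('$')) {
    throw new Error('id cannot be used as a field name: ' + id);
  }
  return { ['lines.' + id]: 1 };
}

const projection = lineProjection('idk73716');
// mongo shell: db.your_collection.find({}, projection)
console.log(JSON.stringify(projection));
```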

> db.second_collection.find()
{ "_id" : ObjectId("4ebbb9c174235464de49c3a6"), "name" : "bill", "lines" : [
    {
        "id" : "idk73716",
        "name" : "Line A"
    },
    {
        "id" : "idk51232",
        "name" : "Line B"
    },
    {
        "id" : "idk23321",
        "name" : "Line C"
    }
] }
>

And suppose you have indexed the lines.id field. To query by id:

> db.second_collection.find({'lines.id' : 'idk73716' })

{ "_id" : ObjectId("4ebbb9c174235464de49c3a6"), "name" : "bill", "lines" : [
    {
        "id" : "idk73716",
        "name" : "Line A"
    },
    {
        "id" : "idk51232",
        "name" : "Line B"
    },
    {
        "id" : "idk23321",
        "name" : "Line C"
    }
] }
>

As the output above shows, there is no way to pick out only the matching sub (embedded) document; the whole array is returned. This is MongoDB's default behavior, whereas the first approach can return just the matching line.

For example,

db.second_collection.find({'lines.id' : 'idk73716' },{'lines':1})

fetches all lines, not just idk73716:

{ "_id" : ObjectId("4ebbb9c174235464de49c3a6"), "lines" : [
    {
        "id" : "idk73716",
        "name" : "Line A"
    },
    {
        "id" : "idk51232",
        "name" : "Line B"
    },
    {
        "id" : "idk23321",
        "name" : "Line C"
    }
] }
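Without a projection operator, picking the single line out of the returned array has to happen on the client. A minimal sketch in plain JavaScript, assuming the document has already been fetched; `findLine` is a hypothetical helper.

```javascript
// Hypothetical helper: with the array form, find() returns the whole `lines`
// array, so the matching element is picked out on the client.
function findLine(doc, id) {
  return (doc.lines || []).find(line => line.id === id);
}

const doc = {
  name: 'bill',
  lines: [
    { id: 'idk73716', name: 'Line A' },
    { id: 'idk51232', name: 'Line B' },
    { id: 'idk23321', name: 'Line C' }
  ]
};

console.log(findLine(doc, 'idk73716')); // { id: 'idk73716', name: 'Line A' }
```

The cost is that the whole array still travels over the network, which matters more as the number of lines per document grows.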

Hope this helps

EDIT

Thanks to @Gates VP for pointing out that with the "ids as keys" version you can still query for a particular id with $exists:

db.your_collection.find({'lines.idk73716': {$exists: true}})

This query works, but it will not be indexable.
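The $exists filter and the field selector can be combined, so that only documents containing the line are returned and each is trimmed to that line. A sketch that just builds both objects; `findLineQuery` is a hypothetical helper, and the shell call in the comment assumes the `your_collection` name used above.

```javascript
// Hypothetical helper (assumes the id is safe to use as a field name):
// filter  -> only documents that have a 'lines.<id>' key
// projection -> return only that line (plus _id)
function findLineQuery(id) {
  const path = 'lines.' + id;
  return {
    filter: { [path]: { $exists: true } },
    projection: { [path]: 1 }
  };
}

const { filter, projection } = findLineQuery('idk73716');
// mongo shell: db.your_collection.find(filter, projection)
// This filter still cannot use an index on the keyed form.
console.log(JSON.stringify(filter), JSON.stringify(projection));
```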

answered Sep 26 '22 03:09

RameshVel

Today we have the $elemMatch operator to achieve this, as discussed in "Retrieve only the queried element in an object array in MongoDB collection".
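For the array form, the $elemMatch projection returns only the first array element that matches its condition, and leaves the field out when nothing matches. The sketch below shows the shell call in a comment and imitates the result locally in plain JavaScript; `projectElemMatch` is a hypothetical helper that takes a JavaScript predicate, a simplification of the real operator, which takes a query condition. The `_id` is a plain string here because `ObjectId` is not available outside the shell.

```javascript
// mongo shell equivalent for the array form:
//   db.second_collection.find(
//     { 'lines.id': 'idk73716' },
//     { lines: { $elemMatch: { id: 'idk73716' } } })
//
// Hypothetical local imitation of that projection: keep _id, keep only the
// first element of `arrayField` that satisfies `predicate`, and drop the
// field entirely if no element does.
function projectElemMatch(doc, arrayField, predicate) {
  const items = Array.isArray(doc[arrayField]) ? doc[arrayField] : [];
  const match = items.find(predicate);
  const result = { _id: doc._id };
  if (match !== undefined) {
    result[arrayField] = [match];
  }
  return result;
}

const doc = {
  _id: '4ebbb9c174235464de49c3a6', // string stand-in for the ObjectId
  name: 'bill',
  lines: [
    { id: 'idk73716', name: 'Line A' },
    { id: 'idk51232', name: 'Line B' },
    { id: 'idk23321', name: 'Line C' }
  ]
};

console.log(JSON.stringify(projectElemMatch(doc, 'lines', line => line.id === 'idk73716')));
```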

But this question poses some interesting design choices, which I am also struggling to make today. Which of the two options should be preferred if frequent CRUD operations are required on the embedded documents?

I found that it is easy to perform CRUD on embedded documents with the $set/$unset operators when the IDs are used as property names. And if the client can get hold of the ID to make edits, this is better than an array, IMO. Here is another useful blog post by MongoDB about schema design and making these design decisions:

http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
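To make the comparison concrete, here is a sketch of the update documents for common edits in each schema, written as plain JavaScript functions that return `[filter, update]` pairs. The helper names and the `{ name: 'bill' }` filter are hypothetical; `$set`, `$unset`, `$push`, `$pull` and the positional `$` are standard MongoDB update operators.

```javascript
// Keyed form (hypothetical helpers): the id is part of the field path,
// so each operation addresses one line directly.
const keyedOps = {
  add: (filter, id, line) => [filter, { $set: { ['lines.' + id]: line } }],
  rename: (filter, id, name) => [filter, { $set: { ['lines.' + id + '.name']: name } }],
  remove: (filter, id) => [filter, { $unset: { ['lines.' + id]: '' } }]
};

// Array form (hypothetical helpers): the element is located through the
// filter ('lines.id') and updated with the positional $; removal uses $pull.
const arrayOps = {
  // $push appends unconditionally; it does not check for an existing id.
  add: (filter, id, line) => [filter, { $push: { lines: { id, ...line } } }],
  rename: (filter, id, name) => [
    { ...filter, 'lines.id': id },
    { $set: { 'lines.$.name': name } }
  ],
  remove: (filter, id) => [filter, { $pull: { lines: { id } } }]
};

// mongo shell usage, e.g.:
//   db.your_collection.updateOne(...keyedOps.rename({ name: 'bill' }, 'idk73716', 'Line A2'))
//   db.second_collection.updateOne(...arrayOps.rename({ name: 'bill' }, 'idk73716', 'Line A2'))
console.log(JSON.stringify(arrayOps.rename({ name: 'bill' }, 'idk73716', 'Line A2')));
```

With the keyed form every edit addresses the line by path alone; with the array form, edits to an existing line need the id in the filter so the positional `$` can locate the element.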

answered Sep 27 '22 03:09

Anand
