Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB nested sets

Tags:

mongodb

What're the best practices to store nested sets (like trees of comments) in MongoDB?

I mean, every comment can have a parent comment and children-comments (answers).

Storing them like this:

{
   title: "Hello",
   body: "Please comment me!",
   comments: [
        {
            author: "Peter",
            text: "Hi there",
            answers: [
                  {
                      author: "Peter",
                      text: "Hi there",
                      answers: [
                                 { author: "Ivan", text: "Hi there" },
                                 { author: "Nicholas", text: "Hi there" }
                      ]
                  },
                  { author: "Ivan", text: "Hi there" },
                  { author: "Nicholas", text: "Hi there" },
            ]
        },
        { author: "Ivan", text: "Hi there" },
        { author: "Nicholas", text: "Hi there" },
   ]
}

is not cool, because we can't, for example, ask for "all post which are commented by Peter" without map/reduce.

like image 624
Valentin Golev Avatar asked Dec 05 '09 22:12

Valentin Golev


People also ask

What is a nested set?

In a naive set theory, a nested set is a set containing a chain of subsets, forming a hierarchical structure, like Russian dolls.

Where can I find nested documents in MongoDB?

Accessing embedded/nested documents – In MongoDB, you can access the fields of nested/embedded documents of the collection using dot notation and when you are using dot notation, then the field and the nested field must be inside the quotation marks.

How do I create a nested query in MongoDB?

To specify a query condition on fields in an embedded/nested document, use dot notation ( "field. nestedField" ). When querying using dot notation, the field and nested field must be inside quotation marks.


2 Answers

I think there is no perfect solution - depends on what operations are more important for your app. I believe Silicon Alley Insider stores comments nested with MongoDB for example. That does make the query you mention harder.

One option is store at top-level in the post a list of all commenters in an array. Think of that as denormalized data. Then one can easily find all posts which involve a certain commenter. Then to drill down, you use map/reduce or db.eval() to get the nested post info within.

One other note - if you are dealing with a single document, db.eval() is probably lighter-weight than map/reduce. $where is also an option but can be slow so I like the additional 'list of commenters' mentioned above - not it is also easy to index that array too (see 'Multikey' in the docs).

See also: http://groups.google.com/group/mongodb-user/browse_thread/thread/df8250573c91f75a/e880d9c57e343b52?lnk=gst&q=trees#e880d9c57e343b52

like image 170
dm. Avatar answered Oct 05 '22 00:10

dm.


In the link from dm's post Dwight Merriman mentions using a path key and doing regex matches

{
  path : "a.b.c.d.e.f"
}

Another way to do this would be with arrays

{
  path : ["a", "b", "c", "d", "e", "f"]
}

db.test.ensureIndex({path: 1})

that should make it pretty fast.

if each node can only be in a single path then you wouldn't need to do worry about where it is located in the list

db.test.find({path: "a"})

would find all children of "a"

Instead of path names I would probably use the _id of the nodes.

Update

  • one thing to be careful of is that an index can only have one array in it.
  • Be careful to use explain on your queries

    db.test.find({path: {$in: ["a", "b"]})

gives you

 db.test.find({path: {$in: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1 multi",
        "nscanned" : 2,
        "nscannedObjects" : 2,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ],
                        [
                                "b",
                                "b"
                        ]
                ]
        }
}

but

 db.test.find({path: {$all: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ]
                ]
        }
}

only uses the first element and then scans all matching results for b.
If a is your root element or is in most of your records then your doing a nearly full scan of the records instead of an efficient index query.

like image 33
Pete Brumm Avatar answered Oct 05 '22 00:10

Pete Brumm