What're the best practices to store nested sets (like trees of comments) in MongoDB?
I mean, every comment can have a parent comment and children-comments (answers).
Storing them like this:
{
title: "Hello",
body: "Please comment me!",
comments: [
{
author: "Peter",
text: "Hi there",
answers: [
{
author: "Peter",
text: "Hi there",
answers: [
{ author: "Ivan", text: "Hi there" },
{ author: "Nicholas", text: "Hi there" }
]
},
{ author: "Ivan", text: "Hi there" },
{ author: "Nicholas", text: "Hi there" },
]
},
{ author: "Ivan", text: "Hi there" },
{ author: "Nicholas", text: "Hi there" },
]
}
is not cool, because we can't, for example, ask for "all post which are commented by Peter" without map/reduce.
In a naive set theory, a nested set is a set containing a chain of subsets, forming a hierarchical structure, like Russian dolls.
Accessing embedded/nested documents – In MongoDB, you can access the fields of nested/embedded documents of the collection using dot notation and when you are using dot notation, then the field and the nested field must be inside the quotation marks.
To specify a query condition on fields in an embedded/nested document, use dot notation ( "field. nestedField" ). When querying using dot notation, the field and nested field must be inside quotation marks.
I think there is no perfect solution - depends on what operations are more important for your app. I believe Silicon Alley Insider stores comments nested with MongoDB for example. That does make the query you mention harder.
One option is store at top-level in the post a list of all commenters in an array. Think of that as denormalized data. Then one can easily find all posts which involve a certain commenter. Then to drill down, you use map/reduce or db.eval() to get the nested post info within.
One other note - if you are dealing with a single document, db.eval() is probably lighter-weight than map/reduce. $where is also an option but can be slow so I like the additional 'list of commenters' mentioned above - not it is also easy to index that array too (see 'Multikey' in the docs).
See also: http://groups.google.com/group/mongodb-user/browse_thread/thread/df8250573c91f75a/e880d9c57e343b52?lnk=gst&q=trees#e880d9c57e343b52
In the link from dm's post Dwight Merriman mentions using a path key and doing regex matches
{
path : "a.b.c.d.e.f"
}
Another way to do this would be with arrays
{
path : ["a", "b", "c", "d", "e", "f"]
}
db.test.ensureIndex({path: 1})
that should make it pretty fast.
if each node can only be in a single path then you wouldn't need to do worry about where it is located in the list
db.test.find({path: "a"})
would find all children of "a"
Instead of path names I would probably use the _id of the nodes.
Update
Be careful to use explain on your queries
db.test.find({path: {$in: ["a", "b"]})
gives you
db.test.find({path: {$in: ["a", "b"]}}).explain()
{
"cursor" : "BtreeCursor path_1 multi",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"path" : [
[
"a",
"a"
],
[
"b",
"b"
]
]
}
}
but
db.test.find({path: {$all: ["a", "b"]}}).explain()
{
"cursor" : "BtreeCursor path_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"path" : [
[
"a",
"a"
]
]
}
}
only uses the first element and then scans all matching results for b.
If a is your root element or is in most of your records then your doing a nearly full scan of the records instead of an efficient index query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With