Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improve querying fields exist in MongoDB

I'm in progress with estimation of MongoDB for our customers. Per requirements we need associate with some entity ent variable set of name-value pairs.

db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})

After this I need intensively query ent for presence of fields:

db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})

Per MongoDB docs:

$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.

Can experts there provide more efficient way (even with shift the paradigm) to deal fast with vary name-value pairs

like image 954
Dewfy Avatar asked Jan 25 '12 20:01

Dewfy


3 Answers

You can redesign your schema like this:

{
  pairs:[
  {k: "a", v: 5775},
  {k: "b", v: "b1"},
  ]
}

Then you indexing your key:

db.people.ensureIndex({"pairs.k" : 1})

After this you will able to search by exact match:

db.ent.find({'pairs.k':"a"})

In case you go with Sparse index and your current schema, proposed by @WesFreeman, you will need to create an index on each key you want to search. It can affect write performance or will be not acceptable if your keys are not static.

like image 157
Andrew Orsich Avatar answered Nov 17 '22 06:11

Andrew Orsich


Simply redesign your schema such that it's an indexable query. Your use case is infact analogous to the first example application given in MongoDB The Definitive Guide.

If you want/need the convenience of result.a just store the keys somewhere indexable.

instead of the existing:

db.ent.insert({a:5775, b:'b1'})

do

db.ent.insert({a:5775, b:'b1', index: ['a', 'b']})

That's then an indexable query:

db.end.find({index: "a"}).explain()
{
    "cursor" : "BtreeCursor index_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "index" : [
            [
                "a",
                "a"
            ]
        ]
    }
}

or if you're ever likely to query also by value:

db.ent.insert({
    a:5775, 
    b:'b1', 
    index: [
        {name: 'a', value: 5775}, 
        {name: 'b', value: 'b1'}
    ]
})

That's also an indexable query:

db.end.find({"index.name": "a"}).explain()
{
    "cursor" : "BtreeCursor index.name_",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "index.name" : [
            [
                "a",
                "a"
            ]
        ]
    }
}
like image 41
AD7six Avatar answered Nov 17 '22 06:11

AD7six


I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes

Sparse indexes should help with $exists:true queries.

Even still, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.

Update I guess I'm wrong. Looks like there's an open issue ( https://jira.mongodb.org/browse/SERVER-4187 ) still that $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which looks like it properly uses the sparse index:

db.ent.find({}).sort({a:1});

Here's a full demonstration of the difference, using your example values:

> db.ent.insert({'a':5775, 'b':'b1'})
> db.ent.insert({'c':'its a c', 'b':'b2'})
> db.ent.insert({'a':7557, 'c':'its a c'})
> db.ent.ensureIndex({a:1},{sparse:true});

Note that find({}).sort({a:1}) uses the index (BtreeCursor):

> db.ent.find({}).sort({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
    "a" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ]
}
}

And find({a:{$exists:true}}) does a full scan:

> db.ent.find({a:{$exists:true}}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Looks like you can also use .hint({a:1}) to force it to use the index.

> db.ent.find().hint({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
    "a" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ]
}
}
like image 1
Eve Freeman Avatar answered Nov 17 '22 06:11

Eve Freeman