I have Mongodb collection with about 7 million documents that represents places.
I run a query that search for places that their name start with a prefix near a specific location.
We have a compound index as described bellow to speed up the search.
When the search query find match (even if only one) the query is execute very fast (~20 milisec). But when there is no match it can take 30 sec for the query to execute.
Please assist.
In detailed:
Each place (geoData) has the following fields:
"loc" - a GeoJSON point that represent the location
"categoriesIds" - array of int ids
"name" - the name of the placee
The following index is defined on this collection:
{
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
}
The query is:
db.geoData.find({
"loc":{
"$near":{
"$geometry":{
"type": "Point" ,
"coordinates": [ -0.10675191879272461 , 51.531600743186644]
},
"$maxDistance": 5000.0
}
},
"categoriesIds":{
"$in": [ 1 , 2 , 71 , 70 , 74 , 72 , 73 , 69 , 44 , 26 , 27 , 33 , 43 , 45 , 53 , 79]
},
"name":{ "$regex": "^Cafe Ne"}
})
Execution stats (Link to the whole explain result)
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 169,
"totalKeysExamined" : 14333,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "GEO_NEAR_2DSPHERE",
"nReturned" : 1,
"executionTimeMillisEstimate" : 60,
"works" : 14354,
"advanced" : 1,
"needTime" : 14351,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"searchIntervals" : [
{
"minDistance" : 0,
"maxDistance" : 3408.329295346151,
"maxInclusive" : false
},
{
"minDistance" : 3408.329295346151,
"maxDistance" : 5000,
"maxInclusive" : true
}
],
"inputStages" : [
{
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 20,
"works" : 6413,
"advanced" : 1,
"needTime" : 6411,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 20,
"works" : 6413,
"advanced" : 1,
"needTime" : 6411,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230031\"]",
"[\"2f10032300311\", \"2f10032300312\")",
"[\"2f10032300312\", \"2f10032300313\")",
"[\"2f10032300313\", \"2f10032300314\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")"
],
"categoriesIds" : [
"[1.0, 1.0]",
"[2.0, 2.0]",
"[26.0, 26.0]",
"[27.0, 27.0]",
"[33.0, 33.0]",
"[43.0, 43.0]",
"[44.0, 44.0]",
"[45.0, 45.0]",
"[53.0, 53.0]",
"[69.0, 69.0]",
"[70.0, 70.0]",
"[71.0, 71.0]",
"[72.0, 72.0]",
"[73.0, 73.0]",
"[74.0, 74.0]",
"[79.0, 79.0]"
],
"name" : [
"[\"Cafe Ne\", \"Cafe Nf\")",
"[/^Cafe Ne/, /^Cafe Ne/]"
]
},
"keysExamined" : 6412,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 1
}
},
{
"stage" : "FETCH",
"nReturned" : 0,
"executionTimeMillisEstimate" : 40,
"works" : 7922,
"advanced" : 0,
"needTime" : 7921,
"needFetch" : 0,
"saveState" : 261,
"restoreState" : 261,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 0,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 40,
"works" : 7922,
"advanced" : 0,
"needTime" : 7921,
"needFetch" : 0,
"saveState" : 261,
"restoreState" : 261,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300011\", \"2f10032300012\")",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230032\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")",
"[\"2f10032300322\", \"2f10032300323\")"
],
"categoriesIds" : [
"[1.0, 1.0]",
"[2.0, 2.0]",
"[26.0, 26.0]",
"[27.0, 27.0]",
"[33.0, 33.0]",
"[43.0, 43.0]",
"[44.0, 44.0]",
"[45.0, 45.0]",
"[53.0, 53.0]",
"[69.0, 69.0]",
"[70.0, 70.0]",
"[71.0, 71.0]",
"[72.0, 72.0]",
"[73.0, 73.0]",
"[74.0, 74.0]",
"[79.0, 79.0]"
],
"name" : [
"[\"Cafe Ne\", \"Cafe Nf\")",
"[/^Cafe Ne/, /^Cafe Ne/]"
]
},
"keysExamined" : 7921,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0
}
}
]
},
Execution stats when searching for "CafeNeeNNN" instead of "Cafe Ne" (Link to the whole explain result )
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 2537,
"totalKeysExamined" : 232259,
"totalDocsExamined" : 162658,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"name" : /^CafeNeeNNN/
},
{
"categoriesIds" : {
"$in" : [
1,
2,
26,
27,
33,
43,
44,
45,
53,
69,
70,
71,
72,
73,
74,
79
]
}
}
]
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 1330,
"works" : 302752,
"advanced" : 0,
"needTime" : 302750,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 70486,
"alreadyHasObj" : 70486,
"inputStage" : {
"stage" : "GEO_NEAR_2DSPHERE",
"nReturned" : 70486,
"executionTimeMillisEstimate" : 1290,
"works" : 302751,
"advanced" : 70486,
"needTime" : 232264,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"searchIntervals" : [
{
"minDistance" : 0,
"maxDistance" : 3408.329295346151,
"maxInclusive" : false
},
{
"minDistance" : 3408.329295346151,
"maxDistance" : 5000,
"maxInclusive" : true
}
],
"inputStages" : [
{
"stage" : "FETCH",
"nReturned" : 44540,
"executionTimeMillisEstimate" : 110,
"works" : 102690,
"advanced" : 44540,
"needTime" : 58149,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 44540,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 44540,
"executionTimeMillisEstimate" : 90,
"works" : 102690,
"advanced" : 44540,
"needTime" : 58149,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230031\"]",
"[\"2f10032300311\", \"2f10032300312\")",
"[\"2f10032300312\", \"2f10032300313\")",
"[\"2f10032300313\", \"2f10032300314\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")"
]
},
"keysExamined" : 102689,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 44540
}
},
{
"stage" : "FETCH",
"nReturned" : 47632,
"executionTimeMillisEstimate" : 250,
"works" : 129571,
"advanced" : 47632,
"needTime" : 81938,
"needFetch" : 0,
"saveState" : 2556,
"restoreState" : 2556,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 47632,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 47632,
"executionTimeMillisEstimate" : 230,
"works" : 129571,
"advanced" : 47632,
"needTime" : 81938,
"needFetch" : 0,
"saveState" : 2556,
"restoreState" : 2556,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300011\", \"2f10032300012\")",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230032\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")",
"[\"2f10032300322\", \"2f10032300323\")"
]
},
"keysExamined" : 129570,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 47632
}
}
]
}
},
Indexes on the collection
{
"0" : {
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "wego.geoData"
},
"1" : {
"v" : 1,
"key" : {
"srcId" : 1
},
"name" : "srcId_1",
"ns" : "wego.geoData"
},
"2" : {
"v" : 1,
"key" : {
"loc" : "2dsphere"
},
"name" : "loc_2dsphere",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
},
"3" : {
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "wego.geoData"
},
"4" : {
"v" : 1,
"key" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"name" : "loc_2dsphere_categoriesIds_1_name_1",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
},
"5" : {
"v" : 1,
"key" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"keywords" : 1
},
"name" : "loc_2dsphere_categoriesIds_1_keywords_1",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
}
}
Collection stats link
You must apply limit() to the cursor before retrieving any documents from the database. Use limit() to maximize performance and prevent MongoDB from returning more results than required for processing.
The slow queries can happen when you do not have proper DB indexes. Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.
Indexes also improve efficiency on queries that routinely sort on a given field. Because MongoDB can read indexes in both ascending and descending order, the direction of a single-key index does not matter. Indexes support queries, update operations, and some phases of the aggregation pipeline.
I am going to speculate here a bit, and then a comment about your design.
First, when you create an index on key which has an array on a value you create a record for each element of the array:
To index a field that holds an array value, MongoDB creates an index key for each element in the array.
This is from MongoDB own documentation about indecies.
So, if your typical record more than a hand full of categories and you have 7 million records, your index is huge, and it will also take time to scan the index itself to find that the index does not contain what you are looking for. It is still faster than a collection scan, but it darn slow compared to how fast it takes to find an existing record.
Now, let me comment about your schema design. This is a matter of style so feel free to ignore this part.
You have a record which might be in 17 categories. That is a bit overwhelming, and over abusing the term category
. A category is a specific
division, a way to quickly associate a thing with a group of things. What is a thing that belong to so many groups?
Let's take for example your records Cafe Ne
. I assume in the real world - and please remember, programming and applications are at best when the solve real world problems - Cafe Ne, is either a restaurant, a caffe, a jazz bar,
a dinner. It's for sure not a garage (unless, cafe means cars in a language I don't know). I can hardly imagine it's a bank or a dental clinic. I'd have to really make an effort, to find more than 10 meaningful categories, that users search a cafe by.
My point is, even though mongodb allows you to design things like that, it does not mean you have to. Try to narrow the number of categories you have and the ones you look for, and you will have much better performance.
As JohnnyHK suggested in comments, and Oz123 pointed to in his answer, the issue here appears to be an index that has grown so large that it fails to perform well as an index. I believe that in addition to the category expansion issue that has already been pointed out, the ordering of fields in your index creates trouble. Compound indexes are built according to the order of fields, and putting name
after categoriesIds
makes it more costly to query on name
.
It's clear that you need to tune your indexes. Exactly how you tune them depends on the types of queries that you are expecting to support. In particularly, I'm not sure if you'll see better performance from a compound index of loc
and name
or if you'll see better performance from individual indexes, one for loc
and one for name
. Mongo themselves are a little vague on when it's best to use a compound index and when it's best to use individual indexes and rely on index intersection.
My intuition says that individual indexes will perform better, but I'd test both scenarios.
If you anticipate needing to query by category as well, without name
or loc
fields that could narrow the query down, it's probably best to create a separate categoriesIds
index.
The order of the fields in a compound index is very important. It's hard to diagnose without having access to the real data and usage patterns, but this key might increase the odds of matching (or not) the document using only the index:
{
"loc" : "2dsphere",
"name" : 1,
"categoriesIds" : 1
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With