The FETCH stage is the limiting factor in my queries. I've been researching, and it seems that MongoDB is reading much more than it needs and is not utilizing the bandwidth fully.
My MongoDB mongod instance seems to be reading too much on a single query. I am testing on an AWS EC2 m4.xlarge with one 100 GB EBS io1 SSD volume (5000 PIOPS) attached, and 16 GB of RAM.
db stats
db.stats()
{
"db" : "database",
"collections" : 4,
"objects" : 406496932,
"avgObjSize" : 326.3196544642064,
"dataSize" : 132647938391,
"storageSize" : 55475830784,
"numExtents" : 0,
"indexes" : 5,
"indexSize" : 8940408832,
"ok" : 1
}
collection summary:
db.collection.stats() ->
{ "ns" : "database.[collection###]",
"count" : 367614513,
"size" : 121155225858,
"avgObjSize" : 329,
"storageSize" : 52052197376,
"capped" : false,
"wiredTiger" : {"Left empty"},
"nindexes" : 2,
"totalIndexSize" : 8131604480,
"indexSizes" : {
"_id_" : 4373012480,
"id_1_ts_-1" : 3758592000
},
"ok" : 1
The query:
db.[#######].find(
    { id : "######",
      ts : { "$gte" : ISODate("2016-10-01T00:00:00.000Z"),
             "$lt"  : ISODate("2016-10-07T02:00:00.000Z") } },
    { _id : 0, "u1" : 1, "u2" : 1, "u3" : 1,
      "eq1" : 1, "eq2" : 1, "eq3" : 1, "ts" : 1 });
And the explain result:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "database.[collection]",
"d" : false,
"parsedQuery" : {
"$and" : [
{
"id" : {
"$eq" : "#####ID#####"
}
},
{
"ts" : {
"$lt" : ISODate("2016-09-30T22:00:00Z")
}
},
{
"ts" : {
"$gte" : ISODate("2016-09-22T22:00:00Z")
}
}
]
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"u1" : 1,
"u2" : 1,
"u3" : 1,
"eq1" : 1,
"eq2" : 1,
"eq3" : 1,
"ts" : 1
},
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"id" : 1,
"ts" : -1
},
"indexName" : "id_1_ts_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"id" : [
"[\"#####ID#####\", \"#####ID#####\"]"
],
"ts" : [
"(new Date(1475272800000), new Date(1474581600000)]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 676745,
"executionTimeMillis" : 170508,
"totalKeysExamined" : 676745,
"totalDocsExamined" : 676745,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 676745,
"executionTimeMillisEstimate" : 167820,
"works" : 676746,
"advanced" : 676745,
"needTime" : 0,
"needYield" : 0,
"saveState" : 8970,
"restoreState" : 8970,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"u1" : 1,
"u2" : 1,
"u3" : 1,
"eq1" : 1,
"eq2" : 1,
"eq3" : 1,
"ts" : 1
},
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 676745,
"executionTimeMillisEstimate" : 166470,
"works" : 676746,
"advanced" : 676745,
"needTime" : 0,
"needYield" : 0,
"saveState" : 8970,
"restoreState" : 8970,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 676745,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 676745,
"executionTimeMillisEstimate" : 980,
"works" : 676746,
"advanced" : 676745,
"needTime" : 0,
"needYield" : 0,
"saveState" : 8970,
"restoreState" : 8970,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"id" : 1,
"ts" : -1
},
"indexName" : "id_1_ts_- 1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"id" : [
"[\"#####ID#####\", \"#####ID#####\"]"
],
"ts" : [
"(new Date(1475272800000), new Date(1474581600000)]"
]
},
"keysExamined" : 676745,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "ip #########",
"port" : 27017,
"version" : "3.2.10",
"gitVersion" : "79d9b3ab5ce20f51c272b4411202710a082d0317"
},
"ok" : 1
}
As we can see above, MongoDB uses the index. The IXSCAN takes 980 ms and the FETCH ~160,000 ms.
If I'm not mistaken, the whole read should be 676,745 (nReturned) × 329 (avgObjSize) bytes ≈ 212 MB of data.
I've noticed the following in the iostat output (http://linuxcommand.org/man_pages/iostat1.html); /data/db is on xvdf:
avg-cpu: %user %nice %system %iowait %steal %idle
0.27 0.00 0.00 21.35 0.13 78.25
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 0.00 1691.00 0.00 19.83 0.00 24.02 0.95 0.56 0.56 0.00 0.56 94.40
The rMB/s column says ~20 MB/s, and this continues during the whole operation (FETCH stage). This means that MongoDB is reading 160 s × 20 MB/s = 3,200 MB, which is a lot more than the ~212 MB computed above.
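(Sanity check on those numbers: avgrq-sz is measured in 512-byte sectors, so each read request averages about 24.02 × 512 B ≈ 12 KB, and 1691 r/s × 12 KB ≈ 20 MB/s, which matches the rMB/s column. So every ~329-byte document returned seems to cost a ~12 KB random read.)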
Memory:
free -m
total used free shared buffers cached
Mem: 16048 12629 3418 0 32 4071
-/+ buffers/cache: 8525 7522
Swap: 0 0 0
Also, MongoDB is not using the 5000 provisioned EBS IOPS, nor the promised bandwidth: it is only doing ~1700 reads/second, resulting in ~20 MB/s.
I've changed the read-ahead to 16 KB. I've tried placing the journal and log on another HDD.
I cannot figure this out! Help me. Please!
Performance: because the index contains all fields required by the query, MongoDB can both match the query conditions and return the results using only the index (a covered query). Querying only the index can be much faster than querying documents outside of the index.
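As a sketch of how that could apply to the query above (assuming the field names from the question, and that none of the projected fields holds an array, since covered queries cannot use multikey index fields), extending the index to include the projected fields would let MongoDB answer the query from the index alone:
db.[#######].createIndex({ id : 1, ts : -1,
                           u1 : 1, u2 : 1, u3 : 1,
                           eq1 : 1, eq2 : 1, eq3 : 1 });
Because the projection already excludes _id, explain() on the same find() should then show an IXSCAN with no FETCH stage above it. The trade-off is a larger index competing for the same WiredTiger cache.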
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on element or elements of the arrays.
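A minimal illustration (the events collection and tags field here are hypothetical):
db.events.insert({ _id : 1, tags : [ "a", "b", "c" ] });
db.events.createIndex({ tags : 1 });   // multikey: one index entry per array element
db.events.find({ tags : "b" });        // can match on any single element via the index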
It is slow because it is not using an index. For each document in the logs collection, it is doing a full collection scan on the graphs collection. From the $expr documentation page: $expr only uses indexes on the from collection for equality matches in a $match stage.
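In other words, when the join condition is a plain equality, expressing it with localField/foreignField instead of an $expr pipeline lets the $lookup use an index on the from collection. A sketch, assuming hypothetical logs and graphs collections joined on a logId field:
db.graphs.createIndex({ logId : 1 });
db.logs.aggregate([
    { $lookup : {
        from         : "graphs",
        localField   : "logId",
        foreignField : "logId",
        as           : "graphs"
    } }
]);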
I encountered the same problem when I was fetching around 35,000 documents. To solve it, I used the aggregate function (the sakulstra:aggregate package), and in my case it boosted the request incredibly. The result format is obviously not the same, but it's still easy to use to compute everything I need.
Before (7000ms) :
const historicalAssetAttributes = HistoricalAssetAttributes.find({
    date: {'$gte': startDate, '$lte': endDate},
    assetId: {$in: assetIds}
}, {
    fields: {
        "date": 1,
        "assetId": 1,
        "close": 1
    }
}).fetch();
After (300ms):
const historicalAssetAttributes = HistoricalAssetAttributes.aggregate([
    {
        '$match': {
            date: {'$gte': startDate, '$lte': endDate},
            assetId: {$in: assetIds}
        }
    }, {
        '$group': {
            _id: {assetId: "$assetId"},
            close: {
                '$push': {
                    date: "$date",
                    value: "$close"
                }
            }
        }
    }
]);
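Note that the $group stage returns one document per assetId, with all the (date, close) pairs pushed into a single close array, rather than one document per reading; that reshaping is why the result format differs from the plain find().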
The main facts: assuming the WiredTiger cache is set to the default, the amount of RAM reserved for it should be approximately 0.6 × 16 GB − 1 GB ≈ 8.6 GB. From https://docs.mongodb.com/v3.2/faq/storage/#to-what-size-should-i-set-the-wiredtiger-internal-cache:
Starting in MongoDB 3.2, the WiredTiger internal cache, by default, will use the larger of either:
- 60% of RAM minus 1 GB, or
- 1 GB.
From the information above, it appears that there is memory pressure on your machine. MongoDB tries to keep indexes in memory for fast access, and the whole index is ~7 GB. This will effectively fill ~80% of your WiredTiger cache with just the index, leaving little space for anything else. As a result, MongoDB is forced to pull the documents in the result set from disk. At this point, performance suffers.
You can see the effect of this in the iostat output, where the device xvdf (where the data resides) is hitting more than 94% utilization (the %util column). This means that your operation is I/O bound, as you don't have enough RAM to satisfy your ideal working set.
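If you want to confirm this on the running instance, the WiredTiger section of serverStatus exposes the cache counters (a sketch; counter names as reported by MongoDB 3.2):
var cache = db.serverStatus().wiredTiger.cache;
print("configured max (bytes) : " + cache["maximum bytes configured"]);
print("currently in cache     : " + cache["bytes currently in the cache"]);
print("pages read into cache  : " + cache["pages read into cache"]);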
To mitigate this issue, you could try to:
- provision more RAM (e.g. a larger instance type), so that your working set (the index plus the frequently accessed documents) fits in the WiredTiger cache, or
- reduce how much the query must fetch from disk, for example by making it a covered query as described above.
You could also review the Production Notes and the Operation Checklist for recommended settings.