
View progress of long running mongodb aggregation job

I have a long-running job that uses MongoDB's (2.6.0-rc2) aggregation framework: http://docs.mongodb.org/manual/core/aggregation-introduction/

I have written the aggregation in JavaScript and run the job as a script
(i.e. mongo localhost:27017/test myjsfile.js).
After starting the script, is there any way to see the progress of the job?

For example, using the sample aggregation job:

db.zipcodes.aggregate([
    {$group: {
        _id: "$state",
        totalPop: {$sum: "$pop"}
    }},
    {$match: {totalPop: {$gte: 10*1000*1000 }}}
])

I would like to see that the job is currently performing a group and is 70% done.

For MongoDB's map-reduce jobs, you can view progress via db.currentOp(), which has a progress field that shows the percentage of the job that is complete, as outlined in this post:

Is it possible to get map reduce progress notifications in mongo?

Is there anything similar for aggregate?

Jeff Tsui, asked Mar 29 '14 01:03




1 Answer

If you use the $out aggregation pipeline stage to write the result of the aggregation to another (or the same) collection, you can open a new mongo shell and count the documents in the new collection. If you're overwriting the collection you're aggregating from, MongoDB writes to a temporary collection with a name like tmp.agg_out.1 so that the operation is atomic. In that case, run:

db['tmp.agg_out.1'].count()
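Rather than re-running the count by hand, it can be polled in a loop. The following is only a sketch: progressPercent, watchCount and the expected total of 1000 documents are hypothetical, and you have to estimate the expected number of output documents yourself, since MongoDB does not report it.

```javascript
// Hypothetical helper: turn a document count into a rough percentage.
// expectedTotal is your own estimate of how many documents $out will write.
function progressPercent(currentCount, expectedTotal) {
    if (expectedTotal <= 0) {
        return 0;
    }
    return Math.min(100, Math.round(100 * currentCount / expectedTotal));
}

// Hypothetical helper: call coll.count() `rounds` times, pausing intervalMs
// between calls, and print one progress line per call.
// printFn and sleepFn are passed in; in the mongo shell use print and sleep.
function watchCount(coll, expectedTotal, rounds, intervalMs, printFn, sleepFn) {
    var last = 0;
    for (var i = 0; i < rounds; i++) {
        last = coll.count();
        printFn(last + " documents written, about " +
                progressPercent(last, expectedTotal) + "%");
        if (i < rounds - 1) {
            sleepFn(intervalMs);
        }
    }
    return last;
}
```

In a second mongo shell this might be called as watchCount(db['tmp.agg_out.1'], 1000, 20, 5000, print, sleep), which checks the count 20 times, five seconds apart.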

To find out the exact name of the temporary collection, you can tail the current MongoDB log and watch for messages about the aggregation. mLab and other cloud MongoDB hosting providers may have a handy "stream current log" option as well.
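If you can't get at the log, another option may be to look for the temporary collection by name. This sketch assumes that the temporary collection shows up in db.getCollectionNames() while the aggregation is running and that its name starts with tmp.agg_out. as in the example above; findAggOutTempCollections is a hypothetical helper name.

```javascript
// Hypothetical helper: given a list of collection names, keep only those
// that look like $out temporary collections (assumed prefix "tmp.agg_out.").
function findAggOutTempCollections(collectionNames) {
    return collectionNames.filter(function (name) {
        return name.indexOf("tmp.agg_out.") === 0;
    });
}
```

In the mongo shell, findAggOutTempCollections(db.getCollectionNames()) would then return the candidate names to pass to count().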

For example, while running the query in this answer, the relevant log messages may look like this:

2019-04-05T03:55:42.126-0700 I COMMAND [conn244209] command collection.tmp.agg_out.1 appName: "MongoDB Shell" command: insert { insert: "tmp.agg_out.1", ordered: true, $db: "mydb" } ninserted:18145 keysInserted:351002 numYields:0 locks:{ Global: { acquireCount: { r: 70917, w: 61737 } }, Database: { ... }, Collection: { ... }, Metadata: { ... }, oplog: { ... } protocol:op_msg 161451ms

(I was hoping that ninserted or keysInserted would indicate progress, but that doesn't seem to be the case; the count of the documents in the temporary collection was a much more accurate progress indicator.)

Dan Dascalescu, answered Nov 02 '22 14:11