How to speed up aggregation queries?

The following is the aggregation query:

[
  {
    "$match": {
      "UserId": {
        "$in": [
          5
        ]
      },
      "WorkflowStartTime": {
        "$gte": ISODate('2015-04-09T00:00:00.000Z'),
        "$lte": ISODate('2015-04-16T00:00:00.000Z')
      }
    }
  },
  {
    "$group": {
      "_id": {
        "Task": "$TaskId",
        "WorkflowId": "$WorkflowInstanceId"
      },
      "TaskName": {
        "$first": "$Task"
      },
      "StartTime": {
        "$first": "$StartTime"
      },
      "EndTime": {
        "$last": "$EndTime"
      },
      "LastExecutionTime": {
        "$last": "$StartTime"
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "LastExecutionTime": 1,
      "TaskName": 1,
      "AverageExecutionTime": {
        "$subtract": [
          "$EndTime",
          "$StartTime"
        ]
      },
      "WorkflowName": 1
    }
  },
  {
    "$group": {
      "_id": "$_id.Task",
      "LastExecutionTime": {
        "$last": "$LastExecutionTime"
      },
      "AverageExecutionTime": {
        "$avg": "$AverageExecutionTime"
      },
      "TaskName": {
        "$first": "$TaskName"
      },
      "TotalInstanceCount": {
        "$sum": 1
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "Id": "$_id",
      "_id": 0,
      "Name": "$TaskName",
      "LastExecutionDate": {
        "$substr": [
          "$LastExecutionTime",
          0,
          30
        ]
      },
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
      "TotalInstanceCount": "$TotalInstanceCount",
      "WorkflowName": 1
    }
  }
]

The documents in my collection look like this:

{
        "_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
        "OutputData" : "xyz",
        "InputData" : null,
        "Location" : null,
        "ChannelName" : "XYZ",
        "UserId" : 5,
        "TaskId" : 95,
        "ChannelId" : 5,
        "Status" : "Success",
        "TaskTypeId" : 7,
        "WorkflowId" : 37,
        "Task" : "XYZ",
        "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
        "EndTime" : ISODate("2015-03-23T05:22:44Z"),
        "StartTime" : ISODate("2015-03-23T05:22:44Z"),
        "TaskType" : "TRIGGER",
        "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
        "StackTrace" : null,
        "WorkflowName" : "XYZ data workflow"
}

I have an index on {WorkflowStartTime: 1, UserId: 1, StartTime: 1}.

There are only about 900,000 records in the collection, and the query touches just a subset of them because it filters on a date range, yet it still takes around 1.5 to 1.7 seconds. I have used the aggregation framework on other collections with far more data, and the performance there is very good. I don't know what is wrong with this query that makes it so slow. I expect it to finish in milliseconds, since it is a real-time analytics query. Any pointers would be appreciated.

Output when { explain: true } is added to the aggregation query:

{
  "stages": [


       {
          "$cursor": {
            "query": {
              "UserId": {
                "$in": [
                  5
                ]
              },
              "WorkflowStartTime": {
                "$gte": "ISODate(2015-04-09T00:00:00Z)",
                "$lte": "ISODate(2015-04-16T00:00:00Z)"
              }
            },
            "fields": {
              "EndTime": 1,
              "StartTime": 1,
              "Task": 1,
              "TaskId": 1,
              "WorkflowInstanceId": 1,
              "WorkflowName": 1,
              "_id": 0
            },
            "plan": {
              "cursor": "BtreeCursor ",
              "isMultiKey": false,
              "scanAndOrder": false,
              "indexBounds": {
                "WorkflowStartTime": [
                  [
                    "ISODate(2015-04-16T00:00:00Z)",
                    "ISODate(2015-04-09T00:00:00Z)"
                  ]
                ],
                "UserId": [
                  [
                    5,
                    5
                  ]
                ]
              },
              "allPlans": [
                {
                  "cursor": "BtreeCursor ",
                  "isMultiKey": false,
                  "scanAndOrder": false,
                  "indexBounds": {
                    "WorkflowStartTime": [
                      [
                        "ISODate(2015-04-16T00:00:00Z)",
                        "ISODate(2015-04-09T00:00:00Z)"
                      ]
                    ],
                    "UserId": [
                      [
                        5,
                        5
                      ]
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "$group": {
            "_id": {
              "Task": "$TaskId",
              "WorkflowId": "$WorkflowInstanceId"
            },
            "TaskName": {
              "$first": "$Task"
            },
            "StartTime": {
              "$first": "$StartTime"
            },
            "EndTime": {
              "$last": "$EndTime"
            },
            "LastExecutionTime": {
              "$last": "$StartTime"
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": true,
            "LastExecutionTime": true,
            "TaskName": true,
            "AverageExecutionTime": {
              "$subtract": [
                "$EndTime",
                "$StartTime"
              ]
            },
            "WorkflowName": true
          }
        },
        {
          "$group": {
            "_id": "$_id.Task",
            "LastExecutionTime": {
              "$last": "$LastExecutionTime"
            },
            "AverageExecutionTime": {
              "$avg": "$AverageExecutionTime"
            },
            "TaskName": {
              "$first": "$TaskName"
            },
            "TotalInstanceCount": {
              "$sum": {
                "$const": 1
              }
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "Id": "$_id",
            "Name": "$TaskName",
            "LastExecutionDate": {
              "$substr": [
                "$LastExecutionTime",
                {
                  "$const": 0
                },
                {
                  "$const": 30
                }
              ]
            },
            "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
            "TotalInstanceCount": "$TotalInstanceCount",
            "WorkflowName": true
          }
        }
      ],
      "ok": 1
    }


asked Apr 15 '15 06:04

Ninad

1 Answer

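The explain output shows a BtreeCursor with bounds on both WorkflowStartTime and UserId, so the $match stage does use your existing index. However, that index leads with the range field (WorkflowStartTime), so the scan is likely to visit index entries for every user in the date range. Try creating a new index with the equality field first: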

{UserId:1,WorkflowStartTime:1}
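
As a rough illustration of why an index that leads with UserId can help, the sketch below models an index as a sorted array of compound keys and counts how many keys lie between the start and end bounds of the scan for each field order. The data is synthetic and the helper names are made up for this example. The model is also deliberately simplified (a real B-tree scan can skip ahead within its bounds), so treat the counts as indicative only.

```javascript
// Simplified model: an index is a sorted list of compound keys, and a scan
// walks every key between its lower and upper compound bound.
// All data below is synthetic; function names are illustrative only.

// Compare two compound keys field by field, like a compound index would.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Count keys k with lower <= k <= upper in compound-key order.
function keysBetween(sortedKeys, lower, upper) {
  return sortedKeys.filter(
    (k) => compareKeys(k, lower) >= 0 && compareKeys(k, upper) <= 0
  ).length;
}

// Synthetic documents: 10 users, one workflow start per user per hour, 30 days.
const HOUR = 3600 * 1000;
const start = Date.UTC(2015, 2, 20); // 2015-03-20 (months are 0-based)
const docs = [];
for (let h = 0; h < 30 * 24; h++) {
  for (let userId = 1; userId <= 10; userId++) {
    docs.push({ UserId: userId, WorkflowStartTime: start + h * HOUR });
  }
}

// Same filter as the $match stage in the question.
const from = Date.UTC(2015, 3, 9); // 2015-04-09
const to = Date.UTC(2015, 3, 16); // 2015-04-16
const user = 5;

// Field order {WorkflowStartTime: 1, UserId: 1}: range field first.
const timeFirst = docs
  .map((d) => [d.WorkflowStartTime, d.UserId])
  .sort(compareKeys);
// Field order {UserId: 1, WorkflowStartTime: 1}: equality field first.
const userFirst = docs
  .map((d) => [d.UserId, d.WorkflowStartTime])
  .sort(compareKeys);

const scannedTimeFirst = keysBetween(timeFirst, [from, user], [to, user]);
const scannedUserFirst = keysBetween(userFirst, [user, from], [user, to]);
const matching = docs.filter(
  (d) =>
    d.UserId === user &&
    d.WorkflowStartTime >= from &&
    d.WorkflowStartTime <= to
).length;

console.log({ matching, scannedTimeFirst, scannedUserFirst });
```

In this model the equality-first order touches only the keys that actually match, while the range-first order also walks the keys of the other users inside the date range.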

If all is well, running the aggregation with explain should show a line like this in the output:

    "winningPlan" :...
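
To check this programmatically rather than by eye, a small helper can look inside the explain document for an index scan. This is a minimal sketch in plain JavaScript. It assumes two output shapes: the older one shown in the question, where the first stage's $cursor.plan.cursor starts with "BtreeCursor", and a newer one where winningPlan is a tree of stages linked by inputStage, with an IXSCAN stage when an index is used. The function name usesIndex is hypothetical, and the sample documents are hand-written for illustration, not real server output.

```javascript
// Returns true if the explain document reports an index scan.
// Assumes the explain shapes described above; other server versions may differ.
function usesIndex(explain) {
  const cursor =
    (explain.stages && explain.stages[0] && explain.stages[0].$cursor) ||
    explain;

  // Older format (as in the question): plan.cursor is "BtreeCursor <index name>".
  if (cursor.plan && typeof cursor.plan.cursor === "string") {
    return cursor.plan.cursor.startsWith("BtreeCursor");
  }

  // Newer format: walk the winningPlan tree looking for an IXSCAN stage.
  const root = cursor.queryPlanner && cursor.queryPlanner.winningPlan;
  const pending = root ? [root] : [];
  while (pending.length > 0) {
    const node = pending.pop();
    if (node.stage === "IXSCAN") return true;
    if (node.inputStage) pending.push(node.inputStage);
    if (Array.isArray(node.inputStages)) pending.push(...node.inputStages);
  }
  return false;
}

// Hand-written sample documents (illustrative only).
const legacy = {
  stages: [{ $cursor: { plan: { cursor: "BtreeCursor ", indexBounds: {} } } }],
};
const modern = {
  stages: [
    {
      $cursor: {
        queryPlanner: {
          winningPlan: {
            stage: "FETCH",
            inputStage: {
              stage: "IXSCAN",
              keyPattern: { UserId: 1, WorkflowStartTime: 1 },
            },
          },
        },
      },
    },
  ],
};
const collScan = {
  stages: [{ $cursor: { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } } }],
};

console.log(usesIndex(legacy), usesIndex(modern), usesIndex(collScan));
// prints: true true false
```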


answered Oct 07 '22 07:10

Alberto J García Gago

Tags: java, mongodb, aggregation-framework, mongo-java-driver
