
Spring Data MongoDB and allowDiskUse

I have a query like this:

db.tqaP.aggregate([
    {$match: {$and: [
        {"eventUTCDate": {$gte: '01-10-2014'}},
        {"eventUTCDate": {$lt: '31-10-2014'}},
        {"mpTransactionId": {$exists: true}},
        {testMode: false},
        {eventID: {$in: [
            230, // ContentDiscoveredEvent
            204, // ContentSLAStartEvent
            211, // ContentProcessedEndEvent
            255, // ContentValidationStatusEvent
            256, // ContentErrorEvent
            231, // ContentAnalyzedEvent
            240, // ContentTranscodeStartEvent
            241, // ContentTranscodeEndEvent
            252  // AbortJobEvent
            //205, 207
        ]}}
    ]}},
    {$project: {
        _id: 0,
        event: {
            eventID                 : "$eventID",
            eventUTCDate            : "$eventUTCDate",
            processState            : "$processState",
            jobInstanceId           : "$jobInstanceId",
            mpTransactionId         : "$mpTransactionId",
            eventUID                : "$eventUID",
            contextJobInstanceId    : "$context.jobInstanceId",
            contextValidationStatus : "$context.validationStatus",
            metaUpdateOnly          : "$metaUpdateOnly",
            errorCode               : "$errorCode",
            transcodingProfileName  : "$transcodingProfileName",
            contextAssetId          : "$context.assetId"
        }
    }},
    // Creating the hash map <mpTransactionId, listOfAssociatedEvents>
    {$group: {
        "_id"           : "$event.mpTransactionId",
        "chainOfEvents" : {$addToSet: "$event"}
    }},
    // Sorting by chainOfEvents.eventUTCDate
    {$unwind: "$chainOfEvents"},
    {$sort: {"chainOfEvents.eventUTCDate": 1}},
    {$group: {
        _id: "$_id",
        chainOfEvents: {$push: "$chainOfEvents"}
    }}
])

that runs over 1.2 million records and dies. The error message is:

assert: command failed: {
        "errmsg" : "exception: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
        "code" : 16819,
        "ok" : 0
} : aggregate failed
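
For scale, the limit quoted in the message is 100 MiB. The sketch below is plain Java arithmetic, not MongoDB code. It assumes, purely for illustration, that all 1.2 million records reach the $sort stage (the $match stage may well filter some out), and the class and method names are made up for this example.

```java
public class SortLimitEstimate {
    // In-memory sort limit quoted in the error message: 100 MiB.
    public static final long SORT_LIMIT_BYTES = 100L * 1024 * 1024;

    // Largest average document size (in whole bytes) that still fits
    // under the limit when `docs` documents are sorted in memory.
    public static long maxAverageDocBytes(long docs) {
        return SORT_LIMIT_BYTES / docs;
    }

    public static void main(String[] args) {
        System.out.println(SORT_LIMIT_BYTES);               // 104857600
        System.out.println(maxAverageDocBytes(1_200_000L)); // 87
    }
}
```

Under that assumption the sort only stays in memory if the unwound documents average less than roughly 87 bytes each, which documents carrying a dozen named fields are unlikely to manage.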

In the shell I fixed this by adding the following between the last two closing brackets (the square one and the round one):

,{allowDiskUse: true}
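
The option named in the error message, allowDiskUse, is not a pipeline stage. It travels next to the pipeline: as the second argument to aggregate() in the shell, or as a top-level field of the aggregate command. A rough sketch of that shape using plain Java maps follows; no driver is involved, the class and helper names are hypothetical, and the pipeline is abbreviated to two placeholder stages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateCommandShape {
    // Builds a single-key map such as {$unwind: "$chainOfEvents"}.
    public static Map<String, Object> stage(String operator, Object spec) {
        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put(operator, spec);
        return stage;
    }

    // Mirrors the layout of the aggregate command: the option is a
    // sibling of "pipeline", not an element inside it.
    public static Map<String, Object> buildCommand(String collection,
                                                   List<Map<String, Object>> pipeline,
                                                   boolean allowDiskUse) {
        Map<String, Object> command = new LinkedHashMap<>();
        command.put("aggregate", collection);
        command.put("pipeline", pipeline);
        command.put("allowDiskUse", allowDiskUse);
        return command;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> pipeline = new ArrayList<>();
        pipeline.add(stage("$unwind", "$chainOfEvents"));
        pipeline.add(stage("$sort", stage("chainOfEvents.eventUTCDate", 1)));
        System.out.println(buildCommand("tqaP", pipeline, true));
    }
}
```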

Now I am trying to express the same thing using Spring Data for MongoDB and my Java code looks like:

// Unqualified newAggregation(...) and bind(...) are static imports from Aggregation.
MatchOperation match = Aggregation.match(new Criteria()
        .andOperator(
                Criteria.where("eventUTCDate").gte(startDateAsString),
                Criteria.where("eventUTCDate").lt(endDateAsString))
        .and("mpTransactionId").exists(true)
        .and("testMode").is(false)
        .and("eventID").in(230, 204, 211, 255, 256, 231, 240, 241, 252));

// Nest the fields of interest under a single "event" sub-document.
ProjectionOperation projection = Aggregation.project().and("event")
        .nested(bind("eventID", "eventID")
                .and("eventUTCDate", "eventUTCDate")
                .and("processState", "processState")
                .and("jobInstanceId", "jobInstanceId")
                .and("mpTransactionId", "mpTransactionId")
                .and("eventUID", "eventUID")
                .and("contextJobInstanceId", "context.jobInstanceId")
                .and("contextValidationStatus", "context.validationStatus")
                .and("metaUpdateOnly", "metaUpdateOnly")
                .and("errorCode", "errorCode")
                .and("transcodingProfileName", "transcodingProfileName")
                .and("contextAssetId", "context.assetId"));

// Collect the events of each transaction, then re-group them sorted by date.
GroupOperation group = Aggregation.group("event.mpTransactionId").addToSet("event").as("chainOfEvents");

UnwindOperation unwind = Aggregation.unwind("chainOfEvents");

SortOperation sort = Aggregation.sort(Sort.Direction.ASC, "chainOfEvents.eventUTCDate");

GroupOperation groupAgain = Aggregation.group("_id").push("chainOfEvents").as("eventsList");

Aggregation agg = newAggregation(Event.class, match, projection, group, unwind, sort, groupAgain)
        .withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build());
AggregationResults<EventsChain> results = mongoOps.aggregate(agg, "tqaP", EventsChain.class);

but I receive an empty result set. The same query worked on a smaller data set. The only thing I added was

.withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build());

to cope with the size of the data. Can anybody tell me whether I am using it incorrectly?

I am using MongoDB 2.6.4 and Spring-Data-MongoDB version 1.6.1-RELEASE.
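
For reference, the result the pipeline is meant to produce (one date-ordered list of events per mpTransactionId) can be sketched in plain Java with no MongoDB involved. This is only an illustration of the intended output: the Event class here is a hypothetical three-field stand-in for the real documents, and the sketch skips the deduplication that $addToSet performs.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ChainOfEventsSketch {
    // Hypothetical, trimmed-down stand-in for one event document.
    public static final class Event {
        public final String mpTransactionId;
        public final String eventUTCDate;
        public final int eventID;

        public Event(String mpTransactionId, String eventUTCDate, int eventID) {
            this.mpTransactionId = mpTransactionId;
            this.eventUTCDate = eventUTCDate;
            this.eventID = eventID;
        }
    }

    // Sort by date first, then group by transaction id. Collecting into
    // lists keeps the sorted encounter order inside each group, which is
    // the role $sort followed by $group/$push plays in the pipeline.
    public static Map<String, List<Event>> chains(List<Event> events) {
        return events.stream()
                .sorted(Comparator.comparing((Event e) -> e.eventUTCDate))
                .collect(Collectors.groupingBy(
                        e -> e.mpTransactionId,
                        LinkedHashMap::new,
                        Collectors.toList()));
    }
}
```

Note that the dates are compared as strings here, just as they are in the query, so the ordering is lexicographic rather than chronological.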

user2673474 asked Nov 10 '14 23:11


1 Answer

Here is a solution that works with version 2.1.8, using the MongoTemplate helper class.

// m1, p1, g1 are AggregationOperation stages built beforehand (e.g. match, project, group).
AggregationOptions options = AggregationOptions.builder().allowDiskUse(true).build();
List<AggregationOperation> aggs = Arrays.asList(m1, p1, g1);
mongoTemplate.aggregate(Aggregation.newAggregation(aggs).withOptions(options), inputCollectionName, Document.class);
charlycou answered Oct 05 '22 20:10