
Invalid Reference while making a group aggregation after projection

This example aggregation throws an IllegalArgumentException: Invalid reference 'role'!

We run into this problem every time we rename a field in a projection stage and then refer to the new name in a later stage.

    final Aggregation aggregation = newAggregation(

            // We only want "company", plus "employee.role" renamed to "role"
            project("company")
                    .and("employee.role").as("role"),

            // Group by the **renamed** "role"
            group("role").count().as("count"), // this will fail because "role" is an invalid reference.
            limit(2)
            );

    return aggregation;

The JSON we are working on looks like this:

{
    // some fields
    company : {
        // some fields
    },

    employee : {
        role : {
            // some fields
        }
    }
}

Thoughts:

Here, Oliver said:

It's important to understand that you define aggregations in terms of type properties, not document field names.

Is that the reason why we get the exception? If so, how should we use the nice aggregation API that Spring Data offers?

Update:

This is the stack trace I get with version 1.5.0.M1:

java.lang.IllegalArgumentException: Invalid reference 'role'!
    at org.springframework.data.mongodb.core.aggregation.ExposedFieldsAggregationOperationContext.getReference(ExposedFieldsAggregationOperationContext.java:78)
    at org.springframework.data.mongodb.core.aggregation.ExposedFieldsAggregationOperationContext.getReference(ExposedFieldsAggregationOperationContext.java:62)
    at org.springframework.data.mongodb.core.aggregation.GroupOperation.toDBObject(GroupOperation.java:292)
    at org.springframework.data.mongodb.core.aggregation.Aggregation.toDbObject(Aggregation.java:247)
    at com.xxx.report.adapter.AggrigateByTopic.aggrigateBy(AggrigateByTopic.java:38)
    at com.xxx.report.adapter.AggrigateByTopicTest.shouldAggrigate(AggrigateByTopicTest.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:74)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:83)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:72)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:232)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:89)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:71)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:175)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
d0x asked Apr 29 '14 18:04


1 Answer

It is true that the implementation "does not like" the type of field aliasing that you are doing here, but in the strictest interpretation, what you are doing does not make much sense.

Your statement should be something like:

    final Aggregation aggregation = newAggregation(
          group("employee.role").count().as("count"),
          sort(Sort.Direction.DESC,"count"),
          limit(2)
    );

    System.out.println(aggregation);

Which produces the pipeline as:

{ 
    "aggregate" : "__collection__", 
    "pipeline" : [ 
        { "$group" : { 
            "_id" : "$employee.role", 
            "count" : { "$sum" : 1}
        }}, 
        { "$sort" : { "count" : -1} },
        { "$limit" : 2}
    ]
}

The point being that your $project usage here isn't really doing anything other than selecting one field that you do not use later, and creating an alias for another field that you don't really use anyway as it just becomes the _id field for your grouping. Also note the use of $sort as it doesn't really make much sense to $limit unless you have things in an expected order, and $group does not do that by itself.
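To make the effect of that pipeline concrete, here is a minimal in-memory sketch of the same three steps (group and count, sort descending, limit) using plain Java streams. It is only an illustration of what the stages compute, not how MongoDB or Spring Data executes them. The class name RoleCountSketch, the topRoles helper and the sample role names are all made up, and roles are treated as plain strings even though employee.role is a sub-document in the question.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class RoleCountSketch {

    // Mirrors $group (count per role), $sort (count descending) and $limit.
    static Map<String, Long> topRoles(List<String> roles, int limit) {
        // $group: one entry per distinct role, with the number of occurrences.
        Map<String, Long> counts = roles.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // $sort + $limit: order by count descending, keep the first `limit` entries.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new)); // keeps the sorted order
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the employee.role values.
        List<String> roles = List.of("dev", "ops", "dev", "qa", "dev", "ops");
        System.out.println(topRoles(roles, 2)); // prints {dev=3, ops=2}
    }
}
```

Without the sorting step, which two entries survive the limit would depend on the iteration order of the grouped map, which is the same reason the $sort stage matters before $limit.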

As for explaining the "properties" concept, which I am not really a fan of, you might consider the following code:

    final Aggregation aggregation = newAggregation(
          group("country","employee.role").count().as("count"),
          group("employee.role","count").count().as("totalCount"),
          sort(Sort.Direction.DESC,"totalCount"),
          limit(2)
    );

    System.out.println(aggregation);

Then the pipeline that is constructed would look like this:

{ 
    "aggregate" : "__collection__", 
    "pipeline" : [ 
        { "$group" : { 
            "_id" : { 
                "country" : "$country" , 
                "role" : "$employee.role"
            },
            "count" : { "$sum" : 1}
        }}, 
        { "$group" : { 
            "_id" : { 
                "role" : "$_id.employee.role" ,
                "count" : "$count"
            }, 
            "totalCount" : { "$sum" : 1}
        }}, 
        { "$sort" : { "totalCount" : -1} }, 
        { "$limit" : 2 }
    ]
}

So while that runs through to the output dump shown above without an exception, there is still a problem in the pipeline it produces. The first $group stage shortens the sub-document field to the alias "role" inside _id, and all is fine at that point. It is the second $group stage that introduces the problem: it references "$_id.employee.role", a path that does not exist in the output of the first stage, where the value lives under "_id.role".

The builder methods are just "not happy" unless you refer to that field by the full "employee.role" notation, as a property of the original document. And though the builder does work out that this will now be part of the _id field from the previous stage, it completely forgets that the field was aliased.

For my two cents, that is the wrong behavior and a strong reason why I am not a big fan of the builders.

So you can use them, but I think the design is not entirely there yet and has some flaws. Again, for my money it seems safer and more flexible to just work with DBObject types to construct the pipeline and be done with it. At least you know you always get exactly what you mean.
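As a rough sketch of that approach, the two-stage pipeline can be written out by hand so that the second $group refers to "$_id.role", the path that actually exists after the first stage. To keep the example runnable without the MongoDB driver, plain java.util maps stand in for DBObject here; with the driver on the classpath you would build the same structure from BasicDBObject instances. The class name RawPipelineSketch and the doc helper are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; maps are a stand-in for DBObject.
final class RawPipelineSketch {

    // Builds a single-key document such as { "$sum" : 1 }.
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put(key, value);
        return d;
    }

    static List<Map<String, Object>> pipeline() {
        // Stage 1: group by country and role, aliasing employee.role to "role".
        Map<String, Object> firstId = new LinkedHashMap<>();
        firstId.put("country", "$country");
        firstId.put("role", "$employee.role");
        Map<String, Object> firstGroup = new LinkedHashMap<>();
        firstGroup.put("_id", firstId);
        firstGroup.put("count", doc("$sum", 1));

        // Stage 2: refer to the alias where it now lives, under the previous _id.
        Map<String, Object> secondId = new LinkedHashMap<>();
        secondId.put("role", "$_id.role");
        secondId.put("count", "$count");
        Map<String, Object> secondGroup = new LinkedHashMap<>();
        secondGroup.put("_id", secondId);
        secondGroup.put("totalCount", doc("$sum", 1));

        return List.of(
                doc("$group", firstGroup),
                doc("$group", secondGroup),
                doc("$sort", doc("totalCount", -1)),
                doc("$limit", 2));
    }

    public static void main(String[] args) {
        // Print one stage per line to inspect the structure.
        pipeline().forEach(System.out::println);
    }
}
```

Nothing in this version is inferred from type properties, so every field reference is exactly the string you typed.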

Neil Lunn answered Oct 08 '22 06:10