
MongoDB: Why do I need _id in aggregate?

Here is an example from the MongoDB tutorial (it uses the zipcodes collection from the ZIP Code data set):

db.zipcodes.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )

If I replace _id with something else, such as the word Test, I get this error message:

"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0

Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.

user1700890 asked Jun 08 '15 22:06



2 Answers

In a $group stage, _id designates the grouping key, so the stage cannot work without it.

If you're familiar with the SQL world, think of it as the GROUP BY clause.


Note that in this context, too, _id really is a unique identifier in the generated output: by definition, $group cannot produce two documents with the same value for that field.
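To make the GROUP BY analogy concrete, here is a rough plain-JavaScript sketch of what the $group stage from the question does conceptually. It is not how MongoDB implements it; the sample documents and the groupByState helper are made up for illustration.

```javascript
// Hypothetical sample documents, shaped like the zipcodes collection in the question.
const zipcodes = [
  { city: "A", state: "NY", pop: 100 },
  { city: "B", state: "NY", pop: 250 },
  { city: "C", state: "CA", pop: 400 },
];

// In-memory imitation of:
//   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }
// groupByState is an illustrative helper, not a MongoDB API.
function groupByState(docs) {
  const totals = new Map(); // one entry per distinct state
  for (const doc of docs) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  // Each distinct key becomes the _id of exactly one output document.
  return Array.from(totals, ([state, totalPop]) => ({ _id: state, totalPop }));
}

const grouped = groupByState(zipcodes);
console.log(grouped);
```

Because every distinct state value ends up as the key of exactly one bucket, no two output documents can share the same _id.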

Sylvain Leroux answered Oct 14 '22 01:10


The _id field is mandatory, but you can set it to null if you do not wish to aggregate with respect to a key or keys. Setting it to null results in a single aggregate value over all the input documents. It thus acts as a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.

In your case, grouping by _id: "$state" would result in n aggregate results of totalPop, provided that there are n distinct values for state (akin to SELECT SUM(pop) FROM table GROUP BY state). Whereas,

{ $group: { _id: null, totalPop: { $sum: "$pop" } } }

would provide a single result for totalPop (akin to SELECT SUM(pop) FROM table).

This behaviour is well described in the group operator documentation.
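As a rough illustration of the difference between a field key and a null key, here is a small plain-JavaScript sketch. It only mimics the grouping semantics in memory; the sampleDocs data and the sumPop helper are assumptions for the example, not part of MongoDB.

```javascript
// Hypothetical sample data with the same fields as the question's documents.
const sampleDocs = [
  { state: "NY", pop: 100 },
  { state: "NY", pop: 250 },
  { state: "CA", pop: 400 },
];

// sumPop is an illustrative helper: groupField plays the role of _id in $group.
// Passing null puts every document into the same single group.
function sumPop(docs, groupField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = groupField === null ? null : doc[groupField];
    totals.set(key, (totals.get(key) || 0) + doc.pop);
  }
  return Array.from(totals, ([key, totalPop]) => ({ _id: key, totalPop }));
}

const perState = sumPop(sampleDocs, "state"); // like _id: "$state"
const overall = sumPop(sampleDocs, null);     // like _id: null

console.log(perState);
console.log(overall);
```

With a field as the key you get one result per distinct value; with null you get exactly one result whose total covers all the documents.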

0_0 answered Oct 14 '22 01:10