
MongoDB aggregate query isn't returning the proper sum when using $sum

I have a collection, students, with documents in the following format:

{
 _id:"53fe74a866455060e003c2db",
 name:"sam",
 subject:"maths",
 marks:"77"
}
{
 _id:"53fe79cbef038fee879263d2",
 name:"ryan", 
 subject:"bio",
 marks:"82"
}
{
 _id:"53fe74a866456060e003c2de",
 name:"tony",
 subject:"maths",
 marks:"86"
}

I want to get the total marks of all the students with subject = "maths", so I should get 163 as the sum.

db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])

I expect the following result:

{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}

But I get:

{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}

Can someone point out what I might be doing wrong here?

Rookie asked Apr 10 '15 12:04


2 Answers

Your current schema stores the marks field as a string, but $sum only adds numeric values and ignores everything else, which is why you get 0. You need a numeric data type for the aggregation framework to work out the sum. Alternatively, you can use MapReduce to calculate the sum, since it allows native JavaScript functions like parseInt() on your document properties in its map function. So overall you have two choices.
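To see why the total comes out as 0, here is a rough plain-JavaScript imitation of that rule. It is only a sketch: sumLikeMongo is a made-up helper, not a MongoDB API, and it only mimics the "skip non-numeric values" behaviour.

```javascript
// Rough imitation of $sum's rule: non-numeric values are skipped.
// sumLikeMongo is a hypothetical helper, not part of MongoDB.
function sumLikeMongo(values) {
    return values.reduce(function (total, value) {
        return typeof value === "number" ? total + value : total;
    }, 0);
}

var asStored = ["77", "86"];   // marks as stored in the question: strings
var converted = asStored.map(function (m) {
    return parseInt(m, 10);    // same conversion the options below rely on
});

console.log(sumLikeMongo(asStored));    // 0
console.log(sumLikeMongo(converted));   // 163
```

Once the values are numbers, the same sum gives the 163 you expect.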


Option 1: Update Schema (Change Data Type)

The first would be to change the schema, or add another field to your documents that holds the actual numerical value rather than its string representation. If your collection is relatively small, you could use a combination of MongoDB's cursor find(), forEach() and update() methods to change the type of marks:

// "$type": 2 matches documents whose marks value is stored as a string
db.students.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks, 10) } }
    );
});

For larger collections, updating one document at a time is slow, so it's recommended to use MongoDB's bulk updates instead:

MongoDB versions >= 2.6 and < 3.2:

var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;

// Queue one update per document whose marks value is still a string
db.students.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks, 10) }
    });

    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();

        // re-initialize every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
});

// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();

MongoDB version 3.2 and newer:

var ops = [],
    cursor = db.students.find({"marks": {"$exists": true, "$type": 2 }});

cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks, 10) } }
        }
    });

    // Send the queued updates in batches of 1000
    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});

// Send whatever is left in the last, partial batch
if (ops.length > 0) db.students.bulkWrite(ops);
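One caveat with the conversions above: parseInt() returns NaN for text that isn't a number and silently truncates values like "82abc". If your data might contain such values, a stricter converter is worth considering. The sketch below is just one way to do it; toMarks is a hypothetical helper name, and the rule "digits only, optional minus sign" is an assumption about what valid marks look like.

```javascript
// toMarks is a hypothetical helper: it returns an integer, or null when the
// input is not a clean base-10 integer string.
function toMarks(raw) {
    var text = String(raw).trim();
    if (!/^-?\d+$/.test(text)) {
        return null;             // e.g. "", "absent", "77.5", "82abc"
    }
    return parseInt(text, 10);   // explicit radix 10
}

console.log(toMarks("77"));      // 77
console.log(toMarks(" 86 "));    // 86
console.log(toMarks("82abc"));   // null (plain parseInt would give 82)
console.log(toMarks("absent"));  // null (plain parseInt would give NaN)
```

In the update loops you could then skip any document for which the helper returns null, rather than writing NaN back into marks.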

Option 2: Run MapReduce

The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().

In your MapReduce operation, define the map function that processes each input document. This function converts the marks string to a number and emits it as a pair with the document's subject. This is where the native JavaScript function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:

var mapper = function () {
    var x = parseInt(this.marks, 10);
    emit(this.subject, x);
};

Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject. The function reduces the valuesMarks array to the sum of its elements.

var reducer = function(keySubject, valuesMarks) {
    return Array.sum(valuesMarks);
};

db.students.mapReduce(
    mapper,
    reducer,
    {
        out : "example_results",
        query: { subject : "maths" }
    }
);

With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:

/* 0 */
{
    "_id" : "maths",
    "value" : 163
}
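If you want to check what the map and reduce steps compute without touching a database, the flow can be imitated in plain JavaScript. This is a simplified sketch, not how MongoDB runs it: runMapReduce is a made-up stand-in, emit is passed to the mapper explicitly instead of being a global, and a plain reduce() replaces the shell's Array.sum().

```javascript
var students = [
    { name: "sam",  subject: "maths", marks: "77" },
    { name: "ryan", subject: "bio",   marks: "82" },
    { name: "tony", subject: "maths", marks: "86" }
];

// Hypothetical stand-in for mapReduce: filter, map with emit, reduce per key.
function runMapReduce(docs, mapFn, reduceFn, query) {
    var groups = {};
    function emit(key, value) {
        (groups[key] = groups[key] || []).push(value);
    }
    docs.filter(function (doc) { return doc.subject === query.subject; })
        .forEach(function (doc) { mapFn.call(doc, emit); });
    return Object.keys(groups).map(function (key) {
        return { _id: key, value: reduceFn(key, groups[key]) };
    });
}

// Same logic as the mapper above, with emit received as an argument.
var mapper = function (emit) {
    emit(this.subject, parseInt(this.marks, 10));
};

// Same logic as the reducer above, without the shell-only Array.sum().
var reducer = function (keySubject, valuesMarks) {
    return valuesMarks.reduce(function (a, b) { return a + b; }, 0);
};

var result = runMapReduce(students, mapper, reducer, { subject: "maths" });
console.log(JSON.stringify(result));   // [{"_id":"maths","value":163}]
```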
chridam answered Oct 15 '22 14:10


Possible reasons why your sum is returned as 0:

  1. The field you are summing is not a number but a string.

    Make sure the field contains numeric values.

  2. You are using the wrong syntax for $sum.

    db.c1.aggregate([{ $group: { _id: "$item", price: { $sum: "$price" }, count: { $sum: 1 } } }])

    Make sure you use "$price" and not "price".

  3. One of the silliest mistakes that causes this error:

    A space or tab inside the quotes when specifying the field name.

    Example: "$price " won't work, but "$price" will.
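The third cause is easier to see with a tiny plain-JavaScript imitation of a field-path lookup. This is only an illustration: resolvePath is a hypothetical helper, not a MongoDB API, and the sample document is invented.

```javascript
// resolvePath is a hypothetical helper that imitates looking up a "$field"
// path on a single document.
function resolvePath(doc, path) {
    return doc[path.slice(1)];   // drop the leading "$", keep everything else
}

var item = { item: "pen", price: 5 };   // invented sample document

console.log(resolvePath(item, "$price"));    // 5
console.log(resolvePath(item, "$price "));   // undefined: no field named "price "
```

With the trailing space, the path points at a field that doesn't exist, so there is nothing numeric to add and the sum stays 0.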

Juhi Shelar answered Oct 15 '22 16:10