Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform case-insensitive aggregation grouping in MongoDb?

Let's say that I want to aggregate and group by documents in MongoDb by the Description field.

Running the following (case-sensitive by default):

db['Products'].aggregate(
    { $group: { 
        _id: { 'Description': "$Description" },
        count: { $sum: 1 },
        docs: { $push: "$_id" }
    }},
    { $match: {
        count: { $gt : 1 }
    }}
);

on my sample data gives me 1000 results, which is fine.

But now I expect that running a case-insensitive query (using $toLower) should give me less than or equal to 1000 results:

db['Products'].aggregate(
    { $group: { 
        _id: { 'Description': {$toLower: "$Description"} },
        count: { $sum: 1 },
        docs: { $push: "$_id" }
    }},
    { $match: {
        count: { $gt : 1 }
    }}
);

But instead I get more than 1000 results. That can't be right, can it? More common entries should get grouped together to yield less number of total groupings ... I think.

So then probably my aggregation query is wrong! Which brings me to my question:

How should case-insensitive aggregation grouping in MongoDb be performed?

like image 227
pulkitsinghal Avatar asked Sep 08 '17 19:09

pulkitsinghal


People also ask

How do I make a case insensitive in MongoDB?

The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.

Is MongoDB collection case-sensitive?

Database names and Collection names are case sensitive. You can always recreate the DB/Collection with the appropriate name. The Mongo Shell is a interactive JS interpreter.

WHAT IS group in MongoDB aggregation?

A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key.


Video Answer


1 Answers

You approach to case-insensitive grouping is correct so perhaps your observation is not? ;)

Try this example:

// insert two documents
db.getCollection('test').insertOne({"name" : "Test"}) // uppercase 'T'
db.getCollection('test').insertOne({"name" : "test"}) // lowercase 't'

// perform the grouping
db.getCollection('test').aggregate({ $group: { "_id": { $toLower: "$name" }, "count": { $sum: 1 } } }) // case insensitive
db.getCollection('test').aggregate({ $group: { "_id": "$name", "count": { $sum: 1 } } }) // case sensitive

You may have a typo somewhere?

The documentation also states that

$toLower only has a well-defined behavior for strings of ASCII characters.

Perhaps that's what's biting you here?

like image 135
dnickless Avatar answered Oct 19 '22 19:10

dnickless