Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb distinct on a array field with regex query?

Basically i'm trying to implement tags functionality on a model.

> db.event.distinct("tags")
[ "bar", "foo", "foobar" ]

Doing a simple distinct query retrieves me all distinct tags. However how would i go about getting all distinct tags that match a certain query? Say for example i wanted to get all tags matching foo and then expecting to get ["foo","foobar"] as a result?

The following queries is my failed attempts of achieving this:

> db.event.distinct("tags",/foo/)
[ "bar", "foo", "foobar" ]

> db.event.distinct("tags",{tags: {$regex: 'foo'}})
[ "bar", "foo", "foobar" ]
like image 237
netbrain Avatar asked Jan 16 '15 08:01

netbrain


1 Answers

The aggregation framework and not the .distinct() command:

db.event.aggregate([
    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Filter the de-normalized content to remove non-matches
    { "$match": { "tags": /foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])

You are probably better of using an "anchor" to the beginning of the regex is you mean from the "start" of the string. And also doing this $match before you process $unwind as well:

db.event.aggregate([
    // Match the possible documents. Always the best approach
    { "$match": { "tags": /^foo/ } },

    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Now "filter" the content to actual matches
    { "$match": { "tags": /^foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])

That makes sure you are not processing $unwind on every document in the collection and only those that possibly contain your "matched tags" value before you "filter" to make sure.

The really "complex" way to somewhat mitigate large arrays with possible matches takes a bit more work, and MongoDB 2.6 or greater:

db.event.aggregate([
    { "$match": { "tags": /^foo/ } },
    { "$project": {
        "tags": { "$setDifference": [
            { "$map": {
                "input": "$tags",
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ 
                        { "$substr": [ "$$el", 0, 3 ] },
                        "foo"
                    ]},
                    "$$el",
                    false
                ]}
            }},
            [false]
        ]}
    }},
    { "$unwind": "$tags" },
    { "$group": { "_id": "$tags" }}
])

So $map is a nice "in-line" processor of arrays but it can only go so far. The $setDifference operator negates the false matches, but ultimately you still need to process $unwind to do the remaining $group stage for distinct values overall.

The advantage here is that arrays are now "reduced" to only the "tags" element that matches. Just don't use this when you want a "count" of the occurrences when there are "multiple distinct" values in the same document. But again, there are other ways to handle that.

like image 188
Neil Lunn Avatar answered Nov 15 '22 08:11

Neil Lunn