Mongo - How can I aggregate, filter, and include an array of data from the matching documents?

I have a MongoDB-backed contact database, and I'm trying to find duplicate entries in a number of different ways.

For example, if two contacts share a phone number, they are flagged as possible duplicates; the same goes for email, etc.

I'm using MongoDB 2.4.2 on Debian with PyMongo and MongoEngine.

The closest I have so far is finding and counting records that contain the same phone number:

from bson.son import SON  # ordered mapping, so the sort keys keep their order

dbh.person_document.aggregate([
    {'$unwind': '$phones'},
    {'$group': {'_id': '$phones', 'count': {'$sum': 1}}},
    {'$sort': SON([('count', -1), ('_id', -1)])}
])

# Results in 
{u'ok': 1.0,
 u'result': [{u'_id': {u'number': u'404-231-4444', u'showroom_id': 5}, u'count': 5},
             {u'_id': {u'number': u'205-265-6666', u'showroom_id': 5}, u'count': 5},
             {u'_id': {u'number': u'213-785-7777', u'showroom_id': 5}, u'count': 4},
             {u'_id': {u'number': u'334-821-9999', u'showroom_id': 5}, u'count': 3}
]}

So I can get the numbers that are duplicated, but I can't for the life of me figure out how to return an array of the documents that actually contain them!

I'd like to see this kind of return data for each number:

# The ObjectIDs of the documents that contained the duplicate phone numbers
{u'_id': {u'number': u'404-231-4444', u'showroom_id': 5}, 
  u'ids': [ObjectId('51c67e322b2192121ec4d8f2'), ObjectId('51c67e312b2192121ec4d8f0')], 
  u'count': 2},

Any help is greatly appreciated!

Marcel Chastain asked Jul 18 '13 18:07


1 Answer

Ah, blessed be.

Found the solution almost word for word in the question "MongoDB - Use aggregation framework or mapreduce for matching array of strings within documents (profile matching)".

Final result, with an extra field to include the name:

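dbh.person_document.aggregate([
    # Emit one document per entry in the phones array.
    {'$unwind': '$phones'},
    # Group by phone entry, collecting the id and name of each matching document.
    {'$group': {
        '_id': '$phones',
        'matchedDocuments': {
            '$push': {
                'id': '$_id',
                'name': '$full_name'
            }
        },
        'num': {'$sum': 1}
    }},
    # Keep only phone entries that appear more than once.
    {'$match': {'num': {'$gt': 1}}}
])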
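
Since the question also mentions checking emails and other fields, the same stages can be wrapped in a small helper. This is only a sketch: `build_duplicate_pipeline` and the `emails` field name are hypothetical and not part of the original code, and the output keys `ids` and `count` are chosen to match the shape requested in the question.

```python
def build_duplicate_pipeline(array_field, min_count=2):
    """Return aggregation stages that find values of `array_field`
    shared by at least `min_count` array entries.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    field_ref = '$' + array_field
    return [
        # Emit one document per element of the array field.
        {'$unwind': field_ref},
        # Group by element value and collect the owning documents' _ids.
        {'$group': {
            '_id': field_ref,
            'ids': {'$push': '$_id'},
            'count': {'$sum': 1},
        }},
        # Drop values that occur fewer than min_count times.
        {'$match': {'count': {'$gte': min_count}}},
        # Most frequent duplicates first (single key, so a plain dict is fine).
        {'$sort': {'count': -1}},
    ]


phone_pipeline = build_duplicate_pipeline('phones')
# 'emails' is an assumed field name; substitute whatever the schema uses.
email_pipeline = build_duplicate_pipeline('emails')
```

The pipeline is then passed to `dbh.person_document.aggregate(phone_pipeline)` as before. With the PyMongo 2.x series used here, the matches are under the `result` key of the returned dict, as in the output shown in the question; from PyMongo 3.0 onward `aggregate` returns a cursor to iterate instead. Note that `$push` lists a document once per matching array entry, so a contact that stores the same number twice would appear twice in `ids`.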
Marcel Chastain answered Oct 12 '22 03:10