Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perform union in mongoDB

I'm wondering how to perform a kind of union in an aggregate in MongoDB. Let's imaging the following document in a collection (the structure is for the sake of the example) :

{
  linkedIn: {
    people : [
    {
      name : 'Fred'
     },
     {
       name : 'Matilda'
     }
   ]
  },
  twitter: {
    people : [
    {
       name : 'Hanna'
    },
    {
       name : 'Walter'
    }
   ]
  }
 }

How to make an aggregate that returns the union of the people in twitter and linkedIn ?

{
 { name :'Fred', source : 'LinkedIn'},
 { name :'Matilda', source : 'LinkedIn'},
 { name :'Hanna', source : 'Twitter'},
 { name :'Walter', source : 'Twitter'},
}
like image 295
ic3 Avatar asked May 05 '14 14:05

ic3


People also ask

How do I join two queries in MongoDB?

MongoDB is not a relational database, but you can perform a left outer join by using the $lookup stage. The $lookup stage lets you specify which collection you want to join with the current collection, and which fields that should match.

How do I get the difference between two dates in MongoDB?

The $dateDiff expression returns the integer difference between the startDate and endDate measured in the specified units . Durations are measured by counting the number of times a unit boundary is passed. For example, two dates that are 18 months apart would return 1 year difference instead of 1.5 years .

What is $project in MongoDB?

The $project takes a document that can specify the inclusion of fields, the suppression of the _id field, the addition of new fields, and the resetting of the values of existing fields. Alternatively, you may specify the exclusion of fields. The $project specifications have the following forms: Form. Description.

How do I use distinct aggregation in MongoDB?

You can use $addToSet with the aggregation framework to count distinct objects. Not a generic solution, if you have a large number of unique zip codes per result, this array would be very large.


2 Answers

There are a couple of approaches to this that you can use the aggregate method for

db.collection.aggregate([
    // Assign an array of constants to each document
    { "$project": {
        "linkedIn": 1,
        "twitter": 1,
        "source": { "$cond": [1, ["linkedIn", "twitter"],0 ] }
    }},

    // Unwind the array
    { "$unwind": "$source" },

    // Conditionally push the fields based on the matching constant
    { "$group": { 
        "_id": "$_id",
        "data": { "$push": {
            "$cond": [
                { "$eq": [ "$source", "linkedIn" ] },
                { "source": "$source", "people": "$linkedIn.people" },
                { "source": "$source", "people": "$twitter.people" }
            ]
        }}
    }},

    // Unwind that array
    { "$unwind": "$data" },

    // Unwind the underlying people array
    { "$unwind": "$data.people" },

    // Project the required fields
    { "$project": {
        "_id": 0,
        "name": "$data.people.name",
        "source": "$data.source"
    }}
])

Or with a different approach using some operators from MongoDB 2.6:

db.people.aggregate([
    // Unwind the "linkedIn" people
    { "$unwind": "$linkedIn.people" },

    // Tag their source and re-group the array
    { "$group": {
        "_id": "$_id",
        "linkedIn": { "$push": {
            "name": "$linkedIn.people.name",
            "source": { "$literal": "linkedIn" }
        }},
        "twitter": { "$first": "$twitter" }
    }},

    // Unwind the "twitter" people
    { "$unwind": "$twitter.people" },

    // Tag their source and re-group the array
    { "$group": {
        "_id": "$_id",
        "linkedIn": { "$first": "$linkedIn" },
        "twitter": { "$push": {
            "name":  "$twitter.people.name",
            "source": { "$literal": "twitter" }
        }}
    }},

    // Merge the sets with "$setUnion"
    { "$project": {
        "data": { "$setUnion": [ "$twitter", "$linkedIn" ] }
    }},

    // Unwind the union array
    { "$unwind": "$data" },

    // Project the fields
    { "$project": {
        "_id": 0,
        "name": "$data.name",
        "source": "$data.source"
    }}
])

And of course if you simply did not care what the source was:

db.collection.aggregate([
    // Union the two arrays
    { "$project": {
        "data": { "$setUnion": [
            "$linkedIn.people",
            "$twitter.people"
        ]}
    }},

    // Unwind the union array
    { "$unwind": "$data" },

    // Project the fields
    { "$project": {
        "_id": 0,
        "name": "$data.name",
    }}

])
like image 82
Neil Lunn Avatar answered Oct 18 '22 22:10

Neil Lunn


Not sure if using aggregate is recommended over a map-reduce for that kind of operation but the following is doing what you're asking for (dunno if $const can be used with no issue at all in the .aggregate() function) :

aggregate([ 
   { $project: { linkedIn: '$linkedIn', twitter: '$twitter', idx: { $const: [0,1] }}},
   { $unwind: '$idx' },
   { $group: { _id : '$_id', data: { $push: { $cond:[ {$eq:['$idx', 0]}, { source: {$const: 'LinkedIn'}, people: '$linkedIn.people' } , { source: {$const: 'Twitter'}, people: '$twitter.people' } ] }}}},
   { $unwind: '$data'},
   { $unwind: '$data.people'},
   { $project: { _id: 0, name: '$data.people.name', source: '$data.source' }}
])
like image 24
Marc Polizzi Avatar answered Oct 18 '22 22:10

Marc Polizzi