Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I flatten double arrays in mongoDB?

Some fields in my mongoDB documents look like this:

{
...
Countries: [["Spain", "France"]]
...
}

Or this:

{
...
Countries: [["Spain"],["Russia", "Egypt"]]
...
}

What I want to do is to turn [["Spain", "France"]] into ["Spain", "France"] and [["Spain"],["Russia", "Egypt"]] into ["Spain", "Russia", "Egypt"], similar to using the flatten method in Ruby.

Is there a way to flatten arrays in mongoDB? I need to flatten arrays in all documents in entire collection, not just a single doc, if that matters, also, the values and their quantity in arrays varies between documents.

I am also using Ruby as a driver for mongo, so a method using a Ruby driver would be also useful to me.

like image 803
A.V. Arno Avatar asked Jan 01 '15 22:01

A.V. Arno


2 Answers

In Mongo 3.4+ you can use $reduce to flatten 2d arrays.

db.collection.aggregate(
  [
    {
      $project: {
        "countries": {
          $reduce: {
            input: '$Countries',
            initialValue: [],
            in: {$concatArrays: ['$$value', '$$this']}
          }
        }
      }
    }
  ]
)

Docs: https://docs.mongodb.com/manual/reference/operator/aggregation/reduce/

like image 143
Matt Lo Avatar answered Sep 24 '22 09:09

Matt Lo


You need perform an aggregation operation with two unwind stages and a single group stage. The basic rule being you unwind as many times as the level of nest depth. Here the level of nesting is 2, so we unwind two times.

 collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"}}}
 ])

The first $unwind stage produces the result:

{
        "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"),
        "Countries" : [
                "Spain",
                "France"
        ]
}
{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Spain"
        ]
}
{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Russia",
                "Egypt"
        ]
}

The second $unwind stage further flattens the Countries array:

{ "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"), "Countries" : "Spain" }
{ "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"), "Countries" : "France" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Spain" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Russia" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Egypt" }

Now the final $group stage groups the records based on the _id,and accumulates the country names in a single array.

{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Spain",
                "Russia",
                "Egypt"
        ]
}
{
        "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"),
        "Countries" : [
                "Spain",
                "France"
        ]
}

If you wish to keep other fields in the document then you need to explicitly specify the names of the fields other than the country field,(field1,field2,etc..), using the $first operator. You can write/overwrite a collection by specifying the name of the collection in the $out stage.

collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"},
             "field1":{$first => "$field1"}}},
 {$out => "collection"}
 ])

You need to explicitly specify the fields so that you don't get a redundant Countries field.

You can use the $$ROOT system variable to store the entire document, but that would make the Countries field redundant.One outside the doc and one inside the doc.

collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"},
             "doc":{$first => "$$ROOT"}}},
 {$out => "collection"}
 ])
like image 25
BatScream Avatar answered Sep 24 '22 09:09

BatScream