 

Converting some fields in Mongo from String to Array

I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.

I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.

How can I get every document that looks like the first example into the format for the second example?

{
    "_id" : ObjectId("12345"),
    "tags" : "red blue green white"
}
{
    "_id" : ObjectId("54321"),
    "tags" : [
        "red",
        "orange",
        "black"
    ]
}
asked Aug 20 '16 at 02:08 by ElPresidente


1 Answer

We can't use the $type query operator alone to filter the documents here, because the elements of the new-style arrays are also of type "string". As the documentation (for MongoDB versions before 3.6) puts it:

When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.

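Fortunately, MongoDB also provides the $exists operator, which we can combine with a numeric array index. { "tags.0": { "$exists": false } } excludes every document whose "tags" is a non-empty array, so together with { "tags": { "$type": 2 } } it selects only the documents whose "tags" is still a string.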
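
To see why the two conditions work together, here is a small plain-JavaScript model of the filter { "tags.0": { "$exists": false }, "tags": { "$type": 2 } } that this answer uses later in the mapReduce() call. It is a simplified sketch for illustration only: the function name matchesLegacyTagsFilter is hypothetical, and it models just the string and array shapes from the question, not MongoDB's full matching rules.

```javascript
// Plain-JavaScript model (not MongoDB itself) of the query filter
//   { "tags.0": { "$exists": false }, "tags": { "$type": 2 } }
// Hypothetical helper; only string and array values are modelled.
function matchesLegacyTagsFilter(doc) {
  const tags = doc.tags;
  // "tags.0" exists only when tags is an array with at least one element.
  const firstElementExists = Array.isArray(tags) && tags.length > 0;
  // The query-side $type: 2 ("string") matches a string field, or an array
  // that contains at least one string element.
  const typeIsString =
    typeof tags === "string" ||
    (Array.isArray(tags) && tags.some((t) => typeof t === "string"));
  return !firstElementExists && typeIsString;
}

console.log(matchesLegacyTagsFilter({ tags: "red blue green white" })); // true
console.log(matchesLegacyTagsFilter({ tags: ["red", "orange", "black"] })); // false
```

On its own, typeIsString would be true for both example documents, which is exactly the problem described in the question; the "tags.0" check is what rules out the arrays.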

Now how can we update those documents?

On MongoDB 3.2 and earlier, the aggregation framework has no operator for splitting a string, so the option used here is mapReduce(). But first, let's look at the alternative introduced in MongoDB 3.4.

Starting from MongoDB 3.4, we can $project our documents and use the $split operator to split our string into an array of substrings.

Note that to split only the "tags" values that are strings, we need conditional processing with $cond. The condition is an $eq expression that evaluates to true when the $type of the field is "string". The $type aggregation expression is new in 3.4, and unlike the $type query operator it returns the type of the field itself, so an array yields "array" rather than the type of its elements.
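
The per-document effect of that $cond expression can be sketched in plain JavaScript. This is only a model for illustration (convertTags is a hypothetical name, not part of MongoDB); inside the pipeline, the server evaluates $cond, $type and $split itself.

```javascript
// Hypothetical helper mirroring, for one value, what this expression does:
//   { "$cond": [ { "$eq": [ { "$type": "$tags" }, "string" ] },
//                { "$split": [ "$tags", " " ] },
//                "$tags" ] }
function convertTags(tags) {
  // Aggregation $type reports the type of the field itself, so only a real
  // string takes this branch; an array is reported as "array".
  if (typeof tags === "string") {
    // Like $split, String.prototype.split(" ") splits on every single space,
    // so consecutive spaces would produce empty-string elements.
    return tags.split(" ");
  }
  // Any non-string value (e.g. an existing array) is returned unchanged.
  return tags;
}

console.log(convertTags("red blue green white")); // [ 'red', 'blue', 'green', 'white' ]
console.log(convertTags(["red", "orange", "black"])); // [ 'red', 'orange', 'black' ]
```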

Finally, we can overwrite the old collection using the $out pipeline stage. Note, however, that $project keeps only the fields we list (plus _id), so any other fields must be explicitly included in the $project stage or they will be lost.

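db.collection.aggregate(
    [
        { "$project": {
            // Only "tags" (and _id) survive this stage; list any other fields to keep.
            "tags": {
                "$cond": [
                    // Aggregation $type (3.4+) returns "array" for arrays, so only strings match.
                    { "$eq": [
                        { "$type": "$tags" },
                        "string"
                    ]},
                    // Split the space-separated string into an array of tags.
                    { "$split": [ "$tags", " " ] },
                    // Leave values that are already arrays untouched.
                    "$tags"
                ]
            }
        }},
        // Replace "collection" with the pipeline's output.
        { "$out": "collection" }
    ]
)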
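
If the documents carry fields besides "tags", one way to avoid listing them all is a variant of the pipeline above that uses $addFields (also new in MongoDB 3.4) instead of $project, since $addFields only sets the named field and passes every other field through. The sketch below builds that pipeline as a plain JavaScript array; buildTagsPipeline is a hypothetical helper, and the resulting array would be passed to db.collection.aggregate() in the mongo shell.

```javascript
// Hypothetical helper building a variant of the pipeline above.
// $addFields (MongoDB 3.4+) only sets "tags", so every other field of each
// document is carried through to $out without being listed explicitly.
function buildTagsPipeline(outCollection) {
  return [
    {
      $addFields: {
        tags: {
          $cond: [
            { $eq: [{ $type: "$tags" }, "string"] },
            { $split: ["$tags", " "] },
            "$tags",
          ],
        },
      },
    },
    // $out replaces the target collection with the pipeline's results.
    { $out: outCollection },
  ];
}

console.log(JSON.stringify(buildTagsPipeline("collection")));
// In the mongo shell: db.collection.aggregate(buildTagsPipeline("collection"))
```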

With mapReduce, we use String.prototype.split() in the map function to emit the array of substrings, and the "query" option to select only the documents that still need converting. We then iterate over the "results" array and $set the new value of "tags" with bulk operations, using the bulkWrite() method new in 3.2, or the now-deprecated Bulk() API on 2.6 or 3.0. The mapReduce() call looks like this:

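db.collection.mapReduce(
    // map: split the space-separated string into an array, keyed by _id.
    function() { emit(this._id, this.tags.split(" ")); },
    // reduce: never called here, because every _id is emitted exactly once.
    function(key, values) {},
    {
        "out": { "inline": 1 },
        "query": {
            // Skip documents whose "tags" is already a non-empty array...
            "tags.0": { "$exists": false },
            // ...and keep only those where "tags" is a string (BSON type 2).
            "tags": { "$type": 2 }
        }
    }
)['results']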
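
The bulk update step described above can be sketched as a small helper that turns the inline results (documents of the form { _id: ..., value: [...] }) into bulkWrite() requests. buildTagUpdates is a hypothetical name; the updateOne request shape is the one bulkWrite() accepts, but treat this as a sketch rather than a tested migration script.

```javascript
// Hypothetical helper: turn inline mapReduce results, which have the shape
// { _id: <document _id>, value: <array of tags> }, into bulkWrite() requests.
function buildTagUpdates(results) {
  return results.map((r) => ({
    updateOne: {
      filter: { _id: r._id },
      update: { $set: { tags: r.value } },
    },
  }));
}

console.log(JSON.stringify(buildTagUpdates([{ _id: 1, value: ["red", "blue"] }])));
// In the mongo shell (3.2+), roughly:
// var results = db.collection.mapReduce(/* as above */)['results'];
// db.collection.bulkWrite(buildTagUpdates(results));
```

Keep in mind that inline mapReduce output must fit in a single 16 MB BSON document, so a very large collection may need to be processed in smaller batches (for example by narrowing the "query" option).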
answered Oct 14 '22 at 16:10 by styvane