I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.
I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.
How can I get every document that looks like the first example into the format for the second example?
{
"_id" : ObjectId("12345"),
"tags" : "red blue green white"
}
{
"_id" : ObjectId("54321"),
"tags" : [
"red",
"orange",
"black"
]
}
We can't use the $type
operator to filter our documents here because the type of the elements in our array is "string" and as mentioned in the documentation:
When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.
But fortunately MongoDB also provides the $exists
operator which can be used here with a numeric array index.
Now how can we update those documents?
Well, from MongoDB version <= 3.2, the only option we have is mapReduce()
but first let look at the other alternative in the upcoming release of MongoDB.
Starting from MongoDB 3.4, we can $project
our documents and use the $split
operator to split our string into an array of substrings.
Note that to split only those "tags" which are string, we need a logical $cond
ition processing to split only the values that are string. The condition here is $eq
which evaluate to true
when the $type
of the field is equal to "string"
. By the way $type
here is new in 3.4.
Finally we can overwrite the old collection using the $out
pipeline stage operator. But we need to explicitly specify the inclusion of other field in the $project
stage.
db.collection.aggregate(
[
{ "$project": {
"tags": {
"$cond": [
{ "$eq": [
{ "$type": "$tags" },
"string"
]},
{ "$split": [ "$tags", " " ] },
"$tags"
]
}
}},
{ "$out": "collection" }
]
)
With mapReduce
, we need to use the Array.prototype.split()
to emit the array of substrings in our map function. We also need to filter our documents using the "query" option. From there we will need to iterate the "results" array and $set
the new value for "tags" using bulk operations using the bulkWrite()
method new in 3.2 or the now deprecated Bulk()
if we are on 2.6 or 3.0 as shown here.
db.collection.mapReduce(
function() { emit(this._id, this.tags.split(" ")); },
function(key, value) {},
{
"out": { "inline": 1 },
"query": {
"tags.0": { "$exists": false },
"tags": { "$type": 2 }
}
}
)['results']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With