
How can I remove empty strings from a MongoDB collection?

Tags:

mongodb

I have a MongoDB collection and I'd like to remove every key whose value is an empty string.

From this:

{
    "_id" : ObjectId("56323d975134a77adac312c5"), 
    "year" : "15", 
    "year_comment" : "", 
}
{
    "_id" : ObjectId("56323d975134a77adac312c5"), 
    "year" : "", 
    "year_comment" : "asd", 
}

I'd like to get this result:

{
    "_id" : ObjectId("56323d975134a77adac312c5"), 
    "year" : "15", 
}
{
    "_id" : ObjectId("56323d975134a77adac312c5"), 
    "year_comment" : "asd", 
}

How could I solve it?

Ferenc Straub asked Nov 02 '15 11:11


3 Answers

Try running the following snippet in the mongo shell. It builds a copy of each document with the empty-string and null fields stripped, then prints the result; it does not modify the stored documents.

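var result = [];
db.getCollection('test').find({}).forEach(function (data) {
  for (var key in data) {
    // "== null" also catches undefined; "=== ''" avoids treating 0 or false as empty
    if (data[key] == null || data[key] === '') {
      delete data[key];
    }
  }
  result.push(data);
});

// Print the cleaned copies; the collection itself is left unchanged
print(tojson(result));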
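The snippet above only cleans the documents in memory. A minimal sketch of writing the change back, assuming the same 'test' collection and a shell that provides updateOne: the helper buildUnset is hypothetical (it is not part of the answer) and collects the null or empty-string keys into a $unset document.

```javascript
// Hypothetical helper: returns a $unset spec naming every field that is
// null, undefined or "". _id is skipped because it cannot be removed.
function buildUnset(doc) {
  var unset = {};
  Object.keys(doc).forEach(function (key) {
    if (key === "_id") {
      return;
    }
    if (doc[key] == null || doc[key] === "") {
      unset[key] = "";
    }
  });
  return unset;
}

// Mongo shell usage (assumes the collection is named 'test'):
// db.getCollection('test').find({}).forEach(function (doc) {
//   var unset = buildUnset(doc);
//   if (Object.keys(unset).length > 0) {
//     db.getCollection('test').updateOne({ _id: doc._id }, { $unset: unset });
//   }
// });
```

Documents with nothing to remove are skipped, since an empty $unset document is rejected by some server versions.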
Rubin Porwal answered Oct 16 '22 14:10


I would start by getting a distinct list of all the keys in the collection, use those keys as the basis for a query, and then do an ordered bulk update with the Bulk API. The update statement uses the $unset operator to remove the fields.

The distinct list of keys needed to assemble the query can be obtained through Map-Reduce. The following mapReduce operation populates a separate collection with all the keys as the _id values:

mr = db.runCommand({
    "mapreduce": "my_collection",
    "map" : function() {
        for (var key in this) { emit(key, null); }
    },
    "reduce" : function(key, stuff) { return null; }, 
    "out": "my_collection" + "_keys"
})

To get a list of all the dynamic keys, run distinct on the resulting collection:

db[mr.result].distinct("_id")
// prints ["_id", "year", "year_comment", ...]
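The same key list can also be gathered with the aggregation framework instead of Map-Reduce. This is only a sketch, assuming a server version that supports the $objectToArray operator and the same placeholder collection name my_collection; the variable name keysPipeline is made up for illustration.

```javascript
// Aggregation pipeline that collects every top-level key in the collection.
var keysPipeline = [
  // Turn each document into an array of { k: <key>, v: <value> } pairs
  { $project: { kv: { $objectToArray: "$$ROOT" } } },
  // Emit one document per key/value pair
  { $unwind: "$kv" },
  // Gather the distinct key names into a single array
  { $group: { _id: null, keys: { $addToSet: "$kv.k" } } }
];

// Mongo shell usage (returns no result document if the collection is empty):
// var res = db.my_collection.aggregate(keysPipeline).toArray();
// var keysList = res.length > 0 ? res[0].keys : [];
```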

Given the list above, you can assemble the query by building an object whose $or clauses are added in a loop, one per key. The query will have this structure:

var keysList = ["_id", "year", "year_comment"];
var query = keysList.reduce(function(obj, k) {
      var q = {};
      q[k] = "";
      obj["$or"].push(q);
      return obj;
    }, { "$or": [] });
printjson(query); // prints {"$or":[{"_id":""},{"year":""},{"year_comment":""}]} 

You can then use the Bulk API (available in MongoDB 2.6 and above) to batch the updates for better performance with the query above. Putting it together, something like the following should work:

var bulk = db.collection.initializeOrderedBulkOp(),
    counter = 0,
    query = {"$or":[{"_id":""},{"year":""},{"year_comment":""}]},
    keysList = ["_id", "year", "year_comment"];


db.collection.find(query).forEach(function(doc){
    var emptyKeys = keysList.filter(function(k) { // use filter to return an array of keys which have empty strings
            return doc[k]==="";
        }),
        update = emptyKeys.reduce(function(obj, k) { // set the update object 
            obj[k] = "";
            return obj;
        }, { });

    bulk.find({ "_id": doc._id }).updateOne({
        "$unset": update // use the $unset operator to remove the fields
    });

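    counter++;
    if (counter % 1000 == 0) {
        // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
})

// Flush the operations left over from the last partial batch
if (counter % 1000 != 0) {
    bulk.execute();
}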
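The same per-document updates can also be expressed as plain operation objects. A sketch, assuming a shell or driver that provides db.collection.bulkWrite (added in MongoDB 3.2); buildBulkOps is a hypothetical helper, not part of the answer above, and it skips _id because that field cannot be unset.

```javascript
// Hypothetical helper: builds one updateOne operation for each document
// that has at least one empty-string field among `keys`.
function buildBulkOps(docs, keys) {
  var ops = [];
  docs.forEach(function (doc) {
    var unset = {};
    keys.forEach(function (k) {
      if (k !== "_id" && doc[k] === "") {
        unset[k] = "";
      }
    });
    if (Object.keys(unset).length > 0) {
      ops.push({
        updateOne: {
          filter: { _id: doc._id },
          update: { $unset: unset }
        }
      });
    }
  });
  return ops;
}

// Mongo shell usage (query and keysList as in the answer; toArray() loads
// all matching documents into memory, so this suits smaller collections):
// var ops = buildBulkOps(db.collection.find(query).toArray(), keysList);
// if (ops.length > 0) {
//   db.collection.bulkWrite(ops, { ordered: true });
// }
```

The length check matters because bulkWrite rejects an empty operations array.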
chridam answered Oct 16 '22 16:10


If you only need to clear a single field, or you prefer to go field by field, you can use the updateMany method:

db.comments.updateMany({year: ""}, { $unset : { year : 1 }})
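To cover several fields with this approach, the call can be repeated once per field. A sketch, assuming the field names are known up front; unsetEmptyCalls is a hypothetical helper that only builds the filter and update documents, and the shell usage is shown in the trailing comment.

```javascript
// Hypothetical helper: for each field name, build the filter and update
// documents for an updateMany call that removes the field where it equals "".
function unsetEmptyCalls(fields) {
  return fields.map(function (field) {
    var filter = {};
    var unset = {};
    filter[field] = "";
    unset[field] = "";
    return { filter: filter, update: { $unset: unset } };
  });
}

// Mongo shell usage (collection name taken from the example above):
// unsetEmptyCalls(["year", "year_comment"]).forEach(function (c) {
//   db.comments.updateMany(c.filter, c.update);
// });
```

This issues one server-side update per field, so no documents need to be read into the shell.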
J.C. Gras answered Oct 16 '22 15:10