
Adding md5 hash value to mongo collection

Issue: I currently have a mongo collection with 100,000 documents. Each document has 3 fields (_id, name, age). I want to add a 4th field to each document, called hashValue, that stores the MD5 hash of that document's name field.

I can currently interact with my collection via the mongo shell or via the Mongoose ODM as part of a Node.js app.

Possible Solutions:

  1. Use Mongoose/Node.js:

I realize this won't work (I don't believe you can iterate through a cursor this way), but hopefully it shows what I'm trying to do.

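    var crypto = require('crypto');

    // Illustration only: a Mongoose query has no forEach(), so this does not run as-is
    MyCollection.find().forEach(function (el) {
        var hash = crypto.createHash('md5').update(el.name).digest('hex');
        // store the hash in a new field (hashValue must be in the schema for save() to persist it)
        el.hashValue = hash;
        el.save();
    });
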
  2. Use the mongo shell: almost the same as above, and I realize something like the above syntax would work there. The only issue is that I don't know how to create the MD5 hash in the mongo shell, although I am able to iterate through each document and add a field.

  3. (Possible workaround) The goal is to be able to query based on the MD5 hash of a name value. I believe mongo allows you to create a hashed index (link here). The only issue is that I can't find an example of anyone using it for querying (it only seems to be used for sharding), and I'm not sure it will work later on. (Example: I want to MD5-hash a name I collect from a user, then query my mongo collection to see whether that hash exists in the hashValue field.)
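For the workaround described above (querying by the MD5 of a user-supplied name), the hash has to be computed the same way when the field is written and when the user's input is checked; an ordinary ascending index on hashValue should then be enough for equality lookups. A minimal sketch, assuming Node's built-in crypto module and the hashValue field name from the question. The names md5Hex, HASH_INDEX_SPEC and hashLookupFilter are hypothetical; the createIndex() and findOne() calls mentioned in the comments are the MongoDB Node.js driver's collection methods.

```javascript
const crypto = require('crypto');

// Hypothetical helper: MD5 digest of a string as a 32-character lowercase hex string
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Hypothetical index spec: an ordinary ascending index on hashValue,
// e.g. passed to collection.createIndex(HASH_INDEX_SPEC)
const HASH_INDEX_SPEC = { hashValue: 1 };

// Hypothetical helper: equality filter matching documents whose stored
// hashValue equals the hash of a name collected from a user,
// e.g. passed to collection.findOne(hashLookupFilter(userInput))
function hashLookupFilter(name) {
  return { hashValue: md5Hex(name) };
}

console.log(hashLookupFilter('john')); // { hashValue: '527bd5b5d689e2c32ae974c6229ff785' }
```

Because the filter is a plain equality match on a stored string, it works the same whether the documents were updated from the shell or from the Node.js app, as long as both sides produce lowercase hex digests.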

asked Dec 02 '22 15:12 by user2263572


2 Answers

The legacy mongo shell has a built-in MD5 helper called hex_md5(). It is a shell function rather than part of standard JavaScript, and it returns the hash as a hex string:

> hex_md5('john')
527bd5b5d689e2c32ae974c6229ff785

So, to add the field to every document in your case, you can run the following snippet in the mongo shell:

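// Iterate over every document and store the MD5 of its name in hashValue
db.collection.find().forEach( function(data){
  data.hashValue = hex_md5(data.name);
  // save() replaces the stored document (matched by _id) with the modified copy
  db.collection.save(data);
});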
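hex_md5() is a shell helper, so inside the Node.js app the same digest can be produced with Node's built-in crypto module. A minimal sketch of the per-document step, run against plain objects standing in for the collection's documents; addHashValue and the sample data are hypothetical, and in a real migration each result would still have to be written back to MongoDB.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns a copy of the document with a hashValue field
// holding the lowercase hex MD5 digest of its name field
function addHashValue(doc) {
  const hashValue = crypto.createHash('md5').update(doc.name).digest('hex');
  return { ...doc, hashValue };
}

// Plain objects standing in for documents in the collection
const docs = [
  { _id: 1, name: 'john', age: 30 },
  { _id: 2, name: 'mary', age: 25 },
];

const updated = docs.map(addHashValue);
console.log(updated[0].hashValue); // 527bd5b5d689e2c32ae974c6229ff785
```

The digest for 'john' matches the hex_md5('john') output shown above, so hashes written from the shell and hashes computed in the app can be compared directly.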
answered Dec 24 '22 08:12 by Sarath Nair


You can iterate through a cursor in Mongoose using streams and apply all the updates with a single bulk operation.

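var mongoose = require('mongoose');
var crypto = require('crypto');

mongoose.connection.on('open', function () {
    var bulk = MyCollection.collection.initializeUnorderedBulkOp();
    var queued = 0; // number of update operations added to the bulk

    // Query#cursor() returns a readable stream of documents
    // (older Mongoose versions exposed this as Query#stream())
    MyCollection.find().cursor()
        .on('data', function (el) {
            var hash = crypto.createHash('md5').update(el.name).digest('hex');
            // queue an update that stores the hash in the new hashValue field
            bulk.find({ _id: el._id }).updateOne({ $set: { hashValue: hash } });
            queued++;
        })
        .on('error', function (err) {
            console.error(err); // handle stream errors
        })
        .on('end', function () {
            // execute() fails on an empty bulk, so skip it when nothing was queued
            if (queued === 0) return;
            // execute all queued operations
            bulk.execute().then(function () {
                console.log('Bulk update finished (' + queued + ' operations)');
            }, function (err) {
                console.error(err);
            });
        });
});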
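With 100,000 documents, a single bulk keeps every queued update in memory until the stream ends. One alternative is to send the updates in fixed-size batches. A minimal sketch, assuming the { updateOne: { filter, update } } operation format accepted by Model.bulkWrite() in Mongoose and collection.bulkWrite() in the Node.js driver; toHashUpdate, createBatcher, the sample documents and the batch size of 1000 are hypothetical choices, and the flush function here only records batches instead of calling the database.

```javascript
const crypto = require('crypto');

// Hypothetical helper: a bulkWrite-style updateOne operation that sets
// hashValue to the MD5 hex digest of the document's name
function toHashUpdate(doc) {
  const hash = crypto.createHash('md5').update(doc.name).digest('hex');
  return { updateOne: { filter: { _id: doc._id }, update: { $set: { hashValue: hash } } } };
}

// Hypothetical helper: collects operations and passes them to `flush`
// in batches of at most `size`, so only one batch is held at a time
function createBatcher(size, flush) {
  let pending = [];
  return {
    add(op) {
      pending.push(op);
      if (pending.length >= size) {
        flush(pending);
        pending = [];
      }
    },
    end() {
      // send whatever is left over; skip empty batches
      if (pending.length > 0) flush(pending);
      pending = [];
    },
  };
}

// Plain objects standing in for documents emitted by the cursor's 'data' event
const docs = Array.from({ length: 2500 }, (_, i) => ({ _id: i, name: 'user' + i }));

const sentBatches = [];
// In a real migration, flush would call something like MyCollection.bulkWrite(batch)
const batcher = createBatcher(1000, (batch) => sentBatches.push(batch));
docs.forEach((doc) => batcher.add(toHashUpdate(doc)));
batcher.end();

console.log(sentBatches.map((b) => b.length)); // [ 1000, 1000, 500 ]
```

Because bulkWrite() is asynchronous, a real 'data' handler would also need to wait for each batch (for example by pausing the stream until the write completes) before queuing more; that part is left out of this sketch.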
answered Dec 24 '22 10:12 by Volodymyr Synytskyi