MongoDB slow update loop

Tags:

mongodb

I am just getting familiar with MongoDB, which is why I did something stupid. Each of my dataset's entries includes a timestamp (they're Tweets). Instead of converting the timestamp from a string to an actual date type before inserting, I simply inserted it as a string.

Now my dataset is becoming huge (3+ million Tweets), and I want to start sorting and running range queries on my entries. Since my timestamp is still a string ("Wed Apr 29 09:52:22 +0000 2015"), I want to convert it to a proper date type.
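Before touching millions of documents, it is worth checking that the shell's JavaScript Date constructor actually parses Twitter's created_at format (a quick sanity check; whether the parse succeeds depends on the shell's JavaScript engine):

var ts = "Wed Apr 29 09:52:22 +0000 2015";  // Twitter created_at string
var asDate = new Date(ts);                  // relies on the shell's Date parser accepting this format
print(asDate);                              // should print the corresponding date if parsing worked
print(isNaN(asDate.getTime()) ? "parse failed" : "parse ok");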

I found the following code in this answer: How do I convert a property in MongoDB from text to date type?

var cursor = db.ClockTime.find()
while (cursor.hasNext()) {
  var doc = cursor.next();
  db.ClockTime.update({_id : doc._id}, {$set : {ClockInTime : new Date(doc.ClockInTime)}})
}

And it works great. However, it is incredibly slow. According to the MongoHub app, it only processes 4 queries per second. With a dataset of 3+ million tweets, this will take approximately 8.6 days to convert. I really hope there is a way to speed this up, as my deadline is in 8 days :P

Any thoughts?

1 Answer

Another option is to use bulk operations, which are extremely fast, especially in the unordered variant, since the server is free to apply those writes in parallel.

// Unordered bulk op: the server may apply the queued updates in any order, in parallel.
var bulk = db.ClockTime.initializeUnorderedBulkOp();
var myDocs = db.ClockTime.find();
var ops = 0;

myDocs.forEach(function(myDoc) {
  // Queue one update per document; nothing is sent to the server yet.
  bulk.find({ _id: myDoc._id }).updateOne(
    { $set: { ClockInTime: new Date(myDoc.ClockInTime) } }
  );

  // Send the queued updates in batches of 10,000 and start a fresh bulk builder.
  if ((++ops % 10000) === 0) {
    bulk.execute();
    bulk = db.ClockTime.initializeUnorderedBulkOp();
  }
});

// Send whatever is left over (guard against executing an empty bulk, which throws).
if (ops % 10000 !== 0) {
  bulk.execute();
}
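If you are on MongoDB 3.2 or newer, the same idea can be written with db.collection.bulkWrite() and ordered: false, so the server is again free to apply the writes in any order. The following is just a sketch of that alternative, assuming the same ClockTime collection and an arbitrary batch size of 10,000:

var batch = [];                                      // queued write operations
db.ClockTime.find().forEach(function(doc) {
  batch.push({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { ClockInTime: new Date(doc.ClockInTime) } }
    }
  });
  if (batch.length === 10000) {                      // flush in batches of 10,000
    db.ClockTime.bulkWrite(batch, { ordered: false });
    batch = [];
  }
});
if (batch.length > 0) {                              // flush the remainder
  db.ClockTime.bulkWrite(batch, { ordered: false });
}

Projecting only the ClockInTime field in the find() call would also cut down how much data the cursor has to pull back across 3+ million documents.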