I am just getting familiar with MongoDB, which is why I did something stupid. Each entry in my dataset includes a timestamp (they're Tweets). Instead of converting the timestamp from a string to an actual date before inserting, I inserted it as a plain string.
Now my dataset is becoming huge (3+ million Tweets), and I want to start sorting my entries and running range queries on them. Since my timestamp is still a string ("Wed Apr 29 09:52:22 +0000 2015"), I want to convert it to a date type.
I found the following code in this answer: How do I convert a property in MongoDB from text to date type?
> var cursor = db.ClockTime.find()
> while (cursor.hasNext()) {
... var doc = cursor.next();
... db.ClockTime.update({_id : doc._id}, {$set : {ClockInTime : new Date(doc.ClockInTime)}})
... }
And it works great. However, it is incredibly slow. According to the MongoHub app, it only processes 4 queries per second. With a dataset of 3+ million tweets, this will take approximately 8.6 days to convert. I really hope there is a way to speed this up, as my deadline is in 8 days :P
Any thoughts?
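Another option is to use bulk operations, which are much faster because many updates are sent to the server in one batch instead of one round trip each. The unordered variant is faster still, since the server may apply its operations in parallel.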
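var bulk = db.ClockTime.initializeUnorderedBulkOp();
var myDocs = db.ClockTime.find();
var ops = 0;
myDocs.forEach(function (myDoc) {
    // Queue one update per document; nothing is sent until execute().
    bulk.find({ _id: myDoc._id }).updateOne(
        { $set: { ClockInTime: new Date(myDoc.ClockInTime) } }
    );
    // Send the queued updates every 10000 documents and start a new batch.
    if (++ops % 10000 === 0) {
        bulk.execute();
        bulk = db.ClockTime.initializeUnorderedBulkOp();
    }
});
// Flush the remainder; execute() throws if no operations are queued.
if (ops % 10000 !== 0) {
    bulk.execute();
}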
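The conversion above relies on `new Date(...)` accepting the Tweet timestamp string, and how non-ISO strings are parsed depends on the JavaScript engine. If that turns out to be unreliable in your shell, a minimal sketch of an explicit parser could look like the following. `parseTwitterDate` and `MONTHS` are hypothetical names introduced here, not part of MongoDB, and the sketch assumes every timestamp has exactly the shape shown in the question ("Wed Apr 29 09:52:22 +0000 2015").

```javascript
// Hypothetical helper, not a MongoDB API: parses timestamps shaped like
// "Wed Apr 29 09:52:22 +0000 2015" without relying on the engine's
// handling of non-ISO date strings.
var MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
               Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

function parseTwitterDate(s) {
    // Groups: 1 month, 2 day, 3 hour, 4 minute, 5 second,
    //         6 offset sign, 7 offset hours, 8 offset minutes, 9 year
    var re = /^\w{3} (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2}) (\d{4})$/;
    var m = re.exec(s);
    if (!m || !MONTHS.hasOwnProperty(m[1])) {
        return null; // unrecognised format: let the caller decide what to do
    }
    // Treat the wall-clock fields as UTC first...
    var utcMillis = Date.UTC(Number(m[9]), MONTHS[m[1]], Number(m[2]),
                             Number(m[3]), Number(m[4]), Number(m[5]));
    // ...then remove the offset: "+0200" means the clock is 2 hours ahead of UTC.
    var sign = (m[6] === "-") ? -1 : 1;
    var offsetMinutes = sign * (Number(m[7]) * 60 + Number(m[8]));
    return new Date(utcMillis - offsetMinutes * 60000);
}
```

Inside the loop you would then pass `parseTwitterDate(myDoc.ClockInTime)` to `$set` instead of `new Date(myDoc.ClockInTime)`, and skip the update when it returns `null` so that malformed strings are not overwritten.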