
How to improve performance of update() and save() in MongoDB?

I'm looking for tips on how to improve the database performance in the following situation.

As a sample application, I wrote a fairly simple app today that uses the Twitter streaming API to search for certain keywords and stores the results in MongoDB. The app is written with Node.js.

I'm storing 2 collections. One stores the keyword and an array of tweet ids that reference each tweet found mentioning that keyword. These are added to the database using .update() with {upsert:true} and $push, so that new ids are appended to the 'ids' array.

A sample document from this collection looks like this:

{ "_id": ObjectId("4e00645ef58a7ad3fc9fd9f9"), "ids": ["id1","id2","id3"], "keyword": "#chocolate" }

Update code:

 keywords.update({keyword: key_word}, {$push: {ids: id}}, {upsert: true}, function(err) {});

Documents in the 2nd collection look like this and are added simply by using .save():

 {
     "twt_id": "id1",
     "tweet": { /* big chunk of JSON that doesn't need to be shown */ }
 }

I've got this running on my MacBook right now and it's been going for about 2 hours. I'm storing a lot of data, probably several hundred documents per minute. Right now the number of objects in MongoDB is 120k+.

What I'm noticing is that the CPU usage for the database process is hitting as high as 84% and has been climbing gradually since I started the latest test run.

I was reading up on setting indexes, but since I'm adding documents and not running queries against them, I'm not sure if indexes will help. A side thought that occurred to me is that update() might be doing a lookup since I'm using $push and that an index might help with that.

What should I be looking at to keep MongoDB from eating up ever-increasing amounts of CPU?

asked Jun 21 '11 10:06 by Geuis


1 Answer

You are very likely hitting a common bottleneck in MongoDB. Because you update documents very frequently by appending strings, there is a good chance that each document keeps outgrowing the space allocated for it, which forces the database to move the document to a different place in memory/disk by rewriting it at the tail end of the data file.
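To make the cost of those moves concrete, here is a toy model in plain JavaScript. It is not MongoDB's real allocator: the function name, the byte sizes, and the padding factor are all made up for illustration. It only counts how often a growing record would outgrow its slot.

```javascript
// Toy model only: NOT MongoDB's actual allocation logic.
// A record lives in a fixed-size slot. When an append makes the document
// larger than the slot, the record is "moved" into a new slot sized to the
// current document times paddingFactor.
function countMoves(appends, bytesPerAppend, initialSlot, paddingFactor) {
  let size = 0;          // current document size in bytes
  let slot = initialSlot; // bytes currently allocated for the record
  let moves = 0;
  for (let i = 0; i < appends; i++) {
    size += bytesPerAppend;
    if (size > slot) {
      slot = Math.ceil(size * paddingFactor);
      moves++;
    }
  }
  return moves;
}

// Hypothetical numbers: 1000 appends of 20 bytes each.
console.log(countMoves(1000, 20, 100, 1.0));   // no padding: nearly every append moves
console.log(countMoves(1000, 20, 100, 2.0));   // padding on each move: only a handful
console.log(countMoves(1000, 20, 20000, 1.0)); // slot pre-sized up front: no moves
```

The last call is the situation the pre-padding suggestion aims for: the slot is large enough from the start, so appends never trigger a rewrite.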

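In general, adding indexes only hurts write performance, so they will not help unless you are read heavy. The exception here is the keyword field: every update() has to find the matching document first, and without an index that lookup scans the collection.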

I would consider changing your application logic to do this:

  1. Index on the keyword field
  2. Each time you detect a tweet, query for the document that contains the keyword before writing anything. If it does not exist, insert a new document with the ids array padded with a large number of placeholder strings, then immediately remove them all from the array. This makes MongoDB allocate extra room for the whole document, so the ids field has plenty of space to grow when you start adding real ids.
  3. Insert the id of the tweet into the ids field
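
The three steps could be sketched roughly as follows. This is only a sketch under assumptions: `keywords` is taken to be a collection object from the official Node.js MongoDB driver (using its promise-based `createIndex`, `findOne`, `insertOne` and `updateOne`), and `PAD_COUNT`, the filler string, and the function names are hypothetical. It also assumes a storage engine that updates documents in place, as described above; the padding has no benefit otherwise.

```javascript
// Hypothetical sizing: how many placeholder strings to pre-allocate.
const PAD_COUNT = 1000;
const FILLER = "xxxxxxxxxxxxxxxxxxxx"; // roughly the length of a tweet id

// Step 1: index the field that every lookup and update filters on.
async function ensureKeywordIndex(keywords) {
  await keywords.createIndex({ keyword: 1 });
}

async function recordTweet(keywords, keyword, tweetId) {
  // Step 2: for a new keyword, insert a padded document, then empty the
  // array so the document keeps the extra room it was allocated.
  const existing = await keywords.findOne({ keyword });
  if (!existing) {
    const padding = new Array(PAD_COUNT).fill(FILLER);
    await keywords.insertOne({ keyword, ids: padding });
    await keywords.updateOne({ keyword }, { $set: { ids: [] } });
  }
  // Step 3: append the real tweet id.
  await keywords.updateOne({ keyword }, { $push: { ids: tweetId } });
}
```

Note that the find-then-insert in step 2 is not atomic, so two tweets for the same new keyword arriving at once could both insert; creating the index with `{ unique: true }` would be one way to guard against that.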
answered Oct 07 '22 04:10 by Bryan Migliorisi