To my surprise, importing the same file into the same MongoDB (3.0) server is much slower (more than 20 times) with the 3.0 version of mongoimport than with 2.6.4. Has anyone seen the same problem, and how can I fix it?
Here are the details:
The 2.6.4 mongoimport loads around 16K rows per second from the same JSON file:
-logbash-3.2$ mongoimport --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media
--collection media --upsert --upsertFields _id --type json --file /data/xxx.json
connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
2015-10-08T15:24:02.007-0700 Progress: 8860712/5024041951 0%
2015-10-08T15:24:02.007-0700 54900 18300/second
2015-10-08T15:24:05.004-0700 Progress: 15590853/5024041951 0%
2015-10-08T15:24:05.004-0700 96900 16150/second
Here is the 3.0 run:
-logbash-3.2$ mongoimport30 --version
mongoimport version: 3.0.6
git version: 7588eb887549bd5d2fc7bbc08f7c62d4b29b9d75
-logbash-3.2$ mongoimport30 --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media
--collection media --upsertFields _id --type json --file /data/mediaingestor2.json --numInsertionWorkers 20000 -v
2015-10-08T15:53:04.393-0700 using upsert fields: [_id]
2015-10-08T15:53:04.393-0700 filesize: 5024041951 bytes
2015-10-08T15:53:04.393-0700 using fields:
2015-10-08T15:53:04.396-0700 connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
2015-10-08T15:53:04.396-0700 ns: media.media
2015-10-08T15:53:04.396-0700 connected to node type: replset
2015-10-08T15:53:04.397-0700 using write concern: w='majority', j=false, fsync=false, wtimeout=0
2015-10-08T15:53:04.397-0700 using write concern: w='majority', j=false, fsync=false, wtimeout=0
2015-10-08T15:53:07.393-0700 [........................] media.media 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:10.393-0700 [........................] media.media 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:13.393-0700 [........................] media.media 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:16.393-0700 [........................] media.media 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:19.393-0700 [........................] media.media 1.5 MB/4.7 GB (0.0%)
On the MongoDB side, mongostat showed only around 400 updates per second, far fewer than the ~16K per second from the 2.6.4 run above. Note that I also tried --numInsertionWorkers 20000, which is supposed to make the import faster, but the result seems to be the same as without the option. Maybe the build (git version) I am using is not a good one?
Running mongoimport with 20,000 numInsertionWorkers is excessive. The application may be losing performance to the heavy context switching needed to support so many threads. The right number of workers is going to be closer to the number of cores on the machine that you're running mongoimport on. You can find the right number through testing: start with a single worker, monitor the performance, and then double the number in each successive test [1,2,4,8,16,...]. You'll eventually reach a number at which performance no longer improves. At that point you have gone past the right number of workers, so use the last count that still gave an improvement.
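The doubling test described above could be scripted roughly as follows. This is only a sketch: `worker_counts` and `sweep_workers` are hypothetical helper names, the `mongoimport30` binary name, host, database, collection and flags are copied from the question, and it assumes you run it against a smaller sample file and reset the target collection between runs so that each run starts from the same state.

```shell
# worker_counts MAX: print 1, 2, 4, ... doubling while the value is <= MAX.
worker_counts() {
  local max="$1" n=1
  while [ "$n" -le "$max" ]; do
    echo "$n"
    n=$((n * 2))
  done
}

# sweep_workers FILE MAX: run one timed import per worker count.
# Hypothetical helper; connection details are taken from the question.
sweep_workers() {
  local file="$1" max="$2" n start end
  for n in $(worker_counts "$max"); do
    start=$(date +%s)
    mongoimport30 --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 \
      --db media --collection media --upsertFields _id \
      --type json --file "$file" --numInsertionWorkers "$n"
    end=$(date +%s)
    # Wall-clock seconds per run; compare these across worker counts.
    echo "workers=$n seconds=$((end - start))"
  done
}
```

Calling `sweep_workers /path/to/sample.json 64` (the path is a placeholder) would try 1, 2, 4, ... 64 workers; the count after which the reported seconds stop dropping is roughly where adding workers no longer helps.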
When comparing performance between versions or processes, it is important to make sure the conditions haven't changed between the test runs. It will be difficult to get a meaningful comparison between the two processes if either the servers or the network has changed from test to test.
Check that the database itself is in an identical state for each run. For instance, there will be performance differences between running your import workload against a database that already has data and preexisting indexes and running it against an empty database.
Check that the file system and OS configurations are set properly. Our documentation lists a set of system configurations you should set for best performance. http://docs.mongodb.org/manual/administration/production-notes/
Check that the server on which you are running mongoimport is not saturated. Look for any competing processes which may be consuming resources such as CPU, memory and network bandwidth in competition with mongoimport. Similarly, check the server on which you are running mongod for competing processes which contend for server resources.
Check the number of queued readers and writers in mongostat. A low number of queued operations there can indicate that the mongoimport process, rather than the database, is the bottleneck. I suspect that the mongoimport process is bottlenecking upstream of the database.
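To watch just the queue figures while an import is running, something like the filter below may help. It is a sketch under assumptions: `queued_ops` is a hypothetical helper, and it assumes your mongostat prints a header row with a column literally named `qr|qw` (queued readers and writers) and whitespace-separated data rows with the same number of fields as the header.

```shell
# queued_ops: read mongostat-style text on stdin and print the qr|qw column.
# Assumes the header contains a field named "qr|qw" and that data rows
# have the same whitespace-separated field layout as the header.
queued_ops() {
  awk '
    # Header row: remember which field position holds "qr|qw", then skip it.
    { for (i = 1; i <= NF; i++) if ($i == "qr|qw") { col = i; next } }
    # Data row: print that field once the column position is known.
    col && NF >= col { print $col }
  '
}
```

For example, `mongostat --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 5 | queued_ops` would print one `readers|writers` pair every five seconds. Values that stay near `0|0` while the import crawls would be consistent with the bottleneck being on the mongoimport side.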