
mongoimport 3.0 is slower than 2.6.4

To my surprise, importing the same file into the same MongoDB (3.0) server is much slower (more than 20 times) with mongoimport 3.0 than with 2.6.4. Has anyone else run into this, and how can I fix it?

Here are the details:

  1. mongoimport 2.6.4 loads around 16K rows per second for the same JSON file:

    -logbash-3.2$ mongoimport --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media 
            --collection media --upsert --upsertFields _id --type json --file /data/xxx.json
    

    connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
    2015-10-08T15:24:02.007-0700            Progress: 8860712/5024041951    0%
    2015-10-08T15:24:02.007-0700                    54900   18300/second
    2015-10-08T15:24:05.004-0700            Progress: 15590853/5024041951   0%
    2015-10-08T15:24:05.004-0700                    96900   16150/second
    
  2. Here is the 3.0 run:

    -logbash-3.2$ mongoimport30 --version
    
    mongoimport version: 3.0.6
    git version: 7588eb887549bd5d2fc7bbc08f7c62d4b29b9d75
    
    -logbash-3.2$ mongoimport30 --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media 
          --collection media --upsertFields _id --type json --file /data/mediaingestor2.json  --numInsertionWorkers 20000 -v
    

    2015-10-08T15:53:04.393-0700    using upsert fields: [_id]
    2015-10-08T15:53:04.393-0700    filesize: 5024041951 bytes
    2015-10-08T15:53:04.393-0700    using fields: 
    2015-10-08T15:53:04.396-0700    connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
    2015-10-08T15:53:04.396-0700    ns: media.media
    2015-10-08T15:53:04.396-0700    connected to node type: replset
    2015-10-08T15:53:04.397-0700    using write concern: w='majority', j=false, fsync=false, wtimeout=0
    2015-10-08T15:53:04.397-0700    using write concern: w='majority', j=false, fsync=false, wtimeout=0
    2015-10-08T15:53:07.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:10.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:13.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:16.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:19.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    

On the MongoDB side, mongostat showed only around 400 updates per second, far fewer than the ~16K per second from the 2.6.4 run above. I also tried --numInsertionWorkers 20000, which is supposed to make the import faster, but the speed seems to be the same as without the option. Maybe the build (git version) I am using is not a good one?

user3123912 asked Oct 08 '15 22:10



1 Answer

Running mongoimport with 20,000 numInsertionWorkers is excessive. The application may be losing performance to the heavy context switching needed to support so many threads. The right number of workers is going to be closer to the number of cores on the machine on which you're running mongoimport. You can find the right number through testing: start with a single worker, monitor the performance, and then double the number in each successive test [1, 2, 4, 8, 16, ...]. You'll eventually reach a number at which performance no longer improves. At that point you have exceeded the right number of workers.
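The doubling test described above could be scripted roughly as follows. This is a minimal sketch, not a definitive harness: the helper names (`make_measure`, `find_worker_count`), the 5% improvement threshold, and the cap of 64 workers are all assumptions for illustration. Only `--numInsertionWorkers` comes from the question; the rest of the command is whatever you normally pass to mongoimport.

```python
import subprocess
import time
from typing import Callable, List, Tuple


def make_measure(base_cmd: List[str]) -> Callable[[int], float]:
    """Build a function that times one mongoimport run for a worker count.

    base_cmd is your usual mongoimport command line as a list, without
    --numInsertionWorkers (for example the host, db, collection, and file
    arguments from the question).
    """
    def measure(workers: int) -> float:
        cmd = base_cmd + ["--numInsertionWorkers", str(workers)]
        start = time.monotonic()
        # check=True raises if mongoimport exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        return time.monotonic() - start
    return measure


def find_worker_count(
    measure: Callable[[int], float],
    max_workers: int = 64,      # assumed upper bound for the search
    min_gain: float = 0.05,     # assumed: stop when a run is < 5% faster
) -> Tuple[int, List[Tuple[int, float]]]:
    """Double the worker count until the elapsed time stops improving.

    Returns the best worker count seen and the list of (workers, seconds).
    """
    best_workers, best_time = 1, measure(1)
    results = [(1, best_time)]
    workers = 2
    while workers <= max_workers:
        elapsed = measure(workers)
        results.append((workers, elapsed))
        if elapsed > best_time * (1 - min_gain):
            break  # no meaningful improvement: we have passed the right number
        best_workers, best_time = workers, elapsed
        workers *= 2
    return best_workers, results
```

To use it, pass your own command, for example `find_worker_count(make_measure(["mongoimport", "--host", "...", "--file", "..."]))`. For the timings to be comparable, each run should start from the same database state, so point it at a scratch collection that you reset between runs.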

When comparing performance between versions or processes, it is important to make sure that conditions haven't changed between test runs. It will be difficult to get a meaningful comparison between the two processes if either the servers or the network have changed from test to test.

Check that the database itself is in an identical state. For instance, there will be performance differences if your import workload is run against a database that already has data and indexes versus a database that is empty.

Check that the file system and OS configurations are set properly. Our documentation lists a set of system configurations you should set for best performance. http://docs.mongodb.org/manual/administration/production-notes/

Check that the server on which you are running mongoimport is not saturated. Look for any other processes that may be consuming CPU, memory, or network bandwidth in competition with mongoimport. Similarly, check the server on which you are running mongod for processes that contend for its resources.
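As a rough first check for CPU saturation on the machine running mongoimport (or mongod), you could compare the load average with the core count. This is only a sketch: it covers CPU and not memory or network, `os.getloadavg()` is available on Unix-like systems only, and the threshold of 1.0 runnable process per core is an assumption, not a MongoDB recommendation.

```python
import os


def load_per_core(load1: float, cores: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load1 / cores


def host_looks_saturated(threshold: float = 1.0) -> bool:
    """Guess whether this host is CPU-saturated (Unix-like systems only).

    threshold is an assumed cut-off: about one runnable process per core.
    """
    load1, _, _ = os.getloadavg()   # 1-, 5-, and 15-minute load averages
    cores = os.cpu_count() or 1     # cpu_count() can return None
    return load_per_core(load1, cores) >= threshold
```

Run it before starting the import: if the host already looks saturated while idle, something else is competing for the CPU.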

Check the number of queued readers and writers in mongostat. A low number of queued operations can indicate that the mongoimport process, not the database, is the bottleneck. I suspect that the mongoimport process is bottlenecking upstream of the database.
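To make the queue check less of an eyeball exercise, you could average the values from mongostat's `qr|qw` column over a few samples taken during the import. The sketch below assumes you have copied those cells (such as `0|3`) out of the mongostat output yourself; the function names and the cut-off of one queued writer on average are hypothetical choices, not documented thresholds.

```python
from typing import Iterable, Tuple


def parse_queue(cell: str) -> Tuple[int, int]:
    """Split one mongostat 'qr|qw' cell into (queued readers, queued writers)."""
    readers, writers = cell.strip().split("|")
    return int(readers), int(writers)


def client_bottleneck_likely(
    cells: Iterable[str],
    max_avg_queued_writers: float = 1.0,   # assumed cut-off, tune for your setup
) -> bool:
    """Return True if writes rarely queue, i.e. mongod is keeping up.

    When the server is not queuing writes during an import, the limiting
    factor is more likely the client (mongoimport) or the network.
    """
    writers = [parse_queue(cell)[1] for cell in cells]
    if not writers:
        raise ValueError("need at least one qr|qw sample")
    return sum(writers) / len(writers) <= max_avg_queued_writers
```

For example, `client_bottleneck_likely(["0|0", "0|1", "0|0"])` reports that the database is not backing up, which would point toward mongoimport itself.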

blimpyacht answered Sep 30 '22 13:09
