mongoimport stuck at same point while importing a JSON file

OS: Ubuntu 14.04 LTS
MongoDB version: 3.0.7 (rs0:PRIMARY> db.version())
Storage engine: WiredTiger

I am importing a JSON file (13 GB, one JSON document per line) with this command:

$ mongoimport --db InfDB --collection SFTest --file InfMapRed.json

This command worked fine on MongoDB 2.6 with the previous storage engine (MMAPv1), but now it never progresses beyond 0.2%. mongoimport keeps printing the line below over and over, and .count() on the collection shows only 1000 documents.

    2015-10-24T06:11:41.799+0000    connected to: localhost
    2015-10-24T06:11:44.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:47.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:50.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:53.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:56.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:59.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:02.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:05.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:08.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:11.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:14.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:17.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:20.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:23.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
....

I wrote a simple Python script that reads the file and inserts the documents line by line, and that works fine.
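
For reference, a minimal sketch of that kind of line-wise insert script (assuming pymongo 3.x and one JSON document per line; the 1000-document flush size is just an illustrative choice):

    # Minimal sketch of a line-wise insert script (not the exact script used above).
    # Assumes pymongo 3.x for insert_many(); the 1000-document flush size is arbitrary.
    import json
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["InfDB"]["SFTest"]

    batch = []
    with open("InfMapRed.json") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            batch.append(json.loads(line))
            if len(batch) >= 1000:
                collection.insert_many(batch)  # flush in small batches
                batch = []
    if batch:
        collection.insert_many(batch)          # insert the remainder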

asked Oct 24 '15 by Vishal


3 Answers

Using a smaller batch size solved this.

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --batchSize 100

This is useful when importing large documents; the default batch size is 10000.

answered by Vishal


I've had this issue with large JSON files; batchSize did not fix it for me, but numInsertionWorkers did.

Note that this option exists only in MongoDB 3.0 and later:

In your case, with one worker you were able to insert 0.2% of the data, so by that ratio (100 / 0.2 = 500) you would need 500 workers to push all the data in at once:

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --numInsertionWorkers 500       

reference: https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--numInsertionWorkers

answered by simona


Check the secondaries. When I had this problem in v3.0.8, the secondaries were stuck in the RECOVERING state, and the logs showed why:

2015-11-19T00:35:01.271+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:42:16.360+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:45:01.410+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:52:16.496+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:55:01.551+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
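
A quick way to see each member's state from the mongo shell (this is standard rs.status() output, nothing specific to this problem):

    rs0:PRIMARY> rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); })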

By default, mongoimport operates with "majority" write concern. Since my secondaries were stale, they couldn't replicate the import operations, and the primary was waiting around for replication that could never occur.

After performing a manual resync of the secondaries, I ran mongoimport again and it succeeded. Alternatively, if only one of your secondaries is stuck in RECOVERING, you could set the write concern to a lower value with the --writeConcern option so the import does not wait for it.
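
For example, something like the following (a sketch, not a prescription; "{w: 1}" waits only for the primary's acknowledgement, so pick a value that matches how many healthy members you actually have):

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --writeConcern "{w: 1}"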

answered by ggallo