OS: Ubuntu 14.04 LTS
MongoDB version: 3.0.7 (db.version() on the rs0 primary)
Storage engine: WiredTiger
Importing a JSON file (13 GB, one JSON document per line) via this command:
$ mongoimport --db InfDB --collection SFTest --file InfMapRed.json
This command worked fine in 2.6 with the previous storage engine (MMAPv1), but now it does not progress beyond 0.2%. The program keeps printing the lines below over and over, and the collection shows only 1000 documents via .count().
2015-10-24T06:11:41.799+0000 connected to: localhost
2015-10-24T06:11:44.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:11:47.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:11:50.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:11:53.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:11:56.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:11:59.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:02.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:05.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:08.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:11.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:14.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:17.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:20.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
2015-10-24T06:12:23.788+0000 [........................] InfDB.SFTest 20.5 MB/13.0 GB (0.2%)
....
I wrote a simple Python script to read the file and insert the documents line by line, and that works fine.
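For reference, a minimal sketch of that line-by-line workaround using pymongo 3.x (the database, collection, and file names come from the question; the connection string is an assumption, adjust it for your deployment):

import json
from pymongo import MongoClient

# Connection string is an assumption; point it at your own primary.
client = MongoClient("mongodb://localhost:27017")
collection = client["InfDB"]["SFTest"]

with open("InfMapRed.json") as f:
    for line in f:
        line = line.strip()
        if line:
            # One JSON document per line, inserted individually.
            collection.insert_one(json.loads(line))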
Using a smaller batch size solved this:
mongoimport --db InfDB --collection SFTest --file InfMapRed.json --batchSize 100
This is useful when importing large documents; the default batch size is 10,000.
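If you end up scripting the import yourself, the same idea applies: send the documents in small batches rather than one huge batch. A sketch assuming pymongo 3.x and the file layout from the question:

import json
from itertools import islice
from pymongo import MongoClient

# Connection string is an assumption; adjust for your deployment.
collection = MongoClient("mongodb://localhost:27017")["InfDB"]["SFTest"]

with open("InfMapRed.json") as f:
    while True:
        # Read up to 100 lines at a time, mirroring --batchSize 100.
        batch = [json.loads(line) for line in islice(f, 100) if line.strip()]
        if not batch:
            break
        collection.insert_many(batch)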
I've had this issue with large JSON files; --batchSize did not fix it, but --numInsertionWorkers did.
This works for MongoDB 3.x only.
In your case, with one worker you were able to insert 0.2% of the data, so 100 / 0.2 = 500: you would need about 500 workers to insert the data in one pass:
mongoimport --db InfDB --collection SFTest --file InfMapRed.json --numInsertionWorkers 500
reference: https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--numInsertionWorkers
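The two options can also be combined; the worker and batch counts below are just example values to tune for your hardware:
mongoimport --db InfDB --collection SFTest --file InfMapRed.json --numInsertionWorkers 8 --batchSize 100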
Check the secondaries. When I had this problem in v3.0.8, the secondaries were stuck in the RECOVERING state, and the logs showed why:
2015-11-19T00:35:01.271+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:42:16.360+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:45:01.410+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:52:16.496+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:55:01.551+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
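You can check the member states from the mongo shell with rs.status(); each member's stateStr should show PRIMARY or SECONDARY rather than RECOVERING.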
By default, mongoimport operates with "majority" write concern. Since my secondaries were too stale to catch up, they couldn't replicate the import operations, and the primary was waiting on replication that could never occur.
After performing a manual resync of the secondaries, I ran mongoimport again successfully. Alternatively, if only one of your secondaries is stuck in RECOVERING, you could lower the write concern with the --writeConcern option.
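For example, to wait for acknowledgement from the primary only (the value here is just an illustration; pick whatever your deployment needs):
mongoimport --db InfDB --collection SFTest --file InfMapRed.json --writeConcern "{w: 1}"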