
MongoDB: mongoimport loses connection when importing big files

Tags:

mongodb

I'm having some trouble importing a JSON file into a local MongoDB instance. The JSON was generated using mongoexport and looks like this (no arrays, no deep nesting):

{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"[email protected]","type":"answer"}
{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"[email protected]","type":"answer"}

If I import a 9MB file with ~300 rows, there is no problem:

[stekhn latest]$ mongoimport -d mietscraping -c mails mails-small.json 
2015-11-02T10:03:11.353+0100    connected to: localhost
2015-11-02T10:03:11.372+0100    imported 240 documents

But if I try to import a 32MB file with ~1300 rows, the import fails:

[stekhn latest]$ mongoimport -d mietscraping -c mails mails.json 
2015-11-02T10:05:25.228+0100    connected to: localhost
2015-11-02T10:05:25.735+0100    error inserting documents: lost connection to server
2015-11-02T10:05:25.735+0100    Failed: lost connection to server
2015-11-02T10:05:25.735+0100    imported 0 documents

Here is the log:

2015-11-02T11:53:04.146+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:45237 #21 (6 connections now open)
2015-11-02T11:53:04.532+0100 I -        [conn21] Assertion: 10334:BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"
2015-11-02T11:53:04.536+0100 I NETWORK  [conn21] AssertionException handling request, closing client connection: 10334 BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"

I've heard about the 16MB limit for BSON documents before, but since no row in my JSON file is bigger than 16MB, this shouldn't be a problem, right? When I do the exact same (32MB) import on my local computer, everything works fine.

Any ideas what could cause this weird behaviour?

asked Nov 02 '15 by stekhn
People also ask

Which parameter do you have to use while importing a CSV file into MongoDB using Mongoimport command?

You can use the --collection (or -c ) parameter to specify a collection to import the file into.
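For example, a minimal CSV import might look like this (mydb, people, and people.csv are placeholder names; --headerline tells mongoimport to take the field names from the first row of the file):

mongoimport -d mydb -c people --type csv --headerline < people.csv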

Does Mongoimport overwrite?

By default, mongoimport skips documents that already exist, so it won't overwrite existing data. You can update existing documents instead by running the import in upsert mode.
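For example, in recent versions of mongoimport the option is --mode upsert (older versions used a --upsert flag); mydb is a placeholder database name, and documents are matched on _id by default:

mongoimport -d mydb -c mails --mode upsert < mails.json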

What is Mongoimport?

The mongoimport tool imports content from an Extended JSON, CSV, or TSV export created by mongoexport , or potentially, another third-party export tool. Run mongoimport from the system command line, not the mongo shell.


1 Answer

I guess the problem is about performance; either way, you can solve it like this:

Use the mongoimport option -j. If it doesn't work with 4, try incrementing it, i.e. 4, 8, 16, depending on the number of cores in your CPU:

mongoimport --help

-j, --numInsertionWorkers= number of insert operations to run concurrently (defaults to 1)


mongoimport -d mietscraping -c mails -j 4 < mails.json


Or you can split the file and import each part, as in the sketch below.
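A minimal sketch, assuming GNU coreutils split is available; since the export is line-delimited (one document per line), splitting by lines keeps every document intact (the chunk size of 500 lines is an arbitrary choice):

split -l 500 mails.json mails-part-
for f in mails-part-*; do mongoimport -d mietscraping -c mails < "$f"; done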

I hope this helps you.


Looking a little more, this is a bug in some versions: https://jira.mongodb.org/browse/TOOLS-939. Here is another solution: you can change the batch size with --batchSize (the default is 10000). mongoimport groups documents into batches before sending them to the server, and with large documents the default batch can exceed the 16MB message limit, which is exactly the BSONObj size assertion in your log. Reduce the value and test:

mongoimport -d mietscraping -c mails < mails.json --batchSize 1
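A batchSize of 1 is the safest but slowest setting; intermediate values such as 100 or 1000 may keep each batch under 16MB while preserving most of the throughput, though that is something to test against your own data.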

answered Sep 30 '22 by Rodrigo