Bulk loading MongoDB from JSON file with a number of objects

I want to do a bulk load into MongoDB. I have about 200GB of files containing JSON objects which I want to load. The problem is that I cannot use the mongoimport tool, as the objects contain nested objects (i.e. I'd need to use the --jsonArray param), which is limited to 4MB.

There is the Bulk Load API in CouchDB, where I can just write a script and use cURL to send a POST request to insert the documents, with no size limits...
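
For reference, the CouchDB call I mean looks roughly like this (the database name mydb and the payload file docs.json are placeholders; _bulk_docs expects the documents wrapped in a {"docs": [...]} object):

    # POST a batch of documents to CouchDB's bulk-document endpoint
    curl -X POST http://localhost:5984/mydb/_bulk_docs \
         -H "Content-Type: application/json" \
         -d @docs.json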

Is there anything like this in MongoDB? I know there is Sleepy (the Sleepy.Mongoose REST interface), but I am wondering whether it can cope with a nested JSON array insert..?

Thanks!

asked Jul 01 '11 by NightWolf


1 Answer

OK, basically it appears there is no really good answer unless I write my own tool in something like Java or Ruby to parse the objects in (meh, effort)... But that's a real pain, so instead I decided to simply split the files down into 4MB chunks. I just wrote a simple shell script using split (note that I had to split the files multiple times because of its limitations). I used the split command with -l (line count) so each file had x number of lines in it. In my case each JSON object was about 4KB, so I just guessed at line counts.
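
A minimal sketch of the splitting step, assuming one JSON object per line (the file names and the 1000-line count are illustrative; at ~4KB per object, 1000 lines keeps each chunk under 4MB):

    # Split a newline-delimited JSON dump into 1000-line chunks
    # (produces chunk_aa, chunk_ab, ...)
    split -l 1000 dump.json chunk_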

For anyone wanting to do this, remember that split (with its default two-letter suffix) can only make 676 files (26*26), so you need to make sure each file has enough lines in it to avoid running out of suffixes and missing half the files. Anyway, I put all this in a good old bash script, used mongoimport on each chunk, and let it run overnight. Easiest solution IMO, and no need to cut and mash files and parse JSON in Ruby/Java or whatever else.
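
A rough sketch of that loop (mydb, mycollection, and the chunk_ prefix are placeholders matching the split sketch above):

    # Import every chunk produced by split, one mongoimport run per file
    for f in chunk_*; do
        mongoimport --db mydb --collection mycollection --file "$f"
    done

If you do need more than 676 chunks, GNU split's -a option increases the suffix length instead of forcing larger files.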

The scripts are a bit custom, but if anyone wants them, just leave a comment and I'll post them.

answered Sep 19 '22 by NightWolf