
Fastest way to import millions of JSON documents to MongoDB

I have more than 10 million JSON documents of the form:

{"key": "val2", "key1": "val", "key2": "val2"}

in one file, one document per line.

Importing with the Java driver took around 3 hours using the following function (which inserts one BSON document at a time):

public static void importJSONFileToDBUsingJavaDriver(String pathToFile, DB db, String collectionName) {
    // open the file
    FileInputStream fstream;
    try {
        fstream = new FileInputStream(pathToFile);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("file does not exist, exiting");
        return;
    }
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream));

    // read the file line by line
    String strLine;
    DBCollection newColl = db.getCollection(collectionName);
    try {
        while ((strLine = br.readLine()) != null) {
            // parse each line into a DBObject (BSON)
            DBObject bson = (DBObject) JSON.parse(strLine);
            // insert the document into the collection
            try {
                newColl.insert(bson);
            } catch (MongoException e) {
                // duplicate key
                e.printStackTrace();
            }
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Is there a faster way? Could MongoDB settings influence the insertion speed? For example, supplying my own "_id" key (which serves as the index) so that MongoDB would not have to generate an artificial key, and thus an index, for each document; or disabling index creation entirely during insertion. Thanks.
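One common speedup with the legacy Java driver is inserting in batches rather than one document at a time, since `DBCollection.insert` also accepts a `List<DBObject>`. Below is a minimal sketch of just the batching logic; the `partition` helper is pure JDK and runnable, while the commented-out driver calls (and the `newColl` name) are assumptions showing where a batch would be sent:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchImporter {
    // Split a list of JSON lines into fixed-size batches.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 2500; i++) lines.add("{\"n\":" + i + "}");

        for (List<String> batch : partition(lines, 1000)) {
            // With the legacy driver, each batch would be parsed and sent in one call:
            //   List<DBObject> docs = new ArrayList<>();
            //   for (String line : batch) docs.add((DBObject) JSON.parse(line));
            //   newColl.insert(docs);   // one round-trip per 1000 docs instead of 1000
            System.out.println("batch of " + batch.size());
        }
    }
}
```

The batch size of 1000 is a placeholder; the right value depends on document size and the server's maximum message size.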

Asked Oct 28 '13 by rok

People also ask

How do I import a JSON file into MongoDB?

To import a JSON file, follow these steps. Step 1: open a command prompt and run mongod to start the MongoDB server; keep this window open to stay connected. Step 2: open another command prompt and start the mongo shell with the mongo command.
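Beyond the shell itself, MongoDB also ships the mongoimport tool, which reads line-delimited JSON files directly. A minimal invocation might look like the following; the database, collection, and file names are placeholders, and a local mongod on the default port is assumed:

```shell
# One JSON document per line in data.json;
# --numInsertionWorkers (mongoimport 3.0+) parallelizes the inserts
mongoimport --db mydb --collection docs --file data.json --numInsertionWorkers 4
```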

How do I import multiple JSON files into MongoDB compass?

Copy and paste this code into cmd, adjusting the file directories C:\MongoDB\Server\3.0\bin and C:\test\ for your setup. Alternatively, write a script in your favourite language that reads each file, JSON-decodes it, and inserts the documents one by one into MongoDB.

How import large data in MongoDB?

You have two options. One is to write a script that restructures the data before using mongoimport to import it. The other is to import the data into MongoDB as-is and then run an aggregation pipeline to transform it into the required structure.

Is MongoDB good for JSON?

A JSON database like MongoDB stores data in BSON (binary JSON), a binary-encoded version of JSON that is optimized for performance and space. This makes MongoDB a natural fit for storing JSON data.


2 Answers

I'm sorry, but you're all picking at minor performance issues instead of the core one. Separating the file-reading logic from the inserting is a small gain. Loading the file in binary mode (via MMAP) is a small gain. Using Mongo's bulk inserts is a big gain, but still no dice.

The whole performance bottleneck is the DBObject bson = JSON.parse(line) call. In other words, the problem with the Java driver is that it needs to convert JSON to BSON, and this code seems to be awfully slow or badly implemented. A full JSON round-trip (encode + decode) via JSON-simple, or especially via JSON-smart, is 100 times faster than the JSON.parse() command.

I know Stack Overflow is telling me right above this box that I should be answering the question, which I'm not, but rest assured that I'm still looking for an answer to this problem. I can't believe all the talk about Mongo's performance when this simple example code fails so miserably.

Answered Sep 20 '22 by Bruno D. Rodrigues


You can also remove all the indexes (except for the PK index, of course) and rebuild them after the import.
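From the mongo shell, that drop-and-rebuild approach can be sketched as follows; the database, collection, and field names here are placeholders, and a running local server is assumed:

```shell
# dropIndexes() removes all secondary indexes but keeps the mandatory _id index
mongo mydb --eval 'db.docs.dropIndexes()'
# ... run the import here ...
mongo mydb --eval 'db.docs.createIndex({key: 1})'   # rebuild after the import
```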

Answered Sep 18 '22 by evanchooly