
Java/MongoDB message length error on Windows but not on Linux

We are currently working on importing huge JSON files (~100 MB) into MongoDB using the Java driver. We split the files into smaller chunks, since we first ran into problems importing the whole file. We are of course aware of MongoDB's limitation that the maximum document size is 16 MB, but the chunks we are now importing are far smaller than that.

Strangely enough, the import works when run on Linux (Eclipse), yet the same program throws an exception stating "can't say something" on Windows (Eclipse). The log from the database shows the error message

> "Thu Sep 13 11:38:48 [conn1] recv(): message len 1835627538 is too
> large1835627538"

Rerunning the import on the same dataset always leads to the same message-length error. We checked the size of the documents to import (using .toString().length()); the chunk that caused the error was only a few kB.
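As a side note, .toString().length() counts characters, not the bytes that actually go over the wire; with non-ASCII content the two differ. A minimal standalone sketch of the difference (the document string here is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class SizeCheck {
    public static void main(String[] args) {
        // hypothetical one-line document containing a non-ASCII character
        String doc = "{\"name\":\"München\"}";
        int chars = doc.length();                                // UTF-16 code units
        int bytes = doc.getBytes(StandardCharsets.UTF_8).length; // bytes on the wire
        System.out.println(chars + " chars vs " + bytes + " bytes"); // 18 chars vs 19 bytes
    }
}
```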

It makes no difference which OS the mongo database runs on; the error only depends on where the import code is executed (using the same java-mongo-driver).

asked Nov 13 '22 by bobeye0816

1 Answer

"we are currently working on importing huge JSON files (~100 MB) into MongoDB using the java driver"

Are we talking about a JSON file containing thousands of JSON objects, or one JSON object that is ~100 MB in size? If I remember correctly, the 16 MB limit is per document, not per JSON file containing thousands of JSON objects.

Also!

> "Thu Sep 13 11:38:48 [conn1] recv(): message len 1835627538 is too
> large1835627538"

"the chunk that caused the error was only some kB large."

If 1835627538 really were in kB, that would be pretty big: around ~1750 GB! (Even read as plain bytes, it is about 1.7 GB, nowhere near a few kB.)
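To make the arithmetic concrete, a standalone sketch comparing both readings of the reported length (which unit the log uses is the speculation above, not something stated in the log itself):

```java
public class MessageLenCheck {
    public static void main(String[] args) {
        long len = 1835627538L; // value from the mongod log

        // if the length were kilobytes:
        double asKbInGb = len / (1024.0 * 1024.0);
        // if the length is plain bytes:
        double asBytesInGb = len / (1024.0 * 1024.0 * 1024.0);

        System.out.printf("as kB:    %.1f GB%n", asKbInGb);    // 1750.6 GB
        System.out.printf("as bytes: %.2f GB%n", asBytesInGb); // 1.71 GB
    }
}
```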

To handle a JSON file containing thousands of JSON objects, why don't you iterate through your data file line by line and do your inserts that way? With this method it doesn't matter how large your data file is; the iterator is just a pointer to a specific line. It doesn't load the WHOLE FILE into memory before inserting.

NOTE: This is assuming your data file contains 1 JSON object per line.

Using the Apache Commons IO FileUtils class, you can use its LineIterator to iterate through your file, for example (not fully working code, you need to import the correct libs):

    // requires org.apache.commons.io.FileUtils and org.apache.commons.io.LineIterator
    LineIterator line_iter = null;
    try {
        line_iter = FileUtils.lineIterator(data_file);
        while (line_iter.hasNext()) {
            String line = line_iter.nextLine();
            // only insert lines that look like JSON objects;
            // the isEmpty() check also skips blank lines safely
            if (!line.isEmpty() && line.charAt(0) == '{') {
                this.mongodb.insert(line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        LineIterator.closeQuietly(line_iter); // close the underlying reader
    }
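The same line-by-line pattern can also be sketched with only the standard library, assuming one JSON object per line (the file path is a placeholder, and the println stands in for the actual MongoDB insert):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineImport {
    public static void main(String[] args) throws IOException {
        // "data.json" is a placeholder path; one JSON object per line
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("data.json"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // only one line is held in memory at a time
                if (!line.isEmpty() && line.charAt(0) == '{') {
                    System.out.println("would insert: " + line);
                }
            }
        }
    }
}
```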
answered Nov 15 '22 by chutsu