I receive JSON files with data to be analyzed in R, for which I use the RJSONIO package:
library(RJSONIO)
filename <- "Indata.json"
jFile <- fromJSON(filename)
When the JSON files are larger than about 300 MB (uncompressed), my computer starts to use swap memory and keeps parsing (fromJSON) for hours. A 200 MB file takes only about a minute to parse.
I use R 2.14 (64-bit) on 64-bit Ubuntu with 16 GB RAM, so I'm surprised that swapping is needed already at about 300 MB of JSON.
What can I do to read big JSON files? Is there something in the memory settings that messes things up? I have restarted R and run only the three lines above. The JSON file contains 2-3 columns with short strings, and 10-20 columns with numbers from 0 to 1,000,000. That is, it is the number of rows that makes the file large (more than a million rows in the parsed data).
Update: From the comments I learned that rjson does more of its work in C, so I tried it. With RJSONIO, a 300 MB file drove memory use from a 6% baseline to 100% (according to the Ubuntu System Monitor) and went on to swapping; with the rjson package it needed only about 60% memory and the parsing finished in a reasonable time (minutes).
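For reference, a minimal sketch of the rjson variant; it assumes your version of rjson's fromJSON() accepts a file argument (if it does not, read the file into a single string first, as in the commented fallback):

library(rjson)
filename <- "Indata.json"
# Parse directly from the file path
jFile <- fromJSON(file = filename)
# Fallback if your rjson version lacks the file argument:
# jFile <- fromJSON(paste(readLines(filename, warn = FALSE), collapse = ""))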
Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON in memory is actually what you want. It looks like RJSONIO is a DOM-based API.
What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.
Even though the question is very old, this might be of use for someone with a similar problem.
The function jsonlite::stream_in() lets you set pagesize to control the number of lines read at a time, and accepts a custom handler function that is applied to each such subset. This makes it possible to work with very large JSON files without reading everything into memory at once.
library(jsonlite)
con <- file("Indata.json", open = "r")  # stream_in() expects NDJSON, i.e. one JSON record per line
stream_in(con, pagesize = 5000, handler = function(df) {
  # Do something with the chunk of rows in 'df' here
})
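For example, if only per-chunk aggregates are needed, the handler can keep a running summary instead of the raw rows, so the full data set never sits in memory at once. A sketch under that assumption; the column name value is a placeholder for one of the numeric columns in your file:

library(jsonlite)
totals <- numeric(0)  # running per-chunk sums instead of the raw rows
con <- file("Indata.json", open = "r")
stream_in(con, pagesize = 5000, handler = function(df) {
  # 'value' is a hypothetical numeric column; replace it with one of yours
  totals <<- c(totals, sum(df$value, na.rm = TRUE))
})
grand_total <- sum(totals)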