
Read and parse a >400MB .json file in Julia without crashing kernel

Tags: json, julia

The following is crashing my Julia kernel. Is there a better way to read and parse a large (>400 MB) JSON file?

using JSON
data = JSON.parsefile("file.json") 
asked Dec 26 '15 by Ian


1 Answer

Unless some effort is invested into making a smarter JSON parser, the following might work: there is a good chance file.json has many lines. In that case, reading the file and parsing a big repetitive JSON section line-by-line or chunk-by-chunk (with the right chunk length) could do the trick. A possible way to code this would be:

using JSON
f = open("file.json","r")

discard_lines = 12      # lines up to the repetitive part
important_chunks = 1000 # number of data items
chunk_length = 2        # each data item spans a 2-line JSON chunk

thedata = Any[]         # collected parsed items

# Skip the header lines before the repetitive part
for i = 1:discard_lines
    readline(f)
end

# Read each data item as a small chunk and parse it on its own
for i = 1:important_chunks
    chunk = join([readline(f) for j = 1:chunk_length], "\n")
    push!(thedata, JSON.parse(chunk))
end
close(f)
# use thedata
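
If file.json happens to be newline-delimited JSON (one complete object per line), a simpler variant of the same idea is to parse each line on its own. The following is a minimal sketch under that assumption, reusing the file name from the question:

using JSON

thedata = Any[]
open("file.json", "r") do f
    for line in eachline(f)
        isempty(strip(line)) && continue   # skip blank lines
        push!(thedata, JSON.parse(line))   # parse one small object at a time
    end
end
# use thedata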

There is a good chance one of these approaches works as a stopgap solution for your problem. Inspect file.json to find out how it is laid out.
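
A quick way to do that inspection is to print the first few lines and look for a repetitive, line-oriented structure, for example:

# Print the first 20 lines to see how the file is structured
open("file.json", "r") do f
    for i in 1:20
        eof(f) && break
        println(readline(f))
    end
end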

answered Nov 01 '22 by Dan Getz