How to parse a huge JSON file without loading it into memory

Tags:

java

json

gson

I have a large JSON file (2.5MB) containing about 80000 lines.

It looks like this:

{
  "a": 123,
  "b": 0.26,
  "c": [HUGE irrelevant object],
  "d": 32
}

I only want the numeric values stored under keys a, b and d, and to ignore the rest of the JSON (i.e. whatever is in the c value).

I cannot modify the original JSON as it is created by a 3rd party service, which I download from its server.

How do I do this without loading the entire file in memory?

I tried using the Gson library and created a bean like this:

import com.google.gson.annotations.Expose;
import com.google.gson.annotations.SerializedName;

public class MyJsonBean {
  @SerializedName("a")
  @Expose
  public Integer a;

  @SerializedName("b")
  @Expose
  public Double b;

  @SerializedName("d")
  @Expose
  public Integer d;
}

But even then, to deserialize it with Gson, do I need to download and read the whole file into memory first and then pass it to Gson as a string?

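import com.google.gson.Gson;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

File myFile = new File(<FILENAME>);

// Download the response body to a local file in 1 KB chunks
URL url = new URL(<URL>);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
try (InputStream in = conn.getInputStream();
     OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile))) {
  byte[] buffer = new byte[1024];
  int numRead;
  while ((numRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, numRead);
  }
} // closing 'out' flushes its buffer, so the file is complete before it is read back

// Read the whole file back into one String: this is the step that loads everything into memory
String str = new String(Files.readAllBytes(myFile.toPath()), StandardCharsets.UTF_8);

Gson gson = new Gson();
MyJsonBean response = gson.fromJson(str, MyJsonBean.class);

System.out.println("a: " + response.a + ", b: " + response.b + ", d: " + response.d);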

Is there any way to avoid loading the whole file and just get the relevant values that I need?

asked Mar 04 '23 at 11:03 by Sumit


2 Answers

You should compare different approaches and libraries. If you really care about performance, benchmark the Gson, Jackson and JsonPath libraries on this task and choose the fastest one. Either way, download the whole JSON file to local disk first (probably to a temp folder) and parse it after that.

A simple JsonPath solution could look like this:

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        DocumentContext documentContext = JsonPath.parse(jsonFile);
        System.out.println("" + documentContext.read("$.a"));
        System.out.println("" + documentContext.read("$.b"));
        System.out.println("" + documentContext.read("$.d"));
    }
}

Notice that I do not create any POJO; I just read the given values using JsonPath expressions, similar to XPath. You can do the same with Jackson:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(jsonFile);
        System.out.println(root.get("a"));
        System.out.println(root.get("b"));
        System.out.println(root.get("d"));
    }
}

We do not need JsonPath here because the values we need are directly in the root node. As you can see, the API looks almost the same. We can also create a POJO structure:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.math.BigDecimal;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        Pojo pojo = mapper.readValue(jsonFile, Pojo.class);
        System.out.println(pojo);
    }
}

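// ignoreUnknown = true makes Jackson skip "c" (and any other unmapped key)
@JsonIgnoreProperties(ignoreUnknown = true)
class Pojo {
    private Integer a;
    private BigDecimal b;
    private Integer d;

    public Integer getA() { return a; }
    public void setA(Integer a) { this.a = a; }
    public BigDecimal getB() { return b; }
    public void setB(BigDecimal b) { this.b = b; }
    public Integer getD() { return d; }
    public void setD(Integer d) { this.d = d; }

    @Override
    public String toString() {
        return "Pojo{a=" + a + ", b=" + b + ", d=" + d + "}";
    }
}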
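If c is far bigger than everything else, Jackson's lower-level streaming API can avoid building a tree or object for it at all. Below is a minimal sketch, assuming a, b and d are top-level keys as in the question; the class name JacksonStreamingApp and the Values holder are hypothetical. JsonParser hands out one token at a time, and skipChildren() jumps over a whole array or object, so c is scanned but never materialized.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.File;
import java.io.IOException;

public class JacksonStreamingApp {

    // Hypothetical holder for the three values we care about
    static final class Values {
        Integer a;
        Double b;
        Integer d;
    }

    static Values extract(JsonParser parser) throws IOException {
        Values values = new Values();
        if (parser.nextToken() != JsonToken.START_OBJECT) {
            throw new IOException("Expected a JSON object at the root");
        }
        // Walk the root object's fields one token at a time
        while (parser.nextToken() == JsonToken.FIELD_NAME) {
            String name = parser.getCurrentName();
            parser.nextToken(); // move from the field name to its value
            switch (name) {
                case "a": values.a = parser.getIntValue(); break;
                case "b": values.b = parser.getDoubleValue(); break;
                case "d": values.d = parser.getIntValue(); break;
                // Skips "c" (or any other array/object) without building it; no-op for scalars
                default: parser.skipChildren();
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(new File("./resource/test.json"))) {
            Values v = extract(parser);
            System.out.println("a=" + v.a + ", b=" + v.b + ", d=" + v.d);
        }
    }
}
```

This trades the convenience of readTree/readValue for explicit token handling, which only pays off when the skipped part is large.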

Even though both libraries can read a JSON payload directly from a URL, I suggest downloading it in a separate step using the best approach you can find. For more info, read this article: Download a File From an URL in Java.
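A minimal sketch of that separate download step using only the JDK, assuming a temp file is an acceptable destination: Files.copy streams the response body to disk in chunks, so the payload never has to sit in memory as a whole. The URL is a placeholder and the class name DownloadApp is hypothetical; timeouts, HTTP status checks and retries are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadApp {

    // Copies everything from 'in' into a new temp file and returns its path
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("payload-", ".json");
        // createTempFile already created the file, so REPLACE_EXISTING is required;
        // Files.copy moves the data through a small buffer, never the whole body at once
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("https://example.com/data.json").toURL(); // placeholder URL
        try (InputStream in = url.openStream()) {
            Path file = copyToTempFile(in);
            System.out.println("Saved " + Files.size(file) + " bytes to " + file);
        }
    }
}
```

The returned path can then be handed to any of the parsers above (for example as path.toFile()).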

answered Mar 24 '23 at 16:03 by Michał Ziober


There are some excellent libraries for parsing large JSON files with minimal resources. One is the popular Gson library. It lets you combine two styles: reading the file as a stream of tokens while still mapping parts of it to objects. It handles each record as it passes and then discards it, which keeps memory usage low.
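A minimal sketch of the streaming style with Gson's JsonReader, assuming a, b and d are top-level keys as in the question (the class GsonStreamingApp and its Values holder are hypothetical names): the reader walks the root object one name/value pair at a time, and skipValue() consumes the whole c value, however deeply nested, without building anything from it.

```java
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GsonStreamingApp {

    // Hypothetical holder mirroring the question's MyJsonBean
    static final class Values {
        Integer a;
        Double b;
        Integer d;
    }

    static Values extract(Reader source) throws IOException {
        Values values = new Values();
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                String name = reader.nextName();
                switch (name) {
                    case "a": values.a = reader.nextInt(); break;
                    case "b": values.b = reader.nextDouble(); break;
                    case "d": values.d = reader.nextInt(); break;
                    // Consumes "c" token by token without creating objects for it
                    default: reader.skipValue();
                }
            }
            reader.endObject();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get("./resource/test.json"), StandardCharsets.UTF_8)) {
            Values v = extract(in);
            System.out.println("a=" + v.a + ", b=" + v.b + ", d=" + v.d);
        }
    }
}
```

Nothing here needs the file as a single String, so memory use stays roughly constant no matter how large c is.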

If you're interested in the Gson approach, there's a great tutorial for it here: Detailed Tutorial
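On the object side, Gson.fromJson also accepts a Reader instead of a String, so a bean like the one in the question can be filled straight from the downloaded file. A minimal sketch, assuming the file path ./resource/test.json and a hypothetical class name GsonReaderApp; Gson's reflective adapter skips keys that have no matching field, such as c, as it reads.

```java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;

import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GsonReaderApp {

    // Same shape as the question's MyJsonBean (the @Expose annotations are omitted here)
    static class MyJsonBean {
        @SerializedName("a") Integer a;
        @SerializedName("b") Double b;
        @SerializedName("d") Integer d;
    }

    static MyJsonBean read(Reader source) {
        // fromJson(Reader, Class) pulls tokens from the reader as it goes;
        // keys without a matching field (e.g. "c") are skipped, not mapped
        return new Gson().fromJson(source, MyJsonBean.class);
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get("./resource/test.json"), StandardCharsets.UTF_8)) {
            MyJsonBean bean = read(in);
            System.out.println("a=" + bean.a + ", b=" + bean.b + ", d=" + bean.d);
        }
    }
}
```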

answered Mar 24 '23 at 14:03 by Zayn Korai