I have a large JSON file (2.5MB) containing about 80000 lines.
It looks like this:
{
    "a": 123,
    "b": 0.26,
    "c": [HUGE irrelevant object],
    "d": 32
}
I only want the numeric values stored for the keys a, b and d, and I want to ignore the rest of the JSON (i.e. whatever is in the c value).
I cannot modify the original JSON because it is created by a third-party service; I download it from that service's server.
How do I do this without loading the entire file into memory?
I tried using the Gson library and created a bean like this:
public class MyJsonBean {
    @SerializedName("a")
    @Expose
    public Integer a;
    @SerializedName("b")
    @Expose
    public Double b;
    @SerializedName("d")
    @Expose
    public Integer d;
}
But even then, in order to deserialize it using Gson, it seems I need to download the whole file, read it into memory, and then pass it to Gson as a string:
File myFile = new File(<FILENAME>);
myFile.createNewFile();
URL url = new URL(<URL>);
OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile));
URLConnection conn = url.openConnection();
HttpURLConnection httpConn = (HttpURLConnection) conn;
InputStream in = conn.getInputStream();
byte[] buffer = new byte[1024];
int numRead;
while ((numRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, numRead);
}
// Close both streams so the buffered bytes are flushed to disk before the file is read back
out.close();
in.close();
FileInputStream fis = new FileInputStream(myFile);
byte[] data = new byte[(int) myFile.length()];
fis.read(data);
fis.close();
String str = new String(data, "UTF-8");
Gson gson = new Gson();
MyJsonBean response = gson.fromJson(str, MyJsonBean.class);
System.out.println("a: " + response.a + ", b: " + response.b + ", d: " + response.d);
Is there any way to avoid loading the whole file and just get the relevant values that I need?
You should definitely check different approaches and libraries. If you really care about performance, compare the Gson, Jackson and JsonPath libraries and choose the fastest one. You will definitely have to download the whole JSON file to local disk first, probably to a TMP folder, and parse it after that.
A simple JsonPath solution could look like this:
import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;
import java.io.File;
public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();
        DocumentContext documentContext = JsonPath.parse(jsonFile);
        System.out.println("" + documentContext.read("$.a"));
        System.out.println("" + documentContext.read("$.b"));
        System.out.println("" + documentContext.read("$.d"));
    }
}
Notice that I do not create any POJO; I just read the given values using JSONPath expressions, much as you would with XPath. You can do the same with Jackson:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(jsonFile);
        System.out.println(root.get("a"));
        System.out.println(root.get("b"));
        System.out.println(root.get("d"));
    }
}
We do not need JSONPath here because the values we need are directly in the root node. As you can see, the API looks almost the same. We can also create a POJO structure:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.math.BigDecimal;
public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();
        ObjectMapper mapper = new ObjectMapper();
        Pojo pojo = mapper.readValue(jsonFile, Pojo.class);
        System.out.println(pojo);
    }
}
@JsonIgnoreProperties(ignoreUnknown = true)
class Pojo {
    private Integer a;
    private BigDecimal b;
    private Integer d;
    // getters, setters
}
Even though both libraries can read a JSON payload directly from a URL, I suggest downloading it in a separate step using the best approach you can find. For more info, read this article: Download a File From an URL in Java.
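As a rough illustration of the separate download step, here is a minimal sketch that uses only the JDK. The class name DownloadSketch and the methods saveToTempFile and download are hypothetical names chosen for this example, and the temp-file prefix is arbitrary. It copies the response stream straight to a temporary file, so the payload is never held in memory as a whole.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadSketch {

    /** Copies the stream to a new temp file; Files.copy works through an internal buffer. */
    static Path saveToTempFile(InputStream in) throws IOException {
        Path target = Files.createTempFile("payload-", ".json");
        // createTempFile already created the file, so it has to be replaced
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    /** Opens the URL and streams its body to a temp file. */
    static Path download(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return saveToTempFile(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a network stream so the sketch runs offline
        byte[] sample = "{\"a\": 123}".getBytes(StandardCharsets.UTF_8);
        Path file = saveToTempFile(new ByteArrayInputStream(sample));
        System.out.println("Saved " + Files.size(file) + " bytes to " + file);
        Files.delete(file);
    }
}
```

The resulting Path can be turned into a File with toFile() and handed to either of the parsers shown above.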
There are some excellent libraries for parsing large JSON files with minimal resources. One is the popular GSON library. It can combine both styles of parsing, treating the file as a stream while binding individual records to objects. Each record is handled as it passes and is then discarded, which keeps memory usage low.
If you're interested in using the GSON approach, there's a great tutorial for it here: Detailed Tutorial
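For the JSON shape in the question, the streaming idea might look roughly like the sketch below. It assumes Gson is on the classpath and uses its JsonReader class; the names GsonStreamingSketch, Values and readValues are made up for this example. The reader walks the top-level object, keeps a, b and d, and calls skipValue() for every other key, so the huge c value is consumed token by token without being turned into objects.

```java
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class GsonStreamingSketch {

    /** Holder for the three values of interest (hypothetical type). */
    static final class Values {
        Integer a;
        Double b;
        Integer d;
    }

    /** Reads a, b and d from the top-level object and skips every other key. */
    static Values readValues(Reader source) throws IOException {
        Values values = new Values();
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                String name = reader.nextName();
                switch (name) {
                    case "a":
                        values.a = reader.nextInt();
                        break;
                    case "b":
                        values.b = reader.nextDouble();
                        break;
                    case "d":
                        values.d = reader.nextInt();
                        break;
                    default:
                        // Consumes the whole value (e.g. the large "c" array) without keeping it
                        reader.skipValue();
                }
            }
            reader.endObject();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for the real payload
        String json = "{\"a\": 123, \"b\": 0.26, \"c\": [{\"x\": [1, 2, 3]}], \"d\": 32}";
        Values v = readValues(new StringReader(json));
        System.out.println("a: " + v.a + ", b: " + v.b + ", d: " + v.d);
    }
}
```

Because readValues accepts any Reader, it could also be given an InputStreamReader wrapped around a file or an HTTP response stream. The bytes of c still have to be read, since d comes after it, but only a small buffer is held in memory at any time.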