
How can I lazily read multiple JSON values from a file/stream in Rust?

I'd like to read multiple JSON objects from a file/reader in Rust, one at a time. Unfortunately serde_json::from_reader(...) just reads until end-of-file; there doesn't seem to be any way to use it to read a single object or to lazily iterate over the objects.

Is there any way to do this? Using serde_json would be ideal, but if there's a different library I'd be willing to use that instead.

At the moment I'm putting each object on a separate line and parsing them individually, but I would really prefer not to need to do this.

Example Use

main.rs

use serde_json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stdin = std::io::stdin();
    let stdin = stdin.lock();

    // Desired API (hypothetical): serde_json has no `iter_from_reader` function.
    for item in serde_json::iter_from_reader(stdin) {
        println!("Got {:?}", item);
    }

    Ok(())
}

in.txt

{"foo": ["bar", "baz"]} 1 2 [] 4 5 6

example session

Got Object({"foo": Array([String("bar"), String("baz")])})
Got Number(1)
Got Number(2)
Got Array([])
Got Number(4)
Got Number(5)
Got Number(6)
Jeremy asked Apr 22 '19 16:04



1 Answer

This was a pain when I wanted to do it in Python, but fortunately in Rust this is a directly-supported feature of the de-facto-standard serde_json crate! It isn't exposed as a single convenience function, but we just need to create a serde_json::Deserializer reading from our file/reader, then use its .into_iter() method to get a StreamDeserializer iterator yielding Results containing serde_json::Value JSON values.

use serde_json; // 1.0.39

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stdin = std::io::stdin();
    let stdin = stdin.lock();

    let deserializer = serde_json::Deserializer::from_reader(stdin);
    let iterator = deserializer.into_iter::<serde_json::Value>();
    for item in iterator {
        println!("Got {:?}", item?);
    }

    Ok(())
}

One thing to be aware of: with the version used above (1.0.39), if a syntax error is encountered, the iterator keeps producing error results and never moves on. You need to handle errors inside the loop, or the loop will never end. In the snippet above, the ? operator does this by exiting the loop and returning the first error from our function.

Jeremy answered Sep 26 '22 00:09
