My use case is similar to this entry: I want to read a huge inner array (multiple gigabytes as text) from within a JSON object such as:
{ "a": "...", // root level fields to be read, separately
...
"bs": [ // the huge array, most of the payload (can be multiple GB's)
{...},
...
]
}
The input is available as a Source[ByteString,_] (Akka Streams), and I already use Circe for JSON decoding elsewhere.
I can see two challenges:
Reading the bs array in a streamed fashion (getting a Source[B,_] for consuming it).
Splitting the original stream into two, so I can read and analyse the root level fields before the array begins.
Do you have any pointers for solving such a use case? So far I have checked akka-stream-json and circe-iteratee.
akka-stream-json looks like the right fit, but does not seem to be actively maintained.
circe-iteratee does not seem to have any integration with Akka Streams.
Jawn has an async parser: https://github.com/non/jawn/blob/master/parser/src/main/scala/jawn/AsyncParser.scala
But it is hard to write an efficient async parser for JSON because of its sequential nature.
If you can switch to synchronous parsing, then you can use jsoniter-scala-core and write a simple custom codec which skips all unneeded key/value pairs except "bs" and then parses the required data blazingly fast without holding the array content in memory.
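A minimal sketch of such a codec, assuming jsoniter-scala-core is on the classpath. The names BsScanDemo, bsScanner, onElement and sumOfBs are hypothetical and exist only for illustration, and the element type is Int just to keep the example free of macro-generated codecs; a real B would need its own JsonValueCodec[B]. Each array element is passed to a callback as soon as it is decoded, so the array is never collected in memory:

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets.UTF_8

object BsScanDemo {
  // Hypothetical helper: a decode-only codec that walks the root object,
  // skips every value except the "bs" array, and hands each decoded
  // element to `onElement` instead of accumulating it.
  def bsScanner[B](onElement: B => Unit)(implicit elem: JsonValueCodec[B]): JsonValueCodec[Unit] =
    new JsonValueCodec[Unit] {
      def nullValue: Unit = ()

      def encodeValue(x: Unit, out: JsonWriter): Unit =
        throw new UnsupportedOperationException("decode-only codec")

      def decodeValue(in: JsonReader, default: Unit): Unit =
        if (in.isNextToken('{')) {
          if (!in.isNextToken('}')) {
            in.rollbackToken()
            while ({
              // readKeyAsString consumes the key and the following colon
              if (in.readKeyAsString() == "bs") readArray(in) else in.skip()
              in.isNextToken(',')
            }) ()
            if (!in.isCurrentToken('}')) in.objectEndOrCommaError()
          }
        } else in.decodeError("expected '{'")

      private def readArray(in: JsonReader): Unit =
        if (in.isNextToken('[')) {
          if (!in.isNextToken(']')) {
            in.rollbackToken()
            while ({
              onElement(elem.decodeValue(in, elem.nullValue))
              in.isNextToken(',')
            }) ()
            if (!in.isCurrentToken(']')) in.arrayEndOrCommaError()
          }
        } else in.decodeError("expected '['")
    }

  // Stand-in element codec for the demo (assumption: elements are ints).
  implicit val intCodec: JsonValueCodec[Int] = new JsonValueCodec[Int] {
    def nullValue: Int = 0
    def decodeValue(in: JsonReader, default: Int): Int = in.readInt()
    def encodeValue(x: Int, out: JsonWriter): Unit = out.writeVal(x)
  }

  // Sums the elements of "bs" while reading the document from a stream.
  def sumOfBs(json: String): Int = {
    var sum = 0
    val input = new ByteArrayInputStream(json.getBytes(UTF_8))
    readFromStream[Unit](input)(bsScanner[Int](sum += _))
    sum
  }
}
```

For example, BsScanDemo.sumOfBs("""{"a":"x","bs":[1,2,3],"c":{"d":[4]}}""") visits only 1, 2 and 3 and skips "a" and "c". This sketch ignores the root level fields; reading them in the same pass would mean decoding those keys instead of skipping them. To feed it from a Source[ByteString,_], one option is to materialize the source into a blocking InputStream with Akka's StreamConverters.asInputStream() and run the parse on a dedicated thread.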