Streamed JSON decoding using preferably Circe and Akka Streams

Question

My use case is similar to this entry: I want to read a huge inner array (multiple gigabytes as text) from within a JSON object such as:

{ "a": "...",   // root level fields to be read, separately
  ...
  "bs": [       // the huge array, most of the payload (can be multiple GB's)
    {...},
    ...
  ]
}

The input is available as a Source[ByteString,_] (Akka Streams), and I'm using Circe for JSON decoding elsewhere.

I can see two challenges:

Reading the bs array in a streamed fashion (getting a Source[B,_] for consuming it).
Splitting the original stream into two, so I can read and analyse the root-level fields before the array begins.

Do you have any pointers for solving such a use case? So far, I have checked akka-stream-json and circe-iteratee.

akka-stream-json looks like the right thing, but it is not well maintained. circe-iteratee does not seem to integrate with Akka Streams.

Andriy Plokhotnyuk · Accepted Answer

Jawn has an async parser: https://github.com/non/jawn/blob/master/parser/src/main/scala/jawn/AsyncParser.scala

But it is hard to write an efficient async parser for JSON because of the format's inherently sequential nature.
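To make that difficulty concrete, here is a minimal push-style sketch that assumes only the Scala standard library. It accepts byte chunks in arrival order (as a Source[ByteString,_] would deliver them) and returns the root-level fields and the elements of "bs" as raw JSON text, buffering one value at a time. It is a brace-counting scanner, not a validating parser: `RootSplitter`, `RootField` and `ArrayElement` are hypothetical names, and it assumes well-formed input with unescaped keys. The state it must carry between chunks (nesting depth, string and escape flags, the partially read value) is what makes incremental JSON parsing awkward.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ListBuffer

// Hypothetical event types, not from any library.
sealed trait Event
final case class RootField(key: String, rawJson: String) extends Event
final case class ArrayElement(rawJson: String) extends Event

/** Feed byte chunks in order; each call returns the events completed so far.
  * Scans bytes (JSON delimiters are ASCII, so UTF-8 sequences split across
  * chunks are safe) and decodes a value only once it is complete. */
final class RootSplitter(arrayKey: String) {
  private val buf = new ByteArrayOutputStream // current key/value/element bytes
  private var depth = 0          // open { and [ count; the root object is depth 1
  private var inStr = false      // inside a string literal
  private var esc = false        // previous byte was a backslash inside a string
  private var key: String = null // most recent root-level key
  private var inTarget = false   // inside the arrayKey array

  private def text(): String = new String(buf.toByteArray, UTF_8).trim

  def feed(chunk: Array[Byte]): List[Event] = {
    val events = ListBuffer.empty[Event]
    chunk.foreach { b =>
      val c = b.toChar // bytes >= 0x80 never equal an ASCII delimiter
      if (inStr) {
        buf.write(b)
        if (esc) esc = false
        else if (c == '\\') esc = true
        else if (c == '"') inStr = false
      } else c match {
        case '"' => inStr = true; buf.write(b)
        case '{' if depth == 0 => depth = 1; buf.reset()
        case ':' if depth == 1 => // end of a root-level key
          key = text().stripPrefix("\"").stripSuffix("\""); buf.reset()
        case ',' | '}' if depth == 1 => // end of a root-level value
          if (key != arrayKey && text().nonEmpty) events += RootField(key, text())
          buf.reset()
          if (c == '}') depth = 0
        case '[' if depth == 1 && key == arrayKey =>
          depth = 2; inTarget = true; buf.reset()
        case ',' if depth == 2 && inTarget =>
          events += ArrayElement(text()); buf.reset()
        case ']' if depth == 2 && inTarget =>
          if (text().nonEmpty) events += ArrayElement(text())
          depth = 1; inTarget = false; buf.reset()
        case '{' | '[' => depth += 1; buf.write(b)
        case '}' | ']' => depth -= 1; buf.write(b)
        case _ => if (depth > 0) buf.write(b)
      }
    }
    events.toList
  }
}
```

In an Akka Streams pipeline, one untested way to drive it would be `source.statefulMapConcat { () => val s = new RootSplitter("bs"); bytes => s.feed(bytes.toArray) }`; root fields and array elements can then be told apart by type, and each element decoded with Circe (e.g. `io.circe.parser.decode[B](raw)`).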

If you can switch to synchronous parsing, you can use jsoniter-scala-core and write a simple custom codec that skips all unneeded key/value pairs except "bs" and then parses the required data blazingly fast, without holding the whole array content in memory.
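The skip-everything-except-"bs" idea can be sketched without any library. The following is a hypothetical stdlib-only stand-in, not jsoniter-scala-core's API (`SkippingReader` and `streamArray` are made-up names). It pulls bytes synchronously from a java.io.InputStream, discards unneeded values without buffering them, and hands each array element to a callback as raw JSON text, so memory use is bounded by the largest single element. It assumes well-formed input and compares keys without unescaping them.

```scala
import java.io.{BufferedInputStream, ByteArrayOutputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets.UTF_8

final class SkippingReader(in: InputStream) {
  private val src = new BufferedInputStream(in)
  private var peeked = -2 // -2 means "no byte peeked yet"

  private def peek(): Int = { if (peeked == -2) peeked = src.read(); peeked }
  private def next(): Int = {
    val b = peek(); peeked = -2
    if (b < 0) throw new EOFException("unexpected end of JSON input")
    b
  }
  private def nextToken(): Int = { // next non-whitespace byte
    var b = next()
    while (b == ' ' || b == '\n' || b == '\r' || b == '\t') b = next()
    b
  }
  private def atScalarEnd: Boolean = { val p = peek(); p < 0 || ",}] \n\r\t".indexOf(p) >= 0 }

  // Reads the rest of a string (opening quote already consumed), copying the
  // content and the closing quote to `sink` when it is non-null.
  private def string(sink: ByteArrayOutputStream): Unit = {
    var b = next()
    while (b != '"') {
      if (sink != null) sink.write(b)
      if (b == '\\') { b = next(); if (sink != null) sink.write(b) } // escaped byte
      b = next()
    }
    if (sink != null) sink.write(b)
  }

  // Reads one whole value whose first byte is `first`; with a null sink the
  // value is skipped in constant memory, otherwise its bytes are copied.
  private def value(first: Int, sink: ByteArrayOutputStream): Unit = {
    def put(b: Int): Unit = if (sink != null) sink.write(b)
    put(first)
    first.toChar match {
      case '"' => string(sink)
      case '{' | '[' =>
        var depth = 1
        while (depth > 0) {
          val b = next(); put(b)
          if (b == '"') string(sink)
          else if (b == '{' || b == '[') depth += 1
          else if (b == '}' || b == ']') depth -= 1
        }
      case _ => while (!atScalarEnd) put(next()) // number, true, false, null
    }
  }

  /** Streams the elements of `arrayKey`; returns how many were delivered. */
  def streamArray(arrayKey: String)(onElement: String => Unit): Int = {
    var count = 0
    require(nextToken() == '{', "root must be a JSON object")
    var t = nextToken()
    while (t != '}') {
      require(t == '"', "expected a key")
      val kb = new ByteArrayOutputStream
      string(kb)
      val key = new String(kb.toByteArray, UTF_8).dropRight(1) // drop closing quote
      require(nextToken() == ':', "expected ':'")
      val v = nextToken()
      if (key == arrayKey && v == '[') {
        var e = nextToken()
        while (e != ']') {
          val el = new ByteArrayOutputStream
          value(e, el)
          onElement(new String(el.toByteArray, UTF_8))
          count += 1
          e = nextToken()
          if (e == ',') e = nextToken()
        }
      } else value(v, null) // skip unneeded root-level values
      t = nextToken()
      if (t == ',') t = nextToken()
    }
    count
  }
}
```

To feed it from Akka Streams, one option is to materialize the source with `StreamConverters.asInputStream()`, which yields a blocking InputStream, and run the reader on a separate thread. A real jsoniter-scala codec would typically decode elements straight into case classes instead of producing intermediate strings.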

Tags: json, akka-stream, circe

Asked by akauppi
