
Decode chunked JSON with AKKA Stream

I have a Source[ByteString, _] from an input file with 3 rows like this (in reality the input is a TCP socket with a continuous stream):

{"a":[2
33]
}

Now the problem is that I want to parse this into a Source[ChangeMessage, _]. However, the only examples I have found deal with the case where every row holds a whole JSON message, not the case where each JSON message can be fragmented over multiple rows.

One example I found is this library; however, it expects } or , as the last character of each row, that is, one JSON document per row. The example below shows this setup.

"My decoder" should "decode chunked json" in {
    implicit val sys = ActorSystem("test")
    implicit val mat = ActorMaterializer()
    val file = Paths.get("chunked_json_stream.json")
    val data = FileIO.fromPath(file)
        .via(CirceStreamSupport.decode[ChangeMessage])
        .runWith(TestSink.probe[ChangeMessage])
        .request(1)
        .expectComplete()
}

Another alternative would be to use a fold that balances } and only emits when a whole JSON document is complete. The problem is that the fold operator only emits when the stream completes, and since this is a continuous stream I cannot use it here.
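For illustration, the brace-balancing idea can be made incremental by carrying the scanner state from chunk to chunk and emitting whatever completes on each chunk, instead of folding to the end of the stream. The sketch below is plain Scala with no Akka dependency. BraceFraming, State and feed are hypothetical names, and it assumes the stream contains only top-level JSON objects.

```scala
// Hypothetical helper, not part of Akka or circe: splits a sequence of text
// chunks into complete top-level JSON objects by balancing braces.
object BraceFraming {

  // Scanner state carried from one chunk to the next.
  final case class State(
      buffer: String = "",        // text of the object currently being assembled
      depth: Int = 0,             // number of '{' not yet closed
      inString: Boolean = false,  // currently inside a JSON string literal
      escaped: Boolean = false    // previous character was a backslash in a string
  )

  // Consumes one chunk and returns the new state together with every
  // object that this chunk completed (possibly none, possibly several).
  def feed(state: State, chunk: String): (State, Vector[String]) = {
    val buf = new StringBuilder(state.buffer)
    var depth = state.depth
    var inString = state.inString
    var escaped = state.escaped
    val out = Vector.newBuilder[String]

    chunk.foreach { c =>
      if (inString) {
        // Braces inside string literals must not affect the depth.
        buf.append(c)
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else if (c == '{') {
        depth += 1
        buf.append(c)
      } else if (depth > 0) {
        buf.append(c)
        if (c == '"') inString = true
        else if (c == '}') {
          depth -= 1
          if (depth == 0) { // a whole object is complete: emit it now
            out += buf.toString
            buf.clear()
          }
        }
      }
      // Characters outside any object (such as newlines between documents) are dropped.
    }
    (State(buf.toString, depth, inString, escaped), out.result())
  }
}
```

In a stream, a function like feed could sit inside a stateful stage such as statefulMapConcat, so completed objects are emitted as soon as their closing brace arrives. Akka Streams also ships JsonFraming.objectScanner, which performs this kind of brace-based framing on ByteString input.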

My question is: What is the fastest way to parse chunked JSON streams in Akka Streams, and is there any available software that already does this? If possible I would like to use circe.

user3139545 asked May 26 '17 08:05


1 Answer

As the documentation of knutwalker/akka-stream-json says:

This flow even supports parsing multiple json documents in whatever fragmentation they may arrive, which is great for consuming stream/sse based APIs.

In your case all you need to do is delimit the incoming ByteStrings by newline:

"My decoder" should "decode chunked json" in {
    implicit val sys = ActorSystem("test")
    implicit val mat = ActorMaterializer()
    val file = Paths.get("chunked_json_stream.json")

    val sourceUnderTest =
      FileIO.fromPath(file)
        .via(Framing.delimiter(ByteString("\n"), 8192, allowTruncation = true))
        .via(CirceStreamSupport.decode[ChangeMessage])

    sourceUnderTest
      .runWith(TestSink.probe[ChangeMessage])
      .request(1)
      .expectNext(ChangeMessage(List(233)))
      .expectComplete()
}

That's because when reading from a file, a single ByteString element can contain multiple lines, newline characters included, and Circe is not able to parse that as valid JSON. When you delimit by newline, each element in the stream is a single line with the delimiter removed, so the fragments reassemble into {"a":[233]} and Circe is able to parse them using the aforementioned feature.
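To make the effect of the delimiter concrete, here is a minimal plain-Scala sketch of newline framing. It is not Akka's Framing implementation; LineFraming and feed are hypothetical names used only for illustration. What it shows is that frames come out without the newline, which is why the three rows of the sample file can be joined back into one document.

```scala
// Hypothetical helper, not Akka's Framing: re-chunks arbitrary text chunks into lines.
object LineFraming {

  // Takes the unterminated tail left over from earlier chunks plus a new chunk.
  // Returns the new unterminated tail and every line completed by this chunk,
  // with the "\n" delimiter removed.
  def feed(pending: String, chunk: String): (String, Vector[String]) = {
    // A limit of -1 keeps a trailing empty string, so a chunk that ends
    // in "\n" leaves an empty tail instead of losing the last line.
    val parts = (pending + chunk).split("\n", -1)
    (parts.last, parts.init.toVector)
  }
}
```

Feeding the whole sample file as one chunk, LineFraming.feed("", "{\"a\":[2\n33]\n}") returns the lines {"a":[2 and 33] with } as the tail. With truncation allowed, the tail becomes the final frame, and the three frames concatenate to {"a":[233]}.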

bszwej answered Oct 16 '22 07:10