
Idiomatic way to use Spark DStream as Source for an Akka stream

I'm building a REST API that starts some calculation in a Spark cluster and responds with a chunked stream of the results. Given the Spark stream with calculation results, I can use

dstream.foreachRDD()

to send the data out of Spark. I'm sending the chunked HTTP response with akka-http:

val requestHandler: HttpRequest => HttpResponse = {
  case HttpRequest(HttpMethods.GET, Uri.Path("/data"), _, _, _) =>
    HttpResponse(entity = HttpEntity.Chunked(ContentTypes.`text/plain`, source))
}

For simplicity, I'm trying to get plain text working first; I'll add JSON marshalling later.

But what is the idiomatic way of using the Spark DStream as a Source for the Akka stream? I figured I should be able to do it via a socket, but since the Spark driver and the REST endpoint sit on the same JVM, opening a socket just for this seems like overkill.

user3474104 asked Oct 28 '15 04:10
1 Answer

Not sure about the version of the API at the time of the question, but now, with akka-stream 2.0.3, I believe you can do it like this:

val source = Source
  .actorRef[T](/* buffer size */ 100, OverflowStrategy.dropHead)
  .mapMaterializedValue[Unit] { actorRef =>
    // foreachRDD runs on the driver; collect() brings each batch back to
    // the driver JVM so the local ActorRef can receive the elements
    // (DStream.foreach is deprecated and would hand the actor whole RDDs).
    dstream.foreachRDD(rdd => rdd.collect().foreach(actorRef ! _))
  }
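To connect this back to the chunked response from the question, the materialized source can feed `HttpEntity.Chunked` directly. A minimal sketch, assuming the stream carries `String` elements and that `dstream` is in scope on the driver JVM (both assumptions, not shown in the original):

```scala
import akka.http.scaladsl.model._
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.Source
import akka.util.ByteString

// Source backed by the DStream: each materialization registers a
// foreachRDD callback that forwards elements to the stream's actor.
val source: Source[HttpEntity.ChunkStreamPart, Unit] = Source
  .actorRef[String](/* buffer size */ 100, OverflowStrategy.dropHead)
  .mapMaterializedValue[Unit] { actorRef =>
    dstream.foreachRDD(rdd => rdd.collect().foreach(actorRef ! _))
  }
  // Turn each element into an HTTP chunk, one line per element.
  .map(line => HttpEntity.ChunkStreamPart(ByteString(line + "\n")))

val requestHandler: HttpRequest => HttpResponse = {
  case HttpRequest(HttpMethods.GET, Uri.Path("/data"), _, _, _) =>
    HttpResponse(entity = HttpEntity.Chunked(ContentTypes.`text/plain`, source))
}
```

Note that every incoming request materializes the source anew and so registers another `foreachRDD` callback on the stream; for multiple concurrent clients you would want a single fan-out point (e.g. one long-lived actor broadcasting to per-request sources), which is omitted here.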
PH88 answered Sep 28 '22 03:09