 

How can I integrate Apache Spark with the Play Framework to display predictions in real time?

I'm doing some testing with Apache Spark for my final college project. I have a data set that I use to build a decision tree and make predictions on new data.

In the future, I plan to put this project into production: I would generate a decision tree in a batch process, and a web interface or mobile application would receive new data, predict the class of each entry, and report the result to the user instantly. I would also store these new entries so that, after a while, I can generate a new decision tree (batch processing again) and repeat this cycle continuously.

Although Apache Spark is designed for batch processing, its streaming API makes it possible to receive data in real time. In my application, this data would only be fed to a model built beforehand in a batch process (the decision tree), and since making a prediction is quite fast, the user can get an answer quickly.

My question is: what are the best ways to integrate Apache Spark with a web application (I plan to use the Scala version of the Play Framework)?

Douglas Arantes asked May 10 '15


3 Answers

One of the issues you will run into with Spark is that it takes some time to start up and build a SparkContext. If you want to serve Spark queries via web calls, it will not be practical to fire up spark-submit on every request. Instead, you will want to turn your driver application into a long-running RPC server.

In my application I am embedding a web server (http4s) so I can do XmlHttpRequests in JavaScript to directly query my application, which will return JSON objects.
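A minimal sketch of that idea: build the SparkContext and load the model once at startup, then answer predictions over HTTP. The answerer embeds http4s; to keep this example dependency-light it uses the JDK's built-in `HttpServer` instead, and the model path, port, and query format are all illustrative assumptions, not from the answer.

```scala
import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.model.DecisionTreeModel

object PredictionServer {
  def main(args: Array[String]): Unit = {
    // Build the SparkContext once, at startup -- not per request.
    val sc = new SparkContext(new SparkConf().setAppName("prediction-server"))
    // Assumed path: wherever your batch job saved the trained tree.
    val model = DecisionTreeModel.load(sc, "hdfs:///models/decision-tree")

    val server = HttpServer.create(new InetSocketAddress(8080), 0)
    server.createContext("/predict", new HttpHandler {
      def handle(exchange: HttpExchange): Unit = {
        // Expect features as a comma-separated query string, e.g. ?f=1.0,2.0,3.0
        val query = Option(exchange.getRequestURI.getQuery).getOrElse("")
        val features = query.stripPrefix("f=").split(',').map(_.toDouble)
        val label = model.predict(Vectors.dense(features)) // fast, local call
        val body = s"""{"prediction": $label}""".getBytes("UTF-8")
        exchange.getResponseHeaders.add("Content-Type", "application/json")
        exchange.sendResponseHeaders(200, body.length)
        exchange.getResponseBody.write(body)
        exchange.getResponseBody.close()
      }
    })
    server.start() // keeps serving while the SparkContext stays warm
  }
}
```

Your Play application (or any HTTP client) can then call `GET /predict?f=…` and get a JSON answer without paying the SparkContext startup cost.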

David Griffin answered Nov 17 '22


Spark is a fast, large-scale data processing platform. The key here is large-scale data. In most cases, the time to process that data will not be fast enough to meet the expectations of your average web app user. It is far better practice to perform the processing offline and write the results of your Spark processing to, e.g., a database. Your web app can then efficiently retrieve those results by querying that database.

That being said, Spark Job Server provides a REST API for submitting Spark jobs.
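For a sense of what that looks like, here is a hedged sketch of the spark-jobserver workflow using curl (the jar name, app name, and job class are placeholders; check the spark-jobserver documentation for the exact endpoints in your version):

```shell
# Upload the application jar under an app name (placeholder names).
curl --data-binary @target/my-spark-job.jar localhost:8090/jars/myapp

# Start a job; spark-jobserver responds with a job id as JSON.
curl -d "input.string = a b c a b" \
  "localhost:8090/jobs?appName=myapp&classPath=spark.jobserver.WordCountExample"

# Poll the job id for status and, when finished, the result.
curl localhost:8090/jobs/<job-id>
```

Because jobs run asynchronously, a web app would typically poll (or use the sync flag for short jobs) rather than block a request thread on a long computation.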

cmd answered Nov 17 '22


Spark (< v1.6) uses Akka underneath. So does Play. You should be able to write a Spark action as an actor that communicates with a receiving actor in the Play system (that you also write).

You can let Akka worry about de/serialization, which will work as long as both systems have the same class definitions on their classpaths.

If you want to go further than that, you can write Akka Streams code that tees the data stream to your Play application.
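A minimal sketch of the receiving side, assuming classic Akka actors and a shared message class on both classpaths (all names here are illustrative, not from the answer):

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Props}

// Shared message class: must be on the classpath of BOTH the Spark driver
// and the Play application so Akka can serialize/deserialize it.
case class Prediction(id: String, label: Double)

// Receiving actor on the Play side. The Spark action would look it up by
// its actor path (e.g. .../user/prediction-receiver) and send to it.
class PredictionReceiver extends Actor with ActorLogging {
  def receive: Receive = {
    case Prediction(id, label) =>
      log.info(s"entry $id classified as $label")
      // push the result to a WebSocket channel, cache, or database here
  }
}

object PlaySideSetup {
  val system   = ActorSystem("play-system")
  val receiver = system.actorOf(Props[PredictionReceiver], "prediction-receiver")
}
```

The design point is that Akka handles the wire format for you: as long as `Prediction` is identical on both sides, the Spark action can fire results at the Play actor without you writing any serialization code.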

WeaponsGrade answered Nov 17 '22