 

Realtime request-based recommendations with Spark - Spark JobServer?

We are trying to find a way to load a trained Spark (2.x) ML model so that, on request (through a REST interface), we can query it and get predictions, e.g. http://predictor.com:8080/give/me/predictions?a=1,b=2,c=3

There are out-of-the-box libs to load a model into Spark (given it was stored somewhere after training using MLWritable) and then use it for predictions, but wrapping this in a job and running it per request/call seems overkill due to the cost of SparkContext initialization.

However, using Spark has the advantage that we can save our Pipeline model and perform the same feature transformations at serving time, without having to reimplement them outside of the SparkContext.

After some digging, we found that spark-job-server can potentially help with this by keeping a "hot" SparkContext initialized for the job-server. We could then serve requests by calling the prediction job (and getting the results) within the existing context, via spark-job-server's REST API.
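The "hot context" idea above can be sketched in a few lines. This is a minimal stdlib-only illustration, not spark-job-server's actual API: ExpensiveContext stands in for SparkContext (and its startup delay for the multi-second initialization), and the linear model is a placeholder for the real Pipeline model.

```python
import time

class ExpensiveContext:
    """Stand-in for a SparkContext that is costly to initialize."""
    def __init__(self):
        time.sleep(0.1)  # stands in for SparkContext startup (seconds in practice)
        self.coefs = {"a": 0.5, "b": 1.5, "c": -2.0}  # placeholder "model"

    def predict(self, features):
        return sum(self.coefs[k] * v for k, v in features.items())

# Initialized ONCE at server startup -- the "hot" context.
CONTEXT = ExpensiveContext()

def handle_request(query):
    # e.g. query = "a=1,b=2,c=3" from /give/me/predictions?a=1,b=2,c=3
    features = {k: float(v) for k, v in (p.split("=") for p in query.split(","))}
    return {"prediction": CONTEXT.predict(features)}

print(handle_request("a=1,b=2,c=3"))
```

Each request only pays for the prediction itself; the expensive initialization happened once, which is exactly what spark-job-server's persistent context buys over launching a fresh Spark job per call.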

Is this the best approach to API-ify a prediction service? Due to the size of the feature space we cannot pre-compute predictions for all combinations.

Alternatively, we were thinking about using Spark Streaming and persisting the predictions to a message queue. That would let us avoid spark-job-server, but it doesn't simplify the overall flow. Has anyone tried a similar approach?
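The streaming alternative can be sketched as follows. This is a hedged stdlib simulation, not Spark Streaming code: queue.Queue stands in for the input and output topics of a broker such as Kafka (an assumption), score() stands in for the Pipeline model, and the worker loop plays the role of the micro-batch.

```python
import queue

requests_q = queue.Queue()     # stands in for the input topic
predictions_q = queue.Queue()  # stands in for the output topic

def score(features):
    # Placeholder for model.transform(...) inside the streaming job.
    return sum(features.values())

def worker(batch_size):
    # In Spark Streaming this would be one micro-batch; here it is simulated.
    for _ in range(batch_size):
        req_id, features = requests_q.get()
        predictions_q.put((req_id, score(features)))

requests_q.put(("req-1", {"a": 1.0, "b": 2.0, "c": 3.0}))
worker(batch_size=1)
print(predictions_q.get())  # -> ('req-1', 6.0)
```

Note the trade-off this makes visible: the caller no longer gets a synchronous HTTP response and must correlate predictions back to requests by id, which is part of why this path does not simplify the overall flow.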

asked Oct 29 '22 by fritsjanb
1 Answer

Another option could be Cloudera's Livy (http://livy.io/ | https://github.com/cloudera/livy#rest-api), which allows for session caching, interactive queries, batch jobs and more. I've used it and found it very promising.
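The session-caching flow Livy offers can be sketched as follows. The endpoint paths (/sessions and /sessions/<id>/statements) match Livy's documented REST API; the host/port and the statement code are assumptions, and only the JSON payload construction is shown here (no network call is made).

```python
import json

# Host/port below is an assumption for illustration.
LIVY_URL = "http://livy-server:8998"

def session_payload(kind="spark"):
    # POST {LIVY_URL}/sessions creates a long-lived interactive session,
    # i.e. a cached SparkContext that is reused across requests.
    return {"kind": kind}

def statement_payload(code):
    # POST {LIVY_URL}/sessions/<id>/statements runs code in that session,
    # so each prediction request avoids re-initializing the context.
    return {"code": code}

print(json.dumps(statement_payload("model.transform(df).count()")))
```

Polling the statement's URL then returns the result once it completes, giving a request/response shape similar to the spark-job-server approach in the question.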

answered Nov 15 '22 by Garren S