How do I design an architecture that classifies real-time transactional data as fraud or not?
A random forest classifier is developed, trained, and tested on historical data using Scala and Spark MLlib, then persisted.
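For context, the training-and-persist step described above might look like the following sketch using the Spark ML pipeline API. The input path, feature column names (`amount`, `merchantRisk`, `hourOfDay`), label column, and model path are all assumptions for illustration:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object TrainFraudModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FraudModelTraining").getOrCreate()

    // Historical transactions; path and column names are hypothetical
    val historical = spark.read.parquet("/data/transactions/historical")

    val assembler = new VectorAssembler()
      .setInputCols(Array("amount", "merchantRisk", "hourOfDay")) // hypothetical features
      .setOutputCol("features")

    val rf = new RandomForestClassifier()
      .setLabelCol("isFraud") // hypothetical label column
      .setFeaturesCol("features")

    val model = new Pipeline().setStages(Array(assembler, rf)).fit(historical)

    // Persist the fitted pipeline so a separate streaming job can load it later
    model.write.overwrite().save("/models/fraud-rf")

    spark.stop()
  }
}
```

Persisting the whole fitted `Pipeline` (assembler plus forest) rather than the bare classifier keeps the feature preparation identical between training and online scoring.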
Real-time transaction data is consumed from one Kafka topic, processed with Spark Streaming, and written to another topic for prediction by the classifier model.
My concern: how do I feed each current transaction received from the Kafka topic to the ML model mentioned above and get its prediction back? What is the best practice for scoring a single online transaction with an already trained and tested model?
Any design suggestions are welcome.
You can save the model after training and use it in a real-time API for prediction. For example: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html Another option is to use sparkling-water with a POJO: https://github.com/h2oai/sparkling-water/tree/master/examples#step-by-step-through-weather-data-example
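The first suggestion (load the persisted model inside the streaming job and call `transform` on each micro-batch) could be sketched as below, here using Structured Streaming's Kafka source as one option. The broker address, topic names, message format (CSV-encoded feature values), and model path are all assumptions:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FraudScoringStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FraudScoring").getOrCreate()
    import spark.implicits._

    // Load the pipeline persisted by the training job (path is an assumption)
    val model = PipelineModel.load("/models/fraud-rf")

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "transactions")                 // hypothetical input topic
      .load()

    // Parse values encoded as "amount,merchantRisk,hourOfDay" (format is an assumption)
    val parsed = raw.selectExpr("CAST(value AS STRING) AS v")
      .select(split($"v", ",").as("f"))
      .select(
        $"f".getItem(0).cast("double").as("amount"),
        $"f".getItem(1).cast("double").as("merchantRisk"),
        $"f".getItem(2).cast("double").as("hourOfDay"))

    // The same pipeline (assembler + forest) scores each micro-batch
    val scored = model.transform(parsed)
      .select(to_json(struct($"amount", $"prediction")).as("value"))

    scored.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "transactions-scored") // hypothetical output topic
      .option("checkpointLocation", "/tmp/fraud-checkpoint")
      .start()
      .awaitTermination()
  }
}
```

Since the model is loaded once on the driver and broadcast with the query, each incoming transaction is scored without retraining; only the model path needs to change when a newly trained model is deployed.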