
Real time prediction of online data using Spark Streaming and Machine Learning

How do I design an architecture that classifies real-time transactional data as fraud or not?

A random forest classifier ML model has been developed, trained, and tested on historical data using Scala and Spark MLlib, and then persisted.

Real-time transaction data is read from one Apache Kafka topic, processed with Spark Streaming, and written to another topic for prediction by the classifier ML model.

My concern: how do I pass the current transaction data received from the Kafka topic to the ML model mentioned above and get the prediction back?

What is the best practice for getting a prediction for a single current online transaction using an already trained and tested ML model?

Any design suggestions are welcome.

Gopinathan K M asked Nov 07 '22 18:11


1 Answer

You can save the model after training and load it in a real-time API for prediction. For example: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html

Another option is to use Sparkling Water and export the model as a POJO: https://github.com/h2oai/sparkling-water/tree/master/examples#step-by-step-through-weather-data-example
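As a rough illustration of the "save, then load for prediction" approach, here is a minimal sketch in Scala. It makes several assumptions that are not in the question: the model was persisted as a spark.ml `PipelineModel` (feature stages plus the random forest) whose stages all support streaming input, the job uses Structured Streaming rather than the older DStream API, the `spark-sql-kafka-0-10` connector is on the classpath, and messages are JSON. The schema fields, topic names, broker address, and paths are all hypothetical placeholders.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

object FraudScoring {
  // Hypothetical schema: replace with the fields the model was trained on.
  val transactionSchema: StructType = new StructType()
    .add("transactionId", StringType)
    .add("amount", DoubleType)
    .add("merchantCategory", StringType)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FraudScoring").getOrCreate()

    // Load the persisted pipeline once at startup (placeholder path).
    val model = PipelineModel.load("hdfs:///models/fraud-rf")

    // Read raw transactions from the input topic and parse the JSON payload.
    val transactions = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "transactions")                 // placeholder topic
      .load()
      .select(from_json(col("value").cast("string"), transactionSchema).as("tx"))
      .select("tx.*")

    // transform() appends the model's output columns; "prediction" is the
    // default output column name for spark.ml classifiers.
    val scored = model.transform(transactions)

    // Write one JSON message per scored transaction to the output topic.
    val query = scored
      .select(
        col("transactionId").as("key"),
        to_json(struct(col("transactionId"), col("prediction"))).as("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "predictions")                      // placeholder topic
      .option("checkpointLocation", "/tmp/fraud-scoring-checkpoint")
      .start()

    query.awaitTermination()
  }
}
```

Loading the model once outside the streaming query avoids re-reading it for every micro-batch. If the model was saved with the RDD-based MLlib API instead, the loading and prediction calls would differ from this sketch.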

gosanjeev answered Nov 15 '22 11:11