How do I design an architecture that classifies real-time transactional data as fraud or not?
A random forest classifier is developed, trained, and tested on historical data using Scala and Spark MLlib, then persisted.
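For context, the training-and-persist step described above might look like the following sketch using the Spark ML pipeline API. The input path, feature column names (`amount`, `merchantRisk`, `hourOfDay`), label column, and model path are all assumptions for illustration:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object TrainFraudModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FraudModelTraining").getOrCreate()

    // Historical transactions; path and column names are hypothetical
    val historical = spark.read.parquet("/data/transactions/historical")

    val assembler = new VectorAssembler()
      .setInputCols(Array("amount", "merchantRisk", "hourOfDay")) // hypothetical features
      .setOutputCol("features")

    val rf = new RandomForestClassifier()
      .setLabelCol("isFraud") // hypothetical label column
      .setFeaturesCol("features")

    val model = new Pipeline().setStages(Array(assembler, rf)).fit(historical)

    // Persist the fitted pipeline so a separate streaming job can load it later
    model.write.overwrite().save("/models/fraud-rf")

    spark.stop()
  }
}
```

Persisting the whole fitted `Pipeline` (assembler plus forest) rather than the bare classifier keeps the feature preparation identical between training and online scoring.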
Real-time transaction data is consumed from one Kafka topic, processed with Spark Streaming, and written to another topic for prediction by the classifier model.
My concern: how do I feed each current transaction received from the Kafka topic to the ML model mentioned above and get its prediction back? What is the best practice for scoring a single online transaction with an already trained and tested model?
Any design suggestions are welcome.
You can save the model after training and use it in a real-time API for prediction. For example: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html Another option is to use sparkling-water with a POJO: https://github.com/h2oai/sparkling-water/tree/master/examples#step-by-step-through-weather-data-example
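The first suggestion (load the persisted model inside the streaming job and call `transform` on each micro-batch) could be sketched as below, here using Structured Streaming's Kafka source as one option. The broker address, topic names, message format (CSV-encoded feature values), and model path are all assumptions:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FraudScoringStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FraudScoring").getOrCreate()
    import spark.implicits._

    // Load the pipeline persisted by the training job (path is an assumption)
    val model = PipelineModel.load("/models/fraud-rf")

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "transactions")                 // hypothetical input topic
      .load()

    // Parse values encoded as "amount,merchantRisk,hourOfDay" (format is an assumption)
    val parsed = raw.selectExpr("CAST(value AS STRING) AS v")
      .select(split($"v", ",").as("f"))
      .select(
        $"f".getItem(0).cast("double").as("amount"),
        $"f".getItem(1).cast("double").as("merchantRisk"),
        $"f".getItem(2).cast("double").as("hourOfDay"))

    // The same pipeline (assembler + forest) scores each micro-batch
    val scored = model.transform(parsed)
      .select(to_json(struct($"amount", $"prediction")).as("value"))

    scored.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "transactions-scored") // hypothetical output topic
      .option("checkpointLocation", "/tmp/fraud-checkpoint")
      .start()
      .awaitTermination()
  }
}
```

Since the model is loaded once on the driver and broadcast with the query, each incoming transaction is scored without retraining; only the model path needs to change when a newly trained model is deployed.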