 

Spark Streaming historical state

I am building a real-time process for detecting fraudulent ATM card transactions. To detect fraud efficiently, the logic needs the last transaction date per card and the sum of transaction amounts per day (or over the last 24 hours).

One use case: if a card is used outside its native country more than 30 days after the last transaction in that country, send an alert as possible fraud.

So I looked at Spark Streaming as a solution. To achieve this (I am probably missing some idea about functional programming), below is my pseudo code:

stream = ssc.receiverStream()   // input receiver
s1 = stream.mapToPair()         // key by card number, with transaction date as value
s2 = s1.reduceByKey()           // reduce to keep the latest transaction date per card
s2.checkpoint(new Duration(1000));
s2.persist();
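For the comparison itself (ignoring the per-country detail for brevity), I imagine something like the following, where lastDateByCard is the historical state and currentTxByCard is the current batch, both hypothetical JavaPairDStream<String, Long> of (cardNumber, transactionTimeMillis):

// Hypothetical 30-day check: join current transactions with the stored
// last transaction date per card and flag large gaps
final long THIRTY_DAYS_MS = 30L * 24 * 60 * 60 * 1000;

currentTxByCard.join(lastDateByCard)
    // pair = (cardNumber, (currentTxTime, lastTxTime))
    .filter(pair -> pair._2()._1() - pair._2()._2() > THIRTY_DAYS_MS)
    .foreachRDD(rdd -> rdd.foreach(
        pair -> System.out.println("possible fraud on card " + pair._1())));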

I am facing three problems here:

1) How do I use this last transaction date for future comparisons from the same card?
2) How do I persist the data so that old values of s2 are restored even if the driver program restarts?
3) Can updateStateByKey be used to maintain historical state?

I think I am missing a key point of Spark Streaming/functional programming: how to implement this kind of logic.

asked Jun 20 '14 by Jigar Parekh


People also ask

How are stateful streams implemented in Spark?

Spark Streaming currently has two implementations for stateful streams. One is the older PairDStreamFunctions.updateStateByKey (the only option in Spark <= 1.5.0), which uses a CoGroupedRDD to store the state for each key. The newer PairDStreamFunctions.mapWithState (Spark >= 1.6.0) uses an OpenHashMapBasedStateMap[K, V] to store the internal state.
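For illustration, a minimal mapWithState sketch in the Spark 2.x Java API that keeps the latest transaction time per card; cardAndDate is an assumed JavaPairDStream<String, Long> of (cardNumber, transactionTimeMillis), and checkpointing must be enabled on the context:

import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function3;
import org.apache.spark.streaming.State;
import org.apache.spark.streaming.StateSpec;
import org.apache.spark.streaming.api.java.JavaMapWithStateDStream;
import scala.Tuple2;

// Merge each batch value into the per-key state
Function3<String, Optional<Long>, State<Long>, Tuple2<String, Long>> mappingFunc =
    (card, txTime, state) -> {
      long latest = Math.max(txTime.orElse(0L),
                             state.exists() ? state.get() : 0L);
      state.update(latest);               // store the new state for this card
      return new Tuple2<>(card, latest);  // emit (card, latest transaction time)
    };

JavaMapWithStateDStream<String, Long, Long, Tuple2<String, Long>> lastDates =
    cardAndDate.mapWithState(StateSpec.function(mappingFunc));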

What is Spark Streaming?

In production, Spark Streaming uses ZooKeeper and HDFS for high availability. Spark Streaming is developed as part of Apache Spark. It thus gets tested and updated with each Spark release. If you have questions about the system, ask on the Spark mailing lists. The Spark Streaming developers welcome contributions.

What are the limitations of state management in Spark Streaming?

In old Spark Streaming, state management was quite inefficient, which made it a poor fit for stateful processing. This was because of two major limitations in its design: in every micro-batch, the state was persisted along with the checkpoint metadata (i.e. the offsets or progress of the stream).

What are the data sources for Apache Spark Streaming?

Spark Streaming can read data from HDFS, Flume, Kafka, Twitter and ZeroMQ. You can also define your own custom data sources. You can run Spark Streaming on Spark's standalone cluster mode or other supported cluster resource managers. It also includes a local run mode for development.
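For instance, the simplest built-in source is a plain TCP socket (host and port here are arbitrary):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf().setAppName("fraud-detection").setMaster("local[2]");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

// Built-in TCP text source; Kafka, Flume, Twitter, etc. ship their own
// *Utils.createStream helpers in separate artifacts
JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", 9999);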


1 Answer

If you are using Spark Streaming you shouldn't really save your state to a file, especially if you plan to run your application 24/7. If that is not your intention, you will probably be fine with just a plain Spark application, since then you are only doing big data computation, not computation over batches arriving in real time.

Yes, updateStateByKey can be used to maintain state through the various batches but it has a particular signature that you can see in the docs: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
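As a minimal sketch of that signature applied to your case (assuming a JavaPairDStream<String, Long> called cardAndDate holding (cardNumber, transactionTimeMillis) pairs, and Spark's org.apache.spark.api.java.Optional from the 2.x Java API):

import java.util.List;
import org.apache.spark.api.java.Optional;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// updateStateByKey folds each batch's new values into the running state;
// here the state is simply the latest transaction time seen per card
JavaPairDStream<String, Long> lastDateByCard = cardAndDate.updateStateByKey(
    (List<Long> newTimes, Optional<Long> current) -> {
      long latest = current.orElse(0L);
      for (Long t : newTimes) {
        latest = Math.max(latest, t);
      }
      // returning an empty Optional instead would drop the card's state
      return Optional.of(latest);
    });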

Also, persist() is just a form of caching; by default it doesn't actually persist your data to disk (as in a file).
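To make the distinction concrete (the checkpoint directory and the createContext() factory below are placeholders):

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// persist() only controls caching of the DStream's RDDs within one run:
s2.persist(StorageLevel.MEMORY_AND_DISK());  // still lost if the driver dies

// Durable state needs checkpointing, plus context recovery on restart:
ssc.checkpoint("hdfs:///checkpoints/fraud-app");

JavaStreamingContext recovered = JavaStreamingContext.getOrCreate(
    "hdfs:///checkpoints/fraud-app",
    () -> createContext());  // hypothetical factory building a fresh context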

I hope this clarifies some of your doubts.

answered Oct 19 '22 by gprivitera