How to get the current batch timestamp (DStream) in Spark streaming?
I have a Spark Streaming application in which the input data goes through many transformations.
I need the current timestamp during execution to validate the timestamp in the input data.
If I simply compare against the current time, the value may differ between each RDD transformation's execution.
Is there any way to get the timestamp at which a particular Spark Streaming micro-batch started, or the micro-batch interval it belongs to?
dstream.foreachRDD { (rdd, time) =>
  // `time` is the scheduling time of this batch (a Spark `Time`);
  // consecutive values differ by the batch interval (window/slide length).
}
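Since `time.milliseconds` returns the batch's scheduling time as epoch milliseconds, it can be turned into a readable timestamp with plain `java.time`. A minimal sketch (the millisecond literal below is an arbitrary example value standing in for `time.milliseconds`):

```scala
import java.time.Instant

// Inside foreachRDD, `time.milliseconds` gives the batch scheduling time
// as epoch milliseconds; the literal below is a hypothetical sample value.
val batchMillis = 1700000000000L
val batchInstant = Instant.ofEpochMilli(batchMillis)
println(batchInstant) // prints 2023-11-14T22:13:20Z
```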
dstream.transform { (rdd, time) =>
  // Pair each record with the batch time so the timestamp travels
  // with the data through downstream transformations.
  rdd.map(record => (time, record))
}.filter(...)
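With the batch time paired onto each record, validation becomes a pure function of the two timestamps. A minimal sketch, where `Time` is a stand-in for `org.apache.spark.streaming.Time` (assumption: only its `milliseconds` value is needed) and `isFresh` with its `maxLagMs` tolerance is a hypothetical helper, not part of Spark:

```scala
// Stand-in for org.apache.spark.streaming.Time; only `milliseconds` is used.
final case class Time(milliseconds: Long)

// Hypothetical validator: accept records no newer than the batch time
// and at most `maxLagMs` older than it.
def isFresh(recordTsMs: Long, batchTime: Time, maxLagMs: Long = 60000L): Boolean =
  recordTsMs <= batchTime.milliseconds &&
    batchTime.milliseconds - recordTsMs <= maxLagMs

val batch = Time(1700000000000L)
println(isFresh(1699999990000L, batch)) // record 10 s before the batch -> true
println(isFresh(1699900000000L, batch)) // record ~28 h before the batch -> false
```

In a real job, `isFresh` would be applied inside the `filter(...)` shown above, e.g. `.filter { case (time, record) => isFresh(record.ts, time) }`.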