How to access cached data in Spark Streaming application?

Question

I have a Kafka broker with JSON data from my IoT applications. I connect to this server from a Spark Streaming application in order to do some processing.

I'd like to keep some specific fields of my JSON data in memory (RAM), which I believe I could achieve using the cache() and persist() operators.

The next time I receive new JSON data in the Spark Streaming application, I check in memory (RAM) whether there are fields in common that I can retrieve. If so, I do some simple computations and finally update the values of the fields I saved in memory (RAM).

Thus, I would like to know whether what I described above is possible. If so, do I have to use cache() or persist()? And how can I retrieve my fields from memory?

Jacek Laskowski · Accepted Answer

It's possible with cache / persist, which keep data in memory or on disk in any Spark application (caching is a general Spark feature, not something specific to Spark Streaming).
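As a rough illustration of the cache / persist route, here is a minimal sketch on a plain RDD. On RDDs, cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), while persist() lets you pick the storage level yourself. The object name, the `maxPerDevice` helper, and the (deviceId, temperature) sample pairs are made up for this example and stand in for fields extracted from the JSON messages.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Hypothetical helper: highest reading seen per device.
  def maxPerDevice(readings: RDD[(String, Double)]): Map[String, Double] =
    readings.reduceByKey((a, b) => math.max(a, b)).collect().toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumed sample data standing in for fields parsed from JSON.
    val readings = sc.parallelize(Seq(
      ("sensor-1", 21.5), ("sensor-2", 19.0), ("sensor-1", 22.0)))

    // Keep the dataset around for reuse, spilling to disk if memory is short.
    val persisted = readings.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action materialises the persisted data; later actions reuse it.
    println(s"records: ${persisted.count()}")
    println(s"max per device: ${maxPerDevice(persisted)}")

    persisted.unpersist()
    sc.stop()
  }
}
```

Note that an RDD is immutable: persisting it keeps a dataset available for reuse, but the cached values cannot be updated in place, so it does not behave like a mutable key-value store.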

But in Spark Streaming you've got special support for such use cases, which are called stateful computations. See the Spark Streaming Programming Guide to explore what's possible.

I think for your use case the mapWithState operator is exactly what you're after.
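A minimal sketch of what that could look like, assuming the state to keep per device is a running total of its readings. To stay self-contained it reads "deviceId,value" text lines from a socket instead of JSON from Kafka; the host, port, checkpoint path, input format, and the names `updateTotal` and `trackTotal` are all assumptions made for this example.

```scala
import scala.util.Try

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object StatefulSketch {
  // Hypothetical update rule: add the new reading to the stored total.
  def updateTotal(previous: Option[Double], reading: Option[Double]): Double =
    previous.getOrElse(0.0) + reading.getOrElse(0.0)

  // Mapping function in the (key, new value, state) shape used by mapWithState.
  val trackTotal = (deviceId: String, reading: Option[Double], state: State[Double]) => {
    val total = updateTotal(state.getOption(), reading) // read what is kept in state
    state.update(total)                                 // store the new value
    (deviceId, total)                                   // record emitted downstream
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stateful-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Stateful operators need a checkpoint directory (path is an assumption).
    ssc.checkpoint("/tmp/stateful-sketch-checkpoint")

    // Assumed stand-in source; a Kafka stream would take its place.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Parse "deviceId,value" lines and silently drop malformed ones.
    val readings = lines.flatMap { line =>
      line.split(",") match {
        case Array(id, value) =>
          Try(value.trim.toDouble).toOption.map(v => (id.trim, v)).toList
        case _ => Nil
      }
    }

    val totals = readings.mapWithState(StateSpec.function(trackTotal))
    totals.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Here Spark keeps one Double per device key between batches, so the "retrieve, compute, update" cycle from the question happens inside `trackTotal` rather than through a cached RDD.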

Yassir S