I have an RMI cluster. Each RMI server has a Spark context. Is there any way to share an RDD between different Spark contexts?
As already stated by Daniel Darabos, it is not possible. Every distributed object in Spark is bound to the specific context that was used to create it (a SparkContext in the case of an RDD, a SQLContext in the case of a DataFrame or Dataset). If you want to share objects between applications, you have to use a shared context (see, for example, spark-jobserver, Livy, or Apache Zeppelin). Since an RDD or DataFrame is just a small local object that describes a distributed computation, there is really not much to share.
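For concreteness, here is a rough sketch of the shared-context approach using Livy's REST API, which keeps one long-lived SparkContext that many clients can submit code to. The session-creation and statement endpoints are Livy's documented ones, but the host, port, and session id below are illustrative assumptions:

```scala
// Sketch: several clients share one Spark context through a Livy server.
// Requires Java 11+ for java.net.http; the Livy host/port are assumed.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object LivySketch {
  val client  = HttpClient.newHttpClient()
  val livyUrl = "http://livy-host:8998" // hypothetical Livy server address

  def post(path: String, json: String): String = {
    val request = HttpRequest.newBuilder()
      .uri(URI.create(livyUrl + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    client.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    // Create one interactive session; its SparkContext outlives any single client.
    println(post("/sessions", """{"kind": "spark"}"""))
    // Any client can then run code inside that same context by posting
    // statements to the session (session id 0 is assumed here).
    println(post("/sessions/0/statements",
      """{"code": "sc.parallelize(1 to 100).sum()"}"""))
  }
}
```

Because every client talks to the same session, an RDD cached by one statement remains visible to the next, which is exactly what separate applications cannot get on their own.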
Sharing data is a completely different problem. You can use a specialized in-memory cache (Apache Ignite) or a distributed in-memory file system (like Alluxio, formerly Tachyon) to minimize the latency when switching between applications, but you cannot completely avoid it.
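To illustrate the data-sharing (as opposed to object-sharing) idea, here is a hedged sketch of two independent applications exchanging a dataset through Alluxio. The Alluxio master address and path are assumptions, and the Alluxio client jar must be on Spark's classpath for the `alluxio://` scheme to resolve:

```scala
import org.apache.spark.sql.SparkSession

// Application A (its own SparkSession / SparkContext) materializes the data
// into Alluxio-managed storage.
val sparkA = SparkSession.builder().appName("producer").getOrCreate()
sparkA.range(0, 1000000).toDF("id")
  .write.parquet("alluxio://alluxio-master:19998/shared/ids")

// Application B (a separate driver with a separate context) reads it back.
// The DataFrame object itself is rebuilt from scratch; only the bytes in
// Alluxio are shared.
val sparkB = SparkSession.builder().appName("consumer").getOrCreate()
val ids = sparkB.read.parquet("alluxio://alluxio-master:19998/shared/ids")
println(ids.count())
```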
No, an RDD is tied to a single SparkContext. The general idea is that you have a Spark cluster and one driver program that tells the cluster what to do. This driver would have the SparkContext and kick off operations on the RDDs.
If you just want to move an RDD from one driver program to another, the solution is to write it to disk (S3/HDFS/...) in the first driver and load it from disk in the other.
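A minimal sketch of that handoff, assuming both drivers can reach the same HDFS path (the path and element type below are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Driver 1: persist the RDD where the other application can see it.
val sc1 = new SparkContext(new SparkConf().setAppName("writer"))
sc1.parallelize(1 to 1000)
   .saveAsObjectFile("hdfs:///tmp/shared-rdd") // Java serialization under the hood

// Driver 2 (a separate program with its own SparkContext) loads the data
// back as a brand-new RDD bound to its own context.
val sc2 = new SparkContext(new SparkConf().setAppName("reader"))
val restored = sc2.objectFile[Int]("hdfs:///tmp/shared-rdd")
println(restored.sum())
```

For anything read by other tools, a self-describing format such as Parquet via the DataFrame API is usually a better choice than object files.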