I have an RMI cluster. Each RMI server has a Spark context. Is there any way to share an RDD between different Spark contexts?
As already stated by Daniel Darabos, it is not possible. Every distributed object in Spark is bound to the specific context that was used to create it (a SparkContext in the case of an RDD, a SQLContext in the case of a DataFrame or Dataset). If you want to share objects between applications, you have to use a shared context (see, for example, spark-jobserver, Livy, or Apache Zeppelin). Since an RDD or DataFrame is just a small local object that describes a distributed computation, there is really not much to share.
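For concreteness, here is a rough sketch of the shared-context approach using Livy's REST API, which keeps one long-lived SparkContext that many clients can submit code to. The session-creation and statement endpoints are Livy's documented ones, but the host, port, and session id below are illustrative assumptions:

```scala
// Sketch: several clients share one Spark context through a Livy server.
// Requires Java 11+ for java.net.http; the Livy host/port are assumed.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object LivySketch {
  val client  = HttpClient.newHttpClient()
  val livyUrl = "http://livy-host:8998" // hypothetical Livy server address

  def post(path: String, json: String): String = {
    val request = HttpRequest.newBuilder()
      .uri(URI.create(livyUrl + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    client.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    // Create one interactive session; its SparkContext outlives any single client.
    println(post("/sessions", """{"kind": "spark"}"""))
    // Any client can then run code inside that same context by posting
    // statements to the session (session id 0 is assumed here).
    println(post("/sessions/0/statements",
      """{"code": "sc.parallelize(1 to 100).sum()"}"""))
  }
}
```

Because every client talks to the same session, an RDD cached by one statement remains visible to the next, which is exactly what separate applications cannot get on their own.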
Sharing data is a completely different problem. You can use a specialized in-memory cache (Apache Ignite) or a distributed in-memory file system (like Alluxio, formerly Tachyon) to minimize the latency when switching between applications, but you cannot completely avoid it.
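To illustrate the data-sharing (as opposed to object-sharing) idea, here is a hedged sketch of two independent applications exchanging a dataset through Alluxio. The Alluxio master address and path are assumptions, and the Alluxio client jar must be on Spark's classpath for the `alluxio://` scheme to resolve:

```scala
import org.apache.spark.sql.SparkSession

// Application A (its own SparkSession / SparkContext) materializes the data
// into Alluxio-managed storage.
val sparkA = SparkSession.builder().appName("producer").getOrCreate()
sparkA.range(0, 1000000).toDF("id")
  .write.parquet("alluxio://alluxio-master:19998/shared/ids")

// Application B (a separate driver with a separate context) reads it back.
// The DataFrame object itself is rebuilt from scratch; only the bytes in
// Alluxio are shared.
val sparkB = SparkSession.builder().appName("consumer").getOrCreate()
val ids = sparkB.read.parquet("alluxio://alluxio-master:19998/shared/ids")
println(ids.count())
```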
No, an RDD is tied to a single SparkContext. The general idea is that you have a Spark cluster and one driver program that tells the cluster what to do. This driver would have the SparkContext and kick off operations on the RDDs.
If you just want to move an RDD from one driver program to another, the solution is to write it to disk (S3/HDFS/...) in the first driver and load it from disk in the other.
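A minimal sketch of that handoff, assuming both drivers can reach the same HDFS path (the path and element type below are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Driver 1: persist the RDD where the other application can see it.
val sc1 = new SparkContext(new SparkConf().setAppName("writer"))
sc1.parallelize(1 to 1000)
   .saveAsObjectFile("hdfs:///tmp/shared-rdd") // Java serialization under the hood

// Driver 2 (a separate program with its own SparkContext) loads the data
// back as a brand-new RDD bound to its own context.
val sc2 = new SparkContext(new SparkConf().setAppName("reader"))
val restored = sc2.objectFile[Int]("hdfs:///tmp/shared-rdd")
println(restored.sum())
```

For anything read by other tools, a self-describing format such as Parquet via the DataFrame API is usually a better choice than object files.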