Is it possible to cache a data frame and then reference (query) it in another script?...My goal is as follows:
It is not possible using standard Spark binaries. A Spark DataFrame
is bound to the specific SQLContext
that was used to create it and is not accessible outside that context.
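A minimal PySpark sketch of that scoping (the view name `cached_numbers` is just a placeholder): a cached DataFrame registered as a local temporary view is only visible to the session that created it, so a second session, let alone a separate script, cannot query it.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("session-scope-demo").getOrCreate()

df = spark.range(10)
df.cache()
df.createOrReplaceTempView("cached_numbers")

# Visible in the session that created it:
spark.sql("SELECT COUNT(*) FROM cached_numbers").show()

# A separate session (or another application) does not see it:
other = spark.newSession()
other.sql("SELECT COUNT(*) FROM cached_numbers")  # AnalysisException: Table or view not found
```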
There are tools, such as Apache Zeppelin or Databricks, which use a shared context injected into different sessions. This is how you can share temporary tables between different sessions and/or guest languages.
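To illustrate why a shared context helps: within a single Spark application (which is what these tools keep alive behind the scenes), a global temporary view is visible from every session of that application. This is only a sketch and only works inside one driver process; it does not help across independently submitted scripts.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-context-demo").getOrCreate()

# Global temp views live in the global_temp database and are shared
# by every SparkSession of this one application.
spark.range(10).createGlobalTempView("shared_numbers")

# Another session of the same application (e.g. another notebook
# paragraph backed by the same context) can query it.
other = spark.newSession()
other.sql("SELECT COUNT(*) FROM global_temp.shared_numbers").show()
```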
There are other platforms, including spark-jobserver
and Apache Ignite, which provide alternative ways to share distributed data structures. You can also take a look at the Livy server.
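For example, Livy exposes a REST API through which several clients can submit statements to one long-lived interactive session, so they all see the same cached tables and temporary views. A rough sketch, assuming a Livy server at `livy-host:8998`; the URL and view name are placeholders, and polling for the session to become idle is omitted for brevity.

```python
import json
import requests

LIVY = "http://livy-host:8998"  # hypothetical Livy endpoint
HEADERS = {"Content-Type": "application/json"}

# Create one long-lived interactive PySpark session.
session = requests.post(f"{LIVY}/sessions",
                        data=json.dumps({"kind": "pyspark"}),
                        headers=HEADERS).json()

# Different clients can then post statements against the same session id,
# so state such as cached tables is shared between them.
stmt = {"code": "spark.range(10).createOrReplaceTempView('shared'); "
                "spark.catalog.cacheTable('shared')"}
requests.post(f"{LIVY}/sessions/{session['id']}/statements",
              data=json.dumps(stmt),
              headers=HEADERS)
```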
See also: Share SparkContext between Java and R Apps under the same Master