With the new createGlobalTempView in Spark 2.1.0, it is possible to share a table among multiple Spark sessions.
However, the global_temp database that backs it does not seem to be accessible from the outside. For example:
scala> salaries.createGlobalTempView("salaries")

scala> spark.sql("select * from global_temp.salaries")
res240: org.apache.spark.sql.DataFrame = [yearID: string, teamID: string ... 3 more fields]
scala> spark.sql("select * from global_temp.salaries").show(5)
+------+------+----+---------+------+
|yearID|teamID|lgID| playerID|salary|
+------+------+----+---------+------+
| 1985| ATL| NL|barkele01|870000|
| 1985| ATL| NL|bedrost01|550000|
| 1985| ATL| NL|benedbr01|545000|
| 1985| ATL| NL| campri01|633333|
| 1985| ATL| NL|ceronri01|625000|
+------+------+----+---------+------+
only showing top 5 rows
Nothing is wrong so far, but now comes the strange behaviour:
scala> spark.catalog.listTables.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
scala> spark.catalog.tableExists("global_temp","salaries")
res249: Boolean = true
My guess is that the global_temp database is hidden from all users, but that its tables can still be queried if you already know their names.
Is this normal behaviour, or am I doing something wrong?
Thanks for any explanations.
When you run spark.catalog.listTables.show without passing a database name, listTables() points at the default database, not at global_temp, so global temporary views do not show up.
Try this instead:
spark.catalog.listTables("global_temp").show
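For completeness, the whole round trip looks roughly like this (a sketch for the spark-shell, assuming a `salaries` DataFrame already exists in your session):

```scala
// Register the DataFrame as a global temporary view
// (assumes a DataFrame named `salaries` is already defined).
salaries.createGlobalTempView("salaries")

// Listing tables without a database argument inspects the `default`
// database, so the global temp view does not appear here.
spark.catalog.listTables.show()

// Passing the system-preserved database name explicitly reveals it.
spark.catalog.listTables("global_temp").show()
```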
It's not hidden from all users either, quite the opposite. A global temporary view lives only as long as the Spark application that created it, but within that application it is visible to every SparkSession, for example a second session created with spark.newSession(). It is not, however, shared with a separate application, such as a colleague's own spark-shell on the same cluster.
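You can check the cross-session visibility from within the same application; a sketch using spark.newSession(), which creates a second SparkSession sharing the same SparkContext:

```scala
// A second session in the same Spark application. It shares the
// application-wide global_temp database, but not ordinary
// (session-scoped) temporary views.
val otherSession = spark.newSession()

// The global temp view registered in the first session is visible here.
otherSession.sql("select * from global_temp.salaries").show(5)
```

Once the application terminates, the global temporary view is dropped along with the global_temp database.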