Why is Spark's global_temp database not visible?

Tags:

apache-spark

With the new createGlobalTempView in Spark 2.1.0, it is possible to share a table amongst multiple Spark sessions.
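For reference, the intended usage looks roughly like this (a minimal sketch assuming a spark-shell session, where spark.implicits._ is already in scope; the salaries data is a placeholder):

// A stand-in DataFrame with hypothetical values
val salaries = Seq(
  ("1985", "ATL", "NL", "barkele01", 870000)
).toDF("yearID", "teamID", "lgID", "playerID", "salary")

// Register it in the reserved global_temp database
salaries.createGlobalTempView("salaries")

// Any other session in the same Spark application can query it
spark.newSession().sql("select * from global_temp.salaries").show()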

However, this database doesn't seem to be accessible from the outside. For example:

scala> salaries.createGlobalTempView("salaries")

scala> spark.sql("select * from global_temp.salaries")
res240: org.apache.spark.sql.DataFrame = [yearID: string, teamID: string ... 3 more fields]

scala> spark.sql("select * from global_temp.salaries").show(5)
+------+------+----+---------+------+
|yearID|teamID|lgID| playerID|salary|
+------+------+----+---------+------+
|  1985|   ATL|  NL|barkele01|870000|
|  1985|   ATL|  NL|bedrost01|550000|
|  1985|   ATL|  NL|benedbr01|545000|
|  1985|   ATL|  NL| campri01|633333|
|  1985|   ATL|  NL|ceronri01|625000|
+------+------+----+---------+------+
only showing top 5 rows

Nothing is wrong so far, but here comes the strange behaviour:

scala> spark.catalog.listTables.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+

scala> spark.catalog.tableExists("global_temp","salaries")
res249: Boolean = true

My guess is that the global_temp database is hidden from all users, but that it is still possible to query its tables if we already know which table to query.

Is it a normal behaviour or am I doing something wrong?

Thanks for any explanations

asked Dec 28 '25 by Will
1 Answer

When you run spark.catalog.listTables.show without specifying a database, listTables() points at the default database, which is why the global temp view doesn't appear.

Try this instead:

spark.catalog.listTables("global_temp").show
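With the database argument supplied, the view should show up along these lines (columns as in the question's empty listing; the description value and exact formatting may differ by Spark version):

+--------+-----------+-----------+---------+-----------+
|    name|   database|description|tableType|isTemporary|
+--------+-----------+-----------+---------+-----------+
|salaries|global_temp|       null|TEMPORARY|       true|
+--------+-----------+-----------+---------+-----------+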

It's definitely not hidden for all users, quite the opposite. A global temp view lives only as long as your Spark application, but within that application it is visible to every SparkSession, including new ones created with spark.newSession(). It is not shared with a separate application, such as a colleague's own spark-shell on the same cluster, and it disappears when your application terminates.
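One way to convince yourself of the scope from the same spark-shell (a sketch; spark.newSession() creates a sibling session inside the same application):

// A sibling session in the same application sees the global temp view
val other = spark.newSession()
other.catalog.tableExists("global_temp", "salaries")   // returns true
other.sql("select * from global_temp.salaries").show(1)

// A separate application (e.g. another spark-shell) would not see it,
// and the view disappears entirely when this application stops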

answered Dec 30 '25 by Davos