Showing tables from specific database with Pyspark and Hive

I have a Hive instance with several databases, each containing tables. I'd like to show the tables from one specific database (say, 3_db).

+------------------+--+
|  database_name   |
+------------------+--+
| 1_db             |
| 2_db             |
| 3_db             |
+------------------+--+

If I enter beeline from bash, nothing complex is needed; I just do the following:

show databases;
show tables from 3_db;

When I'm using pyspark via an IPython notebook, the same trick does not work and gives me an error on the second line (show tables from 3_db):

sqlContext.sql('show databases').show()
sqlContext.sql('show tables from 3_db').show()

What is wrong, and why does the same code work in one place and not in the other?

Keithx asked Feb 27 '17 15:02


2 Answers

sqlContext.sql("show tables in 3_db").show()
David דודו Markovitz answered Oct 10 '22 15:10


Another possibility is to use the Catalog API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.listTables("3_db")

Just be aware that in PySpark this method returns a list, whereas in Scala it returns a DataFrame.

aelesbao answered Oct 10 '22 14:10