See managed tables in Databricks AWS

I need to identify and list all managed tables in a Databricks AWS workspace. I can see this manually in the table details, but I need to do it for several thousand tables across different databases, and I cannot find a way to automate it. The only way I have found to tell programmatically whether a table is managed or external is the DESCRIBE TABLE EXTENDED command, but that returns the table type as a value in a column, so it cannot be filtered with SELECT or WHERE, even when I run it as a subquery. What is the easiest way to filter the managed tables?
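For context, the relevant part of the DESCRIBE TABLE EXTENDED result looks roughly like the sketch below. The row names `Type` and `Location` are real, but the values here are illustrative, and the real command returns PySpark `Row` objects rather than dicts:

```python
# Hypothetical rows mimicking the metadata section of DESCRIBE TABLE EXTENDED
# output (values are illustrative):
rows = [
    {"col_name": "Type", "data_type": "MANAGED", "comment": ""},
    {"col_name": "Location", "data_type": "dbfs:/user/hive/warehouse/t", "comment": ""},
]

# The table type is a value inside the result set, not a property you can
# reference in a SQL WHERE clause, so you filter the returned rows instead:
table_type = next(r["data_type"] for r in rows if r["col_name"] == "Type")
print(table_type)  # -> MANAGED
```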

WarriorOfWeeks asked Oct 19 '25 20:10

1 Answer

# For each table in the database, display the DESCRIBE TABLE EXTENDED row
# that marks it as managed; an empty result means the table is external.
spark.sql('use my_database')
df = spark.sql('show tables in my_database')
for t in df.collect():
    print('table {}'.format(t.tableName))
    display(spark.sql('describe table extended {}'.format(t.tableName))
            .where("col_name = 'Type' and data_type = 'MANAGED'"))
    # use an if condition on the filtered result to collect the database
    # and table names of the managed tables
    # loop over all databases with "show databases" in an outer loop
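The two comments above can be fleshed out into a sketch like the following. It assumes a live `spark` session on Databricks, and that SHOW DATABASES exposes a `databaseName` column (newer Spark runtimes may name it `namespace` instead); the helper `is_managed` is a name introduced here for illustration:

```python
def is_managed(describe_rows):
    # DESCRIBE TABLE EXTENDED includes a metadata row whose col_name is
    # 'Type' and whose data_type is 'MANAGED' or 'EXTERNAL'.
    return any(
        r["col_name"] == "Type" and r["data_type"] == "MANAGED"
        for r in describe_rows
    )

def list_managed_tables(spark):
    # Collect (database, table) pairs for every managed table.
    managed = []
    for db in spark.sql("show databases").collect():
        db_name = db["databaseName"]  # may be 'namespace' on newer runtimes
        for t in spark.sql("show tables in {}".format(db_name)).collect():
            rows = spark.sql(
                "describe table extended {}.{}".format(db_name, t["tableName"])
            ).collect()
            if is_managed(rows):
                managed.append((db_name, t["tableName"]))
    return managed
```

Because `is_managed` only inspects `col_name`/`data_type` fields, it works the same on the `Row` objects `collect()` returns and on plain dicts.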
inder answered Oct 21 '25 12:10