I need to identify and list all managed tables in a Databricks AWS workspace. I can see that manually in the table details, but I need to do this for several thousand tables across different databases, and I cannot find a way to automate it. The only way I found to tell programmatically whether a table is managed or external is the DESCRIBE TABLE EXTENDED command, but that returns the type as a value in a column, and cannot be combined with SELECT or WHERE to filter, even if I try running it as a subquery. What is the easiest way to filter out the managed tables?
# Loop over all databases with "show databases", then check each table's type
# via DESCRIBE TABLE EXTENDED, which reports it in a row whose col_name is 'Type'.
managed_tables = []
for db in spark.sql('show databases').collect():
    # on Databricks the output column is databaseName
    for t in spark.sql('show tables in {}'.format(db.databaseName)).collect():
        if t.isTemporary:
            continue  # temp views have no table type
        type_row = (spark.sql('describe table extended {}.{}'.format(db.databaseName, t.tableName))
                    .where("col_name = 'Type' and data_type = 'MANAGED'"))
        if type_row.count() > 0:
            print('managed table {}.{}'.format(db.databaseName, t.tableName))
            managed_tables.append((db.databaseName, t.tableName))
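If running DESCRIBE TABLE EXTENDED per table is too slow for several thousand tables, the PySpark catalog API exposes the type directly: spark.catalog.listTables returns Table objects whose tableType field is 'MANAGED', 'EXTERNAL', or 'VIEW'. A minimal sketch of the same scan using that API, assuming a standard PySpark session:

# Alternative: read the table type from the catalog API instead of parsing DESCRIBE output.
managed = []
for db in spark.catalog.listDatabases():
    for tbl in spark.catalog.listTables(db.name):
        # tableType is 'MANAGED', 'EXTERNAL', 'VIEW', or 'TEMPORARY' for temp views
        if tbl.tableType == 'MANAGED':
            managed.append((db.name, tbl.name))
print(managed)

If the workspace has Unity Catalog enabled, this reduces to a single query against the information schema, e.g. SELECT table_schema, table_name FROM system.information_schema.tables WHERE table_type = 'MANAGED'.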