 

Common metadata in Databricks clusters

I have 3-4 clusters in my Databricks instance on the Azure cloud platform. I want to maintain a common metastore for all of the clusters. Let me know if anyone has implemented this.

pankajs asked Dec 18 '25 11:12



1 Answer

I recommend configuring an external Hive metastore. By default, Databricks spins up its own metastore behind the scenes, but you can create your own database (Azure SQL works, as do MySQL and Postgres) and point each cluster at it in the cluster configuration used at startup.

Here are detailed steps: https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
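
As a rough illustration of what "specifying it during cluster startup" can look like, here is a minimal Python sketch that assembles the Spark config entries commonly used for an external Hive metastore and renders them as the "key value" lines you would paste into each cluster's Spark config. The server name, database name, user, and secret scope/key are hypothetical placeholders, and the Hive version (2.3.9 with builtin jars) is an assumption that has to match your Databricks Runtime; check the linked steps before relying on any of these values.

```python
from typing import Dict


def build_metastore_conf(
    jdbc_url: str,
    user: str,
    password_ref: str,
    driver: str = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    hive_version: str = "2.3.9",  # assumption: must match your runtime
    jars: str = "builtin",        # assumption: valid only for some versions
) -> Dict[str, str]:
    """Return Spark config entries that point a cluster at an external Hive metastore."""
    return {
        "spark.sql.hive.metastore.version": hive_version,
        "spark.sql.hive.metastore.jars": jars,
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password_ref,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": driver,
    }


def render_spark_conf(conf: Dict[str, str]) -> str:
    """Format the entries as one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical server, database, user, and secret scope/key names.
conf = build_metastore_conf(
    jdbc_url=(
        "jdbc:sqlserver://example-server.database.windows.net:1433;"
        "database=example_metastore"
    ),
    user="metastore_user",
    # A secret reference instead of a plain-text password.
    password_ref="{{secrets/example-scope/metastore-password}}",
)
print(render_spark_conf(conf))
```

Using the same rendered block on every cluster is what makes them share one metastore; the password is passed as a secret reference so it does not sit in the cluster configuration in plain text.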

Things to be aware of:

  • In the Databricks Data tab, what you see depends on the cluster you select, so clusters pointing at different metastores will show different databases and tables.
  • To avoid using a SQL username and password, look at Managed Identities: https://learn.microsoft.com/en-us/azure/stream-analytics/sql-database-output-managed-identity
  • Automate the external Hive metastore connection by using initialization scripts for your clusters.
  • Manage permissions on your underlying data sources as well. In the case of ADLS Gen2, consider using credential passthrough.
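
To keep 3-4 clusters from drifting apart, one option is to generate every cluster definition from a single shared set of metastore settings rather than editing each cluster by hand. The sketch below only builds partial definitions using the `cluster_name` and `spark_conf` fields; the cluster names are hypothetical, the config values are placeholders, and a real cluster definition needs further fields (runtime version, node type, and so on) that are omitted here.

```python
from typing import Dict, List


def cluster_specs(names: List[str], shared_conf: Dict[str, str]) -> List[dict]:
    """Build one partial cluster definition per name, each with a copy of the shared config."""
    return [
        {"cluster_name": name, "spark_conf": dict(shared_conf)}
        for name in names
    ]


# Placeholder values; the full set of metastore keys would go here.
shared_conf = {
    "spark.sql.hive.metastore.version": "2.3.9",
    "spark.sql.hive.metastore.jars": "builtin",
}

# Hypothetical cluster names.
specs = cluster_specs(["etl", "adhoc", "reporting"], shared_conf)
for spec in specs:
    print(spec["cluster_name"], len(spec["spark_conf"]))
```

Each definition gets its own copy of the dictionary, so a per-cluster override does not leak into the others.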
Valdas M answered Dec 20 '25 00:12



