Azure Databricks python command to show current cluster config

I am currently optimizing our ETL process and would like to be able to see the existing cluster configuration used when processing data. This way, I can track over time which worker node sizes I should use.

Is there a command in Python that returns the number of workers and their node sizes, so I can write the result to a DataFrame?

Asked by Pablo Boswell, Oct 13 '25

1 Answer

You can get this information by calling the Clusters Get REST API. It returns JSON that includes the number of workers, node types, and so on. Something like this:

import requests

# Pull the workspace host name and cluster ID from the notebook context
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = "your_PAT_token"  # replace with your personal access token
cluster_id = ctx.tags().get("clusterId").get()

# Call the Clusters Get REST API for the current cluster
response = requests.get(
    f'https://{host_name}/api/2.0/clusters/get?cluster_id={cluster_id}',
    headers={'Authorization': f'Bearer {host_token}'},
).json()
num_workers = response['num_workers']
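Since the question asks for a DataFrame, the relevant fields can be flattened from the JSON response, for example with pandas. This is a minimal sketch: the field names follow the Clusters Get API response, and note that autoscaling clusters return an `autoscale` object instead of a top-level `num_workers`. The `sample` dict below is an illustrative response, not real data:

```python
import pandas as pd
from datetime import datetime, timezone

def cluster_config_row(response: dict) -> pd.DataFrame:
    # Flatten the sizing-related fields of a Clusters Get response
    # into a one-row DataFrame, timestamped for tracking over time.
    autoscale = response.get("autoscale", {})
    return pd.DataFrame([{
        "captured_at": datetime.now(timezone.utc),
        "cluster_id": response.get("cluster_id"),
        "num_workers": response.get("num_workers"),       # fixed-size clusters
        "min_workers": autoscale.get("min_workers"),      # autoscaling clusters
        "max_workers": autoscale.get("max_workers"),
        "node_type_id": response.get("node_type_id"),
        "driver_node_type_id": response.get("driver_node_type_id"),
    }])

# Illustrative fixed-size cluster response:
sample = {
    "cluster_id": "1013-abcdef",
    "num_workers": 4,
    "node_type_id": "Standard_DS3_v2",
    "driver_node_type_id": "Standard_DS3_v2",
}
df = cluster_config_row(sample)
```

Appending one such row per ETL run gives a history of worker counts and node sizes that can be compared against job durations.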

P.S. If this runs as a non-notebook job, the PAT token may not be available from the context; generate your own token and use it instead.

Answered by Alex Ott, Oct 14 '25