 

How to detect Databricks environment programmatically

I'm writing a Spark job that needs to be runnable locally as well as on Databricks.

The code has to be slightly different in each environment (file paths), so I need a way to detect whether the job is running on Databricks. The best way I have found so far is to look for a "dbfs" directory in the root directory and, if it exists, assume the job is running on Databricks. This doesn't feel like the right solution. Does anyone have a better idea?
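For reference, the directory check described above might look like this in Scala (a minimal sketch using the standard java.nio.file API; /dbfs is the FUSE mount Databricks exposes on driver nodes):

import java.nio.file.{Files, Paths}

// Heuristic: the /dbfs mount only exists on Databricks driver nodes
def dbfsDirExists: Boolean = Files.isDirectory(Paths.get("/dbfs"))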

asked Jul 13 '18 by steven35

People also ask

How do I read environment variables in Databricks?

When creating a cluster, click on Advanced Options and enter the environment variables. After creation, select your cluster, click Edit, open Advanced Options, edit or add environment variables, then confirm and restart.
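Once set, such a variable can be read at runtime like any ordinary environment variable. A minimal Scala sketch, where MY_ENV_VAR is a hypothetical name configured on the cluster:

// MY_ENV_VAR is a placeholder for a variable set in the cluster's
// Advanced Options; returns None when it is not defined
val myVar: Option[String] = sys.env.get("MY_ENV_VAR")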

How do I find the cluster ID for Databricks?

The cluster ID can be found in the URL: https://<databricks-instance>/#/setting/clusters/$CLUSTER_ID/configuration. The Databricks CLI also accepts it via the --cluster-id CLUSTER_ID option.
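The cluster ID can also be read programmatically from the Spark configuration. A sketch, assuming the spark.databricks.clusterUsageTags.clusterId key that Databricks runtimes populate:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
// Returns None when running outside Databricks
val clusterId: Option[String] =
  spark.conf.getOption("spark.databricks.clusterUsageTags.clusterId")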

What is magic command in Databricks?

The %pip magic command installs Python packages and manages the Python environment of a notebook. Databricks Runtime (DBR) and Databricks Runtime for Machine Learning (MLR) ship with a set of Python and common machine learning (ML) libraries, but the runtime may not have a specific library or version pre-installed for the task at hand.
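For example, a notebook cell installing a package with %pip (nltk here is just an illustrative package name):

%pip install nltk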


1 Answer

You can simply check for the existence of an environment variable, e.g.:

// DATABRICKS_RUNTIME_VERSION is set by Databricks runtimes, so its
// presence distinguishes a Databricks cluster from a local run
def isRunningInDatabricks(): Boolean =
  sys.env.contains("DATABRICKS_RUNTIME_VERSION")
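The flag can then drive environment-specific settings such as file paths (the paths below are hypothetical placeholders):

// Pick a DBFS path on Databricks, a relative path locally
val inputPath =
  if (isRunningInDatabricks()) "dbfs:/mnt/data/input"
  else "data/input"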
answered Sep 19 '22 by pathikrit