I'm writing a Spark job that needs to be runnable locally as well as on Databricks.
The code has to be slightly different in each environment (file paths), so I'm trying to find a way to detect whether the job is running on Databricks. The best way I have found so far is to look for a "dbfs" directory in the root directory and, if it's there, assume the job is running on Databricks. This doesn't feel like the right solution. Does anyone have a better idea?
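Roughly what I'm doing now (a minimal sketch of the directory check described above, not a recommendation):

import java.nio.file.{Files, Paths}

// Fragile heuristic: assume Databricks if the /dbfs mount point exists.
def looksLikeDatabricks: Boolean =
  Files.isDirectory(Paths.get("/dbfs"))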
When creating the cluster, click on Advanced Options => Enter Environment Variables. After creation: select your cluster => click on Edit => Advanced Options => Edit or Enter new Environment Variables => Confirm and Restart.
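For example, if you set a custom variable in the cluster's environment variables (the name IS_DATABRICKS=true below is hypothetical, chosen only for illustration), the job can read it at runtime:

// IS_DATABRICKS is an assumed custom variable set to "true" under the
// cluster's Advanced Options => Environment Variables.
def isDatabricksCluster: Boolean =
  sys.env.get("IS_DATABRICKS").contains("true")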
Options:
--cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/#/setting/clusters/$CLUSTER_ID/configuration.
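If you need the cluster ID from inside the job rather than from the URL, a sketch like the following may work; the conf key is an assumption here, is Databricks-specific, and is not part of open-source Spark:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Assumed Databricks-specific tag; expected to be None when running locally.
val clusterId: Option[String] =
  spark.conf.getOption("spark.databricks.clusterUsageTags.clusterId")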
1. Magic command %pip: install Python packages and manage the Python environment. Databricks Runtime (DBR) and Databricks Runtime for Machine Learning (MLR) come with a set of Python and common machine learning (ML) libraries pre-installed, but the runtime may not include the specific library or version you need for the task at hand.
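For example, a notebook cell can install a missing package directly (the package name below is just an example):

%pip install openpyxl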
You can simply check for the existence of an environment variable, e.g.:

// DATABRICKS_RUNTIME_VERSION is set by the Databricks runtime itself.
def isRunningInDatabricks(): Boolean =
  sys.env.contains("DATABRICKS_RUNTIME_VERSION")
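For the original use case (environment-specific file paths), the check can then drive the path selection; the paths below are placeholders, not taken from the question:

// Illustrative placeholder paths only.
val basePath: String =
  if (isRunningInDatabricks()) "/dbfs/mnt/data"
  else "data/local"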