Simple question, but I can't find a simple guide on how to set an environment variable in Databricks. Also, is it important to set the environment variable on both the driver and the executors (and would you do this via spark.conf)? Thanks
Before creation:
You can set environment variables while creating the cluster.
Click on Advanced Options => enter your Environment Variables.
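The field takes one KEY=value pair per line, for example (the names here are just placeholders):

MY_ENV_VAR=some_value
ANOTHER_VAR=another_value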
After creation:
Select your cluster => click on Edit => Advanced Options => edit or enter new Environment Variables => Confirm and Restart.
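Once the cluster has restarted, you can quickly check from a notebook that the variable was picked up, e.g. in Scala (MY_ENV_VAR is a placeholder name):

%scala
// Prints Some(<value>) if the variable is set on the driver, None otherwise
println(sys.env.get("MY_ENV_VAR"))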
OR
Alternatively, you can achieve the same result with a cluster init script, for example by appending your environment variable declarations to the file /databricks/spark/conf/spark-env.sh. The init script below shows this pattern, used here to set a Spark configuration default on the driver:
%scala
// Write a cluster init script that sets a Spark configuration default on the driver
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
|#!/bin/bash
|
|cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
|[driver] {
|  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
|}
|EOF
""".stripMargin, true)
For more details, refer to “Databricks – Spark Configuration”.
Hope this helps.