 

How to set a custom environment variable in EMR to be available for a spark Application

I need to set a custom environment variable in EMR to be available when running a spark application.

I have tried adding this:

                   ...
                   --configurations '[                                    
                                      {
                                      "Classification": "spark-env",
                                      "Configurations": [
                                        {
                                        "Classification": "export",
                                        "Configurations": [],
                                        "Properties": { "SOME-ENV-VAR": "qa1" }
                                        }
                                      ],
                                      "Properties": {}
                                      }
                                      ]'
                   ...

and I also tried replacing "spark-env" with "hadoop-env", but nothing seems to work.

There is this answer from the AWS forums, but I can't figure out how to apply it. I'm running on EMR 5.3.1 and launching it with a preconfigured step from the CLI: aws emr create-cluster...

asked Feb 22 '17 by NetanelRabinowitz

People also ask

How does EMR pass environment variables?

Use classification yarn-env to pass environment variables to the worker nodes. Use classification spark-env to pass environment variables to the driver, with deploy mode client. When using deploy mode cluster, use yarn-env.
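As a sketch, a yarn-env classification takes the same shape as the spark-env one from the question, with an export sub-classification holding the variables (the variable name and value here are placeholders):

```json
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_ENV_VAR": "qa1"
        }
      }
    ]
  }
]
```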

Can we run PySpark on EMR?

You can use AWS Step Functions to run PySpark applications as EMR Steps on an existing EMR cluster. Using Step Functions, we can also create the cluster, run multiple EMR Steps sequentially or in parallel, and finally, auto-terminate the cluster.
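A minimal sketch of such a Step Functions task state, using the EMR addStep service integration (the cluster id, step name, bucket, and script path are placeholders):

```json
{
  "Comment": "Sketch: submit a PySpark script as an EMR step; names and paths are placeholders",
  "StartAt": "RunPySparkStep",
  "States": {
    "RunPySparkStep": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-XXXXXXXXXXXXX",
        "Step": {
          "Name": "my-pyspark-step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/my_app.py"]
          }
        }
      },
      "End": true
    }
  }
}
```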


2 Answers

Add the custom configuration, like the JSON below, to a file, say custom_config.json:

[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]

Then, when creating the EMR cluster, pass the file reference to the --configurations option:

aws emr create-cluster --configurations file://custom_config.json --other-options...
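Once the variable is exported this way, the Spark application can read it like any other process environment variable. A minimal sketch in Python (SOME_ENV_VAR is a placeholder name; note that hyphenated names like the question's SOME-ENV-VAR are not valid shell identifiers and cannot be exported):

```python
import os

# Read the variable exported via the spark-env/export classification.
# "SOME_ENV_VAR" is a hypothetical name; fall back to a default if unset.
value = os.environ.get("SOME_ENV_VAR", "default")
print(value)
```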
answered Oct 13 '22 by franklinsijo


For me, replacing spark-env with yarn-env fixed the issue.

answered Oct 13 '22 by Przemek