 

In Apache Spark, how do I set worker/executor environment variables?

My Spark program on EMR keeps failing with this error:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:421)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:128)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:397)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:172)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at org.apache.hadoop.fs.s3native.$Proxy8.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)

I did some research and found that this certificate check can be disabled in low-security situations by setting the JVM system property:

com.amazonaws.sdk.disableCertChecking=true

but I can only set it with spark-submit --conf, which only affects the driver, while most of the errors occur on the workers.

Is there a way to propagate this setting to the workers, e.g. something like the sketch below?
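For reference, the documented Spark channel for passing -D system properties to executor JVMs is spark.executor.extraJavaOptions; here is a minimal sketch (whether this actually clears the EMR certificate error above is an assumption, not something verified here):

    from pyspark import SparkConf, SparkContext

    # Sketch: forward the JVM system property to every executor.
    # spark.executor.extraJavaOptions is documented Spark configuration.
    # The driver-side copy still has to be set at submit time, e.g. with
    #   spark-submit --driver-java-options "-Dcom.amazonaws.sdk.disableCertChecking=true" ...
    # because the driver JVM is already running when this code executes.
    conf = (SparkConf()
            .setAppName("disable-cert-checking-sketch")
            .set("spark.executor.extraJavaOptions",
                 "-Dcom.amazonaws.sdk.disableCertChecking=true"))
    sc = SparkContext(conf=conf)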

Thanks a lot.

tribbloid asked Mar 30 '15

People also ask

How do I set environment variables in Spark?

Spark properties control most application parameters and can be set using a SparkConf object or through Java system properties. Environment variables can be used to set per-machine settings, such as the IP address, through the conf/spark-env.sh script on each node.
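As a minimal sketch of the SparkConf route (the variable name MY_SETTING is made up; note that the answer below reports this not taking effect in its YARN tests):

    from pyspark import SparkConf, SparkContext

    # Each spark.executorEnv.* entry is meant to become an OS environment
    # variable inside the executor processes.
    conf = (SparkConf()
            .setAppName("executor-env-sketch")
            .setExecutorEnv("MY_SETTING", "some-value"))
    sc = SparkContext(conf=conf)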

How do I set the executor number in Spark?

If dynamic allocation is not enabled for the cluster/application and you set --conf spark.executor.instances=1, then Spark will launch only one executor. Apart from the executors, you will also see the AM/driver in the Executors tab of the Spark UI.
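A sketch of that combination (the property names are Spark's documented ones; the app name is made up):

    from pyspark.sql import SparkSession

    # With dynamic allocation off, spark.executor.instances fixes the
    # executor count; here exactly one executor is requested.
    spark = (SparkSession.builder
             .appName("fixed-executor-count-sketch")
             .config("spark.dynamicAllocation.enabled", "false")
             .config("spark.executor.instances", "1")
             .getOrCreate())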

How do you choose the driver and executor memory in Spark?

Determine the memory resources available for the Spark application: multiply the cluster RAM size by the YARN utilization percentage. This might leave, for example, 5 GB of RAM for the driver and 50 GB of RAM for the worker nodes. Discount one core per worker node to determine the number of executor cores.
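A worked example of that arithmetic (every cluster number here is hypothetical):

    # Hypothetical cluster: 64 GB of RAM, 85% usable by YARN, 8 cores per node.
    cluster_ram_gb = 64
    yarn_utilization = 0.85
    usable_gb = cluster_ram_gb * yarn_utilization   # 54.4 GB usable by Spark
    driver_gb = 5                                   # reserved for the driver
    executor_gb = usable_gb - driver_gb             # 49.4 GB left for executors
    cores_per_node = 8
    executor_cores = cores_per_node - 1             # leave 1 core per worker node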

How would you set the number of executors in any Spark based application?

The number of executors for a Spark application can be specified inside the SparkConf or via the --num-executors flag on the command line. Cluster manager: an external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
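For completeness, a sketch of the SparkConf path (the count of 4 is arbitrary); passing --num-executors 4 to spark-submit is the command-line equivalent:

    from pyspark import SparkConf, SparkContext

    # Programmatic equivalent of `spark-submit --num-executors 4 ...`.
    conf = (SparkConf()
            .setAppName("num-executors-sketch")
            .set("spark.executor.instances", "4"))
    sc = SparkContext(conf=conf)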


1 Answer

For Spark 2.4, @Amit Kushwaha's method doesn't work.

I have tested:

1. cluster mode

spark-submit \
    --conf spark.executorEnv.DEBUG=1 \
    --conf spark.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.executorEnv.DEBUG=1 \
    main.py

2. client mode

spark-submit --deploy-mode=client \
    --conf spark.executorEnv.DEBUG=1 \
    --conf spark.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.executorEnv.DEBUG=1 \
    main.py

None of the above sets an environment variable in the executor's OS environment (i.e., nothing that os.environ.get('DEBUG') can read).
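(A minimal way to verify that claim, assuming a SparkSession named spark; this probe is an addition, not part of the original answer:)

    import os

    # Probe the executor OS environment from inside a task; collect()
    # returns what the executors see, not what the driver sees.
    seen = (spark.sparkContext
            .parallelize(range(2), 2)
            .map(lambda _: os.environ.get('DEBUG'))
            .collect())
    print(seen)   # e.g. [None, None] when the variable never arrived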


The only way that works is to read the value back from spark.conf. One catch: spark-submit ignores --conf keys that do not start with spark., so the key needs that prefix:

submit:

spark-submit --conf spark.DEBUG=1 main.py

read the variable:

DEBUG = spark.conf.get('spark.DEBUG')
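Putting the two steps together, a minimal main.py might look like this (the fallback default is an addition, not in the original answer):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the value passed via --conf spark.DEBUG=1; the second argument
    # is a fallback so the job also runs when the flag is omitted.
    DEBUG = spark.conf.get('spark.DEBUG', '0')
    print('DEBUG =', DEBUG)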
Mithril answered Oct 18 '22