 

In Apache Spark, how do I set worker/executor environment variables?

My Spark program on EMR keeps failing with this error:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:421)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:128)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:397)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:172)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at org.apache.hadoop.fs.s3native.$Proxy8.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)

I did some research and found that this certificate check can be disabled in low-security situations by setting the JVM system property:

com.amazonaws.sdk.disableCertChecking=true

but I can only set it with spark-submit --conf, which only affects the driver, while most of the errors occur on the workers.

Is there a way to propagate this setting to the workers, e.g. something like the sketch below?
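For reference, the documented Spark channel for passing -D system properties to executor JVMs is spark.executor.extraJavaOptions; here is a minimal sketch (whether this actually clears the EMR certificate error above is an assumption, not something verified here):

    from pyspark import SparkConf, SparkContext

    # Sketch: forward the JVM system property to every executor.
    # spark.executor.extraJavaOptions is documented Spark configuration.
    # The driver-side copy still has to be set at submit time, e.g. with
    #   spark-submit --driver-java-options "-Dcom.amazonaws.sdk.disableCertChecking=true" ...
    # because the driver JVM is already running when this code executes.
    conf = (SparkConf()
            .setAppName("disable-cert-checking-sketch")
            .set("spark.executor.extraJavaOptions",
                 "-Dcom.amazonaws.sdk.disableCertChecking=true"))
    sc = SparkContext(conf=conf)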

Thanks a lot.

tribbloid asked Mar 30 '15

People also ask

How do I set environment variables in Spark?

Spark properties control most application parameters and can be set using a SparkConf object or through Java system properties. Environment variables can be used to set per-machine settings, such as the IP address, through the conf/spark-env.sh script on each node.
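As a minimal sketch of the SparkConf route (the variable name MY_SETTING is made up; note that the answer below reports this not taking effect in its YARN tests):

    from pyspark import SparkConf, SparkContext

    # Each spark.executorEnv.* entry is meant to become an OS environment
    # variable inside the executor processes.
    conf = (SparkConf()
            .setAppName("executor-env-sketch")
            .setExecutorEnv("MY_SETTING", "some-value"))
    sc = SparkContext(conf=conf)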

How do I set the executor number in Spark?

If dynamic allocation is not enabled for the cluster/application and you set --conf spark.executor.instances=1, then Spark will launch only one executor. Apart from the executors, you will also see the AM/driver in the Executors tab of the Spark UI.
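A sketch of that combination (the property names are Spark's documented ones; the app name is made up):

    from pyspark.sql import SparkSession

    # With dynamic allocation off, spark.executor.instances fixes the
    # executor count; here exactly one executor is requested.
    spark = (SparkSession.builder
             .appName("fixed-executor-count-sketch")
             .config("spark.dynamicAllocation.enabled", "false")
             .config("spark.executor.instances", "1")
             .getOrCreate())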

How do you choose the driver and executor memory in Spark?

Determine the memory resources available for the Spark application: multiply the cluster RAM size by the YARN utilization percentage. This might leave, for example, 5 GB of RAM for the driver and 50 GB of RAM for the worker nodes. Discount one core per worker node to determine the number of executor cores.
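A worked example of that arithmetic (every cluster number here is hypothetical):

    # Hypothetical cluster: 64 GB of RAM, 85% usable by YARN, 8 cores per node.
    cluster_ram_gb = 64
    yarn_utilization = 0.85
    usable_gb = cluster_ram_gb * yarn_utilization   # 54.4 GB usable by Spark
    driver_gb = 5                                   # reserved for the driver
    executor_gb = usable_gb - driver_gb             # 49.4 GB left for executors
    cores_per_node = 8
    executor_cores = cores_per_node - 1             # leave 1 core per worker node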

How would you set the number of executors in any Spark based application?

The number of executors for a Spark application can be specified inside the SparkConf or via the --num-executors flag on the command line. Cluster manager: an external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
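For completeness, a sketch of the SparkConf path (the count of 4 is arbitrary); passing --num-executors 4 to spark-submit is the command-line equivalent:

    from pyspark import SparkConf, SparkContext

    # Programmatic equivalent of `spark-submit --num-executors 4 ...`.
    conf = (SparkConf()
            .setAppName("num-executors-sketch")
            .set("spark.executor.instances", "4"))
    sc = SparkContext(conf=conf)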


1 Answer

For Spark 2.4, @Amit Kushwaha's method doesn't work.

I have tested:

1. cluster mode

spark-submit \
    --conf spark.executorEnv.DEBUG=1 \
    --conf spark.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.executorEnv.DEBUG=1 \
    main.py

2. client mode

spark-submit --deploy-mode=client \
    --conf spark.executorEnv.DEBUG=1 \
    --conf spark.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.appMasterEnv.DEBUG=1 \
    --conf spark.yarn.executorEnv.DEBUG=1 \
    main.py

None of the above sets an environment variable in the executor's OS environment (i.e., nothing that os.environ.get('DEBUG') can read).
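(A minimal way to verify that claim, assuming a SparkSession named spark; this probe is an addition, not part of the original answer:)

    import os

    # Probe the executor OS environment from inside a task; collect()
    # returns what the executors see, not what the driver sees.
    seen = (spark.sparkContext
            .parallelize(range(2), 2)
            .map(lambda _: os.environ.get('DEBUG'))
            .collect())
    print(seen)   # e.g. [None, None] when the variable never arrived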


The only way that works is to read the value back from spark.conf. One catch: spark-submit ignores --conf keys that do not start with spark., so the key needs that prefix:

submit:

spark-submit --conf spark.DEBUG=1 main.py

read the variable:

DEBUG = spark.conf.get('spark.DEBUG')
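Putting the two steps together, a minimal main.py might look like this (the fallback default is an addition, not in the original answer):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the value passed via --conf spark.DEBUG=1; the second argument
    # is a fallback so the job also runs when the flag is omitted.
    DEBUG = spark.conf.get('spark.DEBUG', '0')
    print('DEBUG =', DEBUG)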
Mithril answered Oct 18 '22