Spark step on EMR hangs as "Running" after it finishes writing to S3

Tags: amazon-web-services, amazon-s3, apache-spark, pyspark, apache-spark-2.0

I'm running a PySpark 2 job on EMR 5.1.0 as a step. Even after the script finishes, with a _SUCCESS file written to S3 and the Spark UI showing the job as completed, EMR still shows the step as "Running". I've waited over an hour to see whether Spark was just cleaning up, but the step never changes to "Completed". The last lines written to the logs are:

INFO MultipartUploadOutputStream: close closed:false s3://mybucket/some/path/_SUCCESS
INFO DefaultWriterContainer: Job job_201611181653_0000 committed.
INFO ContextCleaner: Cleaned accumulator 0

I didn't have this problem with Spark 1.6. I've tried a bunch of different hadoop-aws and aws-java-sdk jars to no avail.

I'm using the default Spark 2.0 configuration, so I don't think anything extra, such as metadata, is being written. The size of the data also doesn't seem to affect the problem.

asked Nov 18 '16 20:11

Kamil Sindi

1 Answer

If you aren't doing so already, stop your SparkContext at the end of the script:

sc.stop()
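To make sure the context is stopped even when the job raises partway through, one option is to wrap the job so that stop() always runs. This is a minimal sketch, assuming your script holds an object with a stop() method (both SparkSession and SparkContext have one); the helper name `stop_when_done`, the app name, the input path and the Parquet output format are hypothetical, and only the output path comes from the question:

```python
from contextlib import contextmanager


@contextmanager
def stop_when_done(spark):
    """Yield `spark` (e.g. a SparkSession or SparkContext) and always call stop() on exit.

    Hypothetical helper: stop() runs whether the with-block finishes normally
    or raises, so the driver does not keep a live context around after the
    S3 write has been committed.
    """
    try:
        yield spark
    finally:
        spark.stop()


# Usage in a PySpark 2.x step script (not executed here; names and the input
# path are assumptions for illustration):
#
#   from pyspark.sql import SparkSession
#
#   with stop_when_done(SparkSession.builder.appName("my-step").getOrCreate()) as spark:
#       df = spark.read.json("s3://mybucket/some/input/")      # hypothetical input
#       df.write.mode("overwrite").parquet("s3://mybucket/some/path")
```

A plain try/finally around the job body with sc.stop() in the finally clause does the same thing; the context manager just keeps that cleanup in one place.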

Also, if you're watching the Spark web UI in a browser, close it, since it sometimes keeps the SparkContext alive. I recall seeing this on the Spark dev mailing list, but I can't find the JIRA issue for it.

answered Nov 15 '22 03:11

J Maurer


Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Spark step on EMR just hangs as "Running" after done writing to S3

Tags:

amazon-web-services

amazon-s3

apache-spark

pyspark

apache-spark-2.0

Kamil Sindi

People also ask

1 Answers

J Maurer

Recent Activity

Donate For Us