
Strange Spark ERROR on AWS EMR


I have a very simple PySpark script that creates a DataFrame from some Parquet data on S3, calls the count() method, and prints the number of records.
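For reference, a script of the kind described might look roughly like the sketch below. The original script was not posted, so the file name `count_rows.py`, the function name `count_parquet_rows`, the app name, and the command-line argument are all assumptions made for illustration.

```python
import sys


def count_parquet_rows(spark, path):
    """Load Parquet data from `path` into a DataFrame and return its row count."""
    df = spark.read.parquet(path)
    return df.count()


def main(argv):
    if len(argv) != 2:
        print("usage: count_rows.py <parquet-path>")
        return
    # Imported lazily so the helper above can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    # "parquet-count" is a placeholder app name, not taken from the question.
    spark = SparkSession.builder.appName("parquet-count").getOrCreate()
    try:
        print(count_parquet_rows(spark, argv[1]))
    finally:
        spark.stop()


if __name__ == "__main__":
    main(sys.argv)
```

On EMR this would typically be submitted with `spark-submit count_rows.py <parquet-path>`, where the path points at the Parquet data on S3.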

I run the script on an AWS EMR cluster and see the following strange WARN messages:

17/12/04 14:20:26 WARN ServletHandler:
javax.servlet.ServletException: java.util.NoSuchElementException: None.get
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.spark_project.jetty.server.Server.handle(Server.java:524)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
    at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313)
    at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178)
    at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97)
    at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62)
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    ... 28 more
17/12/04 14:20:26 WARN HttpChannel: //ip-172-31-81-10.ec2.internal:4040/api/v1/applications/application_1512395256824_0002/stages/3/0/taskSummary?proxyapproved=true
javax.servlet.ServletException: java.util.NoSuchElementException: None.get
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.spark_project.jetty.server.Server.handle(Server.java:524)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
    at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313)
    at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178)
    at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97)
    at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62)
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)

It doesn't seem to fail the job, though: the count is returned successfully.

Does anyone know why this happens and how to get rid of it?

Thanks

Asked by seiya on Dec 04 '17 at 14:12



1 Answer

Those warning messages can be suppressed by adding the following lines to /etc/spark/conf/log4j.properties:

log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR
log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR

I didn't see any impact on performance or job stability, and now my logs are much more readable :)
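If the change has to be applied on more than one cluster, it can be scripted. The sketch below is only one possible way to do it: it appends the two logger overrides from this answer to a properties file unless they are already there. The helper name `add_overrides` is made up for illustration, and writing to /etc/spark/conf usually requires root.

```python
from pathlib import Path

# The two logger overrides quoted in the answer.
OVERRIDES = (
    "log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR",
    "log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR",
)


def add_overrides(properties_path, overrides=OVERRIDES):
    """Append any override not already present; return the lines that were added."""
    path = Path(properties_path)
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in overrides if line not in existing]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Calling `add_overrides("/etc/spark/conf/log4j.properties")` a second time adds nothing, so the script is safe to re-run.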

Answered by Guy Cohen on Oct 22 '22 at 05:10