I have a really simple PySpark script that creates a dataframe from some parquet data on S3 and then call count()
method and print out the number of records.
I run the script on AWS EMR cluster and I'm seeing following strange WARN information:
17/12/04 14:20:26 WARN ServletHandler: javax.servlet.ServletException: java.util.NoSuchElementException: None.get at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461) at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.spark_project.jetty.server.Server.handle(Server.java:524) at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319) at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253) at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:347) at scala.None$.get(Option.scala:345) at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313) at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178) at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126) at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97) at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126) at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62) at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) ... 28 more 17/12/04 14:20:26 WARN HttpChannel: //ip-172-31-81-10.ec2.internal:4040/api/v1/applications/application_1512395256824_0002/stages/3/0/taskSummary?proxyapproved=true javax.servlet.ServletException: java.util.NoSuchElementException: None.get at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461) at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.spark_project.jetty.server.Server.handle(Server.java:524) at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319) at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253) at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:347) at scala.None$.get(Option.scala:345) at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313) at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178) at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130) at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126) at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97) at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126) at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62) at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
It seems like it does not fail the job, though. I got the count returned successfully.
Just wonder if anyone know why this happens and how to get rid of it.
Thanks
Now you should be ready to debug. Simply start spark with the above command, then select the IntelliJ run configuration you just created and click Debug. IntelliJ should connect to your Spark application, which should now start running. You can set break points, inspect variables, etc.
In Spark, stage failures happen when there's a problem with processing a Spark task. These failures can be caused by hardware issues, incorrect Spark configurations, or code problems. When a stage failure occurs, the Spark driver logs report an exception similar to the following: org. apache.
You can resolve it by setting the partition size: increase the value of spark. sql. shuffle. partitions.
Failure of worker node – The node which runs the application code on the Spark cluster is Spark worker node. These are the slave nodes. Any of the worker nodes running executor can fail, thus resulting in loss of in-memory If any receivers were running on failed nodes, then their buffer data will be lost.
Those warning messages can be suppressed by adding the following lines to /etc/spark/conf/log4j.properties
:
log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR
I didn't see any impact on performance nor on job stability. Now my logs are much readable :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With