
How to fix "Forbidden!Configured service account doesn't have access" with Spark on Kubernetes?

I am trying to run the basic example of submitting a spark application with a k8s cluster.

I created my Docker image using the script from the Spark folder:

sudo ./bin/docker-image-tool.sh -mt spark-docker build

sudo docker image ls 

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
spark-r             spark-docker        793527583e00        17 minutes ago      740MB
spark-py            spark-docker        c984e15fe747        18 minutes ago      446MB
spark               spark-docker        71950de529b3        18 minutes ago      355MB
openjdk             8-alpine            88d1c219f815        15 hours ago        105MB
hello-world         latest              fce289e99eb9        3 months ago        1.84kB

And then tried to submit the SparkPi examples (as in the official documentation).

./bin/spark-submit \
        --master k8s://[MY_IP]:8443 \
        --deploy-mode cluster \
        --name spark-pi --class org.apache.spark.examples.SparkPi \
        --driver-memory 1g \
        --executor-memory 3g \
        --conf spark.executor.instances=2 \
        --conf spark.kubernetes.container.image=spark:spark-docker \
        local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

But the run fails with the following exception:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1554304245069-driver. 
Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1554304245069-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".

Here are the full logs of the pod from the Kubernetes Dashboard:

2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@49096b06{/executors/threadDump,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4a183d02{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d05ef57{/static,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@34237b90{/,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d01dfa5{/api,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31ff1390{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@759d81f3{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-04-03 15:10:50 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://spark-pi-1554304245069-driver-svc.default.svc:4040
2019-04-03 15:10:50 INFO  SparkContext:54 - Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar at spark://spark-pi-1554304245069-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.0.jar with timestamp 1554304250157
2019-04-03 15:10:51 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1554304245069-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1554304245069-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:407)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312) 

Notes:

  • Spark 2.4
  • Kubernetes 1.14.0
  • I use Minikube for my k8s cluster
Nakeuh asked Apr 03 '19 15:04



1 Answer

Hello, I had the same issue. I then found this GitHub issue: https://github.com/GoogleCloudPlatform/continuous-deployment-on-kubernetes/issues/113

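That pointed me to the problem: the driver pod runs as the "default" service account, which is not allowed to read pods. I solved the issue by following the RBAC section of the Spark guide for running on Kubernetes: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac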

Create a service account:

kubectl create serviceaccount spark

Give the service account the edit role on the cluster

kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
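Before resubmitting, it may be worth checking that the binding took effect. The sketch below assumes the "spark" service account and the "default" namespace used above, and asks the API server whether that identity can now do what the original error said was forbidden (get pods), plus create pods, which the driver needs in order to launch executors. The variable names are only illustrative.

```shell
# Assumed names: the "spark" service account and "default" namespace from the steps above.
NAMESPACE="default"
SA_NAME="spark"
SA_USER="system:serviceaccount:${NAMESPACE}:${SA_NAME}"

# Impersonate the service account and ask whether each verb on pods is allowed.
# kubectl prints "yes" or "no"; "|| true" keeps the loop going when the answer is "no".
if command -v kubectl >/dev/null 2>&1; then
  for verb in get create; do
    printf '%s pods as %s: ' "$verb" "$SA_USER"
    kubectl auth can-i "$verb" pods --as="$SA_USER" --namespace="$NAMESPACE" || true
  done
else
  echo "kubectl not found; skipping the permission check"
fi
```

If either line prints "no", the role binding probably did not apply to the account or namespace you expected.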

Run spark-submit with the following flag so that the driver runs with the service account you just created:

--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
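For reference, here is a sketch of the command from the question with that flag added. It is wrapped in a small function so it can be printed before it is run. K8S_MASTER and SPARK_SUBMIT are hypothetical variables introduced here for illustration, and [MY_IP] is left as the placeholder from the question.

```shell
# Placeholder master URL from the question; replace [MY_IP] or export K8S_MASTER.
K8S_MASTER="${K8S_MASTER:-k8s://[MY_IP]:8443}"

# Same SparkPi submission as in the question, plus the service account flag.
# SPARK_SUBMIT defaults to ./bin/spark-submit (run from the Spark folder).
spark_pi_submit() {
  "${SPARK_SUBMIT:-./bin/spark-submit}" \
    --master "$K8S_MASTER" \
    --deploy-mode cluster \
    --name spark-pi --class org.apache.spark.examples.SparkPi \
    --driver-memory 1g \
    --executor-memory 3g \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=spark:spark-docker \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
}
```

Calling `SPARK_SUBMIT=echo spark_pi_submit` prints the arguments without submitting anything; calling `spark_pi_submit` from the Spark folder submits the job.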

Hope it helps!

Simone Bracaloni answered Sep 18 '22 20:09