I am trying to set up a Spark cluster on Kubernetes. I managed to create a three-node cluster by following this article: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step. I used this command:
~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
—-conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
And it gives me this error:
Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd
Note: I followed this article for installing Spark on k8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html
You must have appropriate permissions to list, create, edit and delete pods in your cluster. You can verify that you can list these resources by running kubectl auth can-i <list|create|edit|delete> pods. The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
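For example, a quick sanity check with the credentials that spark-submit will use looks like this; each command should print "yes":

# Verify RBAC permissions for the current account
kubectl auth can-i list pods
kubectl auth can-i create pods
kubectl auth can-i edit pods
kubectl auth can-i delete pods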
Running spark-submit from within the cluster: if you want to run spark-submit from within a pod, you'll have to grant the pod access to the Kubernetes API. This is done by creating a Role with the required permissions and attaching it to the pod through a service account: save the definitions as a YAML file and apply them with kubectl apply -f.
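If you would rather not hand-write that YAML, the "Running on Kubernetes" documentation shows an equivalent setup with two kubectl commands; a minimal sketch, assuming the default namespace and a service account named spark:

# Create a service account for the driver and bind it to the built-in 'edit' role
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default

Then point the driver at that account by adding --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to the spark-submit command line.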
Spark can run on clusters managed by Kubernetes. This feature makes use of the native Kubernetes scheduler that has been added to Spark. The Kubernetes scheduler is currently experimental. In future versions, there may be behavioral changes around configuration, container images and entrypoints.
--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" file:///full/path/to/app.jar. The app jar will be uploaded to S3, and when the driver is launched it will be downloaded to the driver pod and added to its classpath.
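Note that this upload-to-S3 flow relies on spark.kubernetes.file.upload.path, which was only added in later Spark releases (3.0+), not in the 2.3.0 build used in the question; a hedged sketch of that newer usage, with a placeholder bucket and image name:

# Spark 3.0+ only: a file:// jar on the submission machine is uploaded to the
# configured path (here an S3 bucket, assuming the S3A filesystem is set up)
# and then downloaded into the driver pod at launch
bin/spark-submit \
  --master k8s://https://206.189.126.172:6443 \
  --deploy-mode cluster \
  --name word-count \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<a-spark-3.x-image> \
  --conf spark.kubernetes.file.upload.path=s3a://my-bucket/spark-uploads \
  --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
  file:///full/path/to/app.jar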
The error message comes from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f, which includes the "Running Spark on Kubernetes" page you mention, along with the check that produces the message you indicate:
// TODO(SPARK-23153): remove once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
throw new SparkException("The Kubernetes mode does not yet support referencing application " +
"dependencies in the local file system.")
}
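In practice the check fires whenever the primary resource or any --jars/--files entry resolves to the submission client's filesystem (a file:// URI, or a bare path with no scheme); for instance, a variant of the question's command like the following would hit it on 2.3.0:

# Fails on Spark 2.3.0 in Kubernetes mode: the jar path refers to the
# submission machine, which this backend could not yet stage into the pods
bin/spark-submit \
  --master k8s://https://206.189.126.172:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
  file:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar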
This is described in SPARK-18278:
it wouldn't accept running a local: jar file, e.g. local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar, on my spark docker image (the allowsMixedArguments and isAppResourceReq booleans in SparkSubmitCommandBuilder.java get in the way).
And this is linked to Kubernetes issue 34377.
The issue SPARK-22962 "Kubernetes app fails if local files are used" mentions:
This is the resource staging server use-case. We'll upstream this in the 2.4.0 timeframe.
In the meantime, that error message was introduced in PR 20320.
It includes the comment:
The manual tests I did actually use a main app jar located on gcs and http.
To be specific and for record, I did the following tests:
- Using a gs:// main application jar and a http:// dependency jar. Succeeded.
- Using a https:// main application jar and a http:// dependency jar. Succeeded.
- Using a local:// main application jar. Succeeded.
- Using a file:// main application jar. Failed.
- Using a file:// dependency jar. Failed.
That issue should have been fixed by now, and the OP garfiny confirms it in the comments:
I used the newest spark-kubernetes jar to replace the one in the spark-2.3.0-bin-hadoop2.7 package. The exception is gone.
According to the mentioned documentation:
Dependency Management
If your application’s dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies can be added to the classpath by referencing them with local:// URIs and/or setting the SPARK_EXTRA_CLASSPATH environment variable in your Dockerfiles. The local:// scheme is also required when referring to dependencies in custom-built Docker images in spark-submit.
Note that using application dependencies from the submission client’s local file system is currently not yet supported.
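Putting that together for the 2.3.0 build in the question, the workable options are a remote URI reachable from the pods, or a local:// path that already exists inside the container image; a minimal sketch of both, where the https:// URL is a placeholder:

# Option 1: jar baked into the Docker image (local:// means the container
# filesystem, not the submission machine)
bin/spark-submit \
  --master k8s://https://206.189.126.172:6443 \
  --deploy-mode cluster \
  --name word-count \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

# Option 2: jar hosted on an HTTP(S) server or in HDFS, reachable from the pods
bin/spark-submit \
  --master k8s://https://206.189.126.172:6443 \
  --deploy-mode cluster \
  --name word-count \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
  https://example.com/jars/spark-examples_2.11-2.3.0.jar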