
Spark on K8s - getting error: kube mode not support referencing app dependencies in local

I am trying to set up a Spark cluster on k8s. I've managed to create and set up a cluster with three nodes by following this article: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step. I used this command:

~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

And it gives me this error:

Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
    at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd

Note: I followed this article for installing Spark on k8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html

asked Jun 01 '18 by garfiny


People also ask

What are the required additional considerations when deploying Spark applications on top of Kubernetes using client mode?

You must have appropriate permissions to list, create, edit and delete pods in your cluster. You can verify that you can list these resources by running kubectl auth can-i <list|create|edit|delete> pods. The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
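For example, a quick permission check plus a service account for the driver could look like the sketch below (the spark account name and the default namespace are assumptions; the clusterrolebinding command mirrors the RBAC example in the Spark docs):

# verify that your own credentials can manage pods
kubectl auth can-i create pods
kubectl auth can-i delete pods

# create a service account for the driver and give it edit rights
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# then point the driver at it on submission:
# --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark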

How do I run Spark application on Kubernetes?

Running spark-submit from within the cluster: if you want to run spark-submit from within a pod, you'll have to grant that pod access to the k8s API. This is done by creating a Role with the required permissions and attaching it to the pod through a service account: save the Role definition as a YAML file and apply it with kubectl apply -f.
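A minimal sketch of the RBAC objects that answer refers to, using kubectl's imperative commands instead of the YAML file (all names and the default namespace are placeholders):

# service account the submitting pod will run under
kubectl create serviceaccount spark-submit-sa
# role with the permissions spark-submit needs in the driver's namespace
kubectl create role spark-submit-role \
  --verb=get,list,watch,create,delete \
  --resource=pods,services,configmaps
# bind the role to the service account
kubectl create rolebinding spark-submit-rb \
  --role=spark-submit-role \
  --serviceaccount=default:spark-submit-sa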

Can you run Spark on Kubernetes?

Spark can run on clusters managed by Kubernetes. This feature makes use of native Kubernetes scheduler that has been added to Spark. The Kubernetes scheduler is currently experimental. In future versions, there may be behavioral changes around configuration, container images and entrypoints.

What is Spark Kubernetes upload path?

--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" file:///full/path/to/app.jar

The app jar file will be uploaded to S3, and then, when the driver is launched, it will be downloaded to the driver pod and added to its classpath.
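The setting behind this behaviour is spark.kubernetes.file.upload.path (available from Spark 3.0 onwards, so not in the 2.3.0 build used in the question). A hedged sketch of a submission that uploads a client-local jar through it, with the API server address, image name, bucket and jar path all placeholders:

./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<your-image> \
  --conf spark.kubernetes.file.upload.path=s3a://my-bucket/spark-uploads \
  --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
  file:///full/path/to/app.jar
# note: uploading to s3a:// requires the hadoop-aws libraries and S3 credentials on the client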


2 Answers

The error message comes from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f, which also references the page you mention ("Running Spark on Kubernetes") and carries the restriction you quote:

// TODO(SPARK-23153): remove once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
  throw new SparkException("The Kubernetes mode does not yet support referencing application " +
    "dependencies in the local file system.")
}

This is described in SPARK-18278:

it wouldn't accept running a local: jar file, e.g. local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar, on my spark docker image (allowsMixedArguments and isAppResourceReq booleans in SparkSubmitCommandBuilder.java get in the way).

And this is linked to kubernetes issue 34377

The issue SPARK-22962 "Kubernetes app fails if local files are used" mentions:

This is the resource staging server use-case. We'll upstream this in the 2.4.0 timeframe.

In the meantime, that error message was introduced in PR 20320.

It includes the comment:

The manual tests I did actually use a main app jar located on gcs and http.
To be specific and for record, I did the following tests:

  • Using a gs:// main application jar and a http:// dependency jar. Succeeded.
  • Using a https:// main application jar and a http:// dependency jar. Succeeded.
  • Using a local:// main application jar. Succeeded.
  • Using a file:// main application jar. Failed.
  • Using a file:// dependency jar. Failed.

That issue should have been fixed by now, and the OP garfiny confirms in the comments:

I used the newest spark-kubernetes jar to replace the one in spark-2.3.0-bin-hadoop2.7 package. The exception is gone.
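A hedged sketch of that jar swap (the 2.3.1 version and the Maven Central URL are assumptions; use whichever newer spark-kubernetes_2.11 build matches your Spark line):

cd ~/opt/spark/spark-2.3.0-bin-hadoop2.7/jars
# keep the original jar around in case the swap causes other issues
mv spark-kubernetes_2.11-2.3.0.jar spark-kubernetes_2.11-2.3.0.jar.bak
curl -LO https://repo1.maven.org/maven2/org/apache/spark/spark-kubernetes_2.11/2.3.1/spark-kubernetes_2.11-2.3.1.jar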

answered Oct 18 '22 by VonC


According to the mentioned documentation:

Dependency Management

If your application’s dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies can be added to the classpath by referencing them with local:// URIs and/or setting the SPARK_EXTRA_CLASSPATH environment variable in your Dockerfiles. The local:// scheme is also required when referring to dependencies in custom-built Docker images in spark-submit.

Note that using application dependencies from the submission client’s local file system is currently not yet supported.
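As a sketch of what the docs describe (the API server address, image name and dependency URL below are placeholders), remotely hosted dependencies are passed by their URIs, while jars already baked into the image are referenced with local://:

./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<your-image> \
  --jars https://repo.example.com/libs/extra-lib.jar \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar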

answered Oct 18 '22 by VASャ