How to deploy Spark application jar file to Kubernetes cluster?

I am currently trying to deploy a Spark example jar to a Kubernetes cluster running on IBM Cloud.

When I follow these instructions to deploy Spark on a Kubernetes cluster, I am not able to launch Spark Pi, because I always get the error message:

The system cannot find the file specified

after entering the command

bin/spark-submit \
    --master k8s://<url of my kubernetes cluster> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///examples/jars/spark-examples_2.11-2.3.0.jar

I am in the right directory, and the spark-examples_2.11-2.3.0.jar file is present in the examples/jars directory.

asked May 14 '18 by Pascal



2 Answers

Ensure your .jar file is present inside the container image.

The instructions say that it should be there:

Finally, notice that in the above example we specify a jar with a specific URI with a scheme of local://. This URI is the location of the example jar that is already in the Docker image.

In other words, the local:// scheme is stripped from local:///examples/jars/spark-examples_2.11-2.3.0.jar, and the remaining path /examples/jars/spark-examples_2.11-2.3.0.jar is expected to exist inside the container image.
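
As a quick sanity check, you can list that path inside the image before submitting. A minimal sketch, assuming Docker is available locally and that <spark-image> is the same image name you pass to spark-submit:

# Confirm the jar is where the local:// URI says it is
docker run --rm <spark-image> ls -l /examples/jars/

# Images built with Spark's bin/docker-image-tool.sh typically place the
# examples under /opt/spark instead
docker run --rm <spark-image> ls -l /opt/spark/examples/jars/

If the jar turns out to live under /opt/spark, point the last argument of spark-submit at local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar instead.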

answered Nov 14 '22 by VASャ

Please make sure the absolute path /examples/jars/spark-examples_2.11-2.3.0.jar actually exists.

Or, if you are trying to load a jar file from the current directory, it would need to be a relative path like local://./examples/jars/spark-examples_2.11-2.3.0.jar.

I'm not sure whether spark-submit accepts relative paths or not.
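
Either way, if the path is wrong the driver pod will fail shortly after submission, and its log usually names the missing file. A minimal sketch for inspecting it with kubectl (the driver pod's name starts with the --name you passed to spark-submit, spark-pi here):

# Find the driver pod spawned by spark-submit
kubectl get pods

# Read its log to see why the jar could not be found
kubectl logs <name of the spark-pi-...-driver pod>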

answered Nov 14 '22 by silverfox