I cannot use the --packages option on the bitnami/spark Docker container

I pulled the Docker image and ran the following commands:

  1. docker run -it bitnami/spark:latest /bin/bash

  2. spark-shell --packages="org.elasticsearch:elasticsearch-spark-20_2.11:7.5.0"

and I got the error message below:

Ivy Default Cache set to: /opt/bitnami/spark/.ivy2/cache
The jars for the packages stored in: /opt/bitnami/spark/.ivy2/jars
:: loading settings :: url = jar:file:/opt/bitnami/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.elasticsearch#elasticsearch-spark-20_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-c785f3e6-7c78-469f-ab46-451f8be61a4c;1.0
        confs: [default]
Exception in thread "main" java.io.FileNotFoundException: /opt/bitnami/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-c785f3e6-7c78-469f-ab46-451f8be61a4c-1.0.xml (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
        at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:70)
        at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:62)
        at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:563)
        at org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:176)
        at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:245)
        at org.apache.ivy.Ivy.resolve(Ivy.java:523)
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1300)
        at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I tried other packages, but they all fail with the same error message.

Can you give me some advice on how to avoid this error?

asked Mar 11 '20 by sun lim



2 Answers

Found the solution, as described in https://github.com/bitnami/bitnami-docker-spark/issues/7. What we have to do is create a volume on the host mapped to a path inside the container:

volumes:
  - ./jars_dir:/opt/bitnami/spark/ivy:z

Then pass this path as the Ivy cache path, like this:

spark-shell \
  --conf spark.jars.ivy=/opt/bitnami/spark/ivy \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-beta \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions

This all happens because /opt/bitnami/spark is not writable, so we have to mount a volume to work around that.
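If you are not using docker-compose, the same workaround can be applied with a plain docker run. A minimal sketch, assuming a host directory named jars_dir and the elasticsearch package from the question:

# mount a writable host dir and point spark.jars.ivy at it inside the container
docker run -it \
  -v "$(pwd)/jars_dir:/opt/bitnami/spark/ivy:z" \
  bitnami/spark:latest \
  spark-shell \
    --conf spark.jars.ivy=/opt/bitnami/spark/ivy \
    --packages org.elasticsearch:elasticsearch-spark-20_2.11:7.5.0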

answered Oct 19 '22 by palash kulshreshtha

The error "java.io.FileNotFoundException: /opt/bitnami/spark/.ivy2/" occurred because the location /opt/bitnami/spark/ is not writable. To resolve the issue, modify the Spark master service as follows: run the container as the root user and mount a volume for the required jars.

See the working spark service block from the docker-compose file:

spark:
  image: docker.io/bitnami/spark:3
  container_name: spark
  environment:
    - SPARK_MODE=master
    - SPARK_RPC_AUTHENTICATION_ENABLED=no
    - SPARK_RPC_ENCRYPTION_ENABLED=no
    - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
    - SPARK_SSL_ENABLED=no
  user: root
  ports:
    - '8880:8080'
  volumes:
    - ./spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf
    - ./jars_dir:/opt/bitnami/spark/ivy:z
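The contents of the mounted spark-defaults.conf are not shown in the answer; presumably it points Spark's Ivy directory at the mounted volume so that --packages works without passing --conf every time. A minimal sketch of such a file (an assumption, not part of the original answer):

# spark-defaults.conf: resolve --packages dependencies into the writable mounted dir
spark.jars.ivy /opt/bitnami/spark/ivy

With that in place you can attach to the running container (for example, docker exec -it spark spark-shell --packages ...) and dependencies resolve into the writable /opt/bitnami/spark/ivy directory instead of the read-only default location.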
answered Oct 19 '22 by krishna kumar mishra