I am using Python 3.5 and Spark 2.2 Streaming with Kafka, and my script failed to run because of missing Kafka libraries. I am puzzled why the library was missing/not found even though the dependency information came from Spark's own website:
groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.2.0
I ran "spark-submit script.py" and the error shows that kafka library is required.
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
On the next run, I ran "spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10:2.2.0 script.py" so that the Kafka library would be downloaded.
This time the error shows that Ivy is unable to find/download the library:
Ivy Default Cache set to: C:\Users\james\.ivy2\cache
The jars for the packages stored in: C:\Users\james\.ivy2\jars
:: loading settings :: url = jar:file:/D:/programs/spark-2.2.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-kafka-0-10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 2908ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.apache.spark#spark-streaming-kafka-0-10;2.2.0
==== local-m2-cache: tried
file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
==== local-ivy-cache: tried
C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\ivys\ivy.xml
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\jars\spark-streaming-kafka-0-10.jar
==== central: tried
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1177)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:298)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
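For reference, the direct-stream pattern looks roughly like the sketch below in PySpark. Note that the 0-10 integration exposes no Python API in Spark 2.2, so this sketch falls back to the 0-8 integration; the broker address and topic name are placeholders.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # 0-8 integration; 0-10 is Scala/Java only

sc = SparkContext(appName="DirectStreamSketch")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Direct stream: one Spark partition per Kafka partition, offsets managed by Spark
stream = KafkaUtils.createDirectStream(
    ssc,
    ["some-topic"],                              # placeholder topic name
    {"metadata.broker.list": "localhost:9092"})  # placeholder broker address

stream.map(lambda kv: kv[1]).pprint()  # records arrive as (key, value) pairs

ssc.start()
ssc.awaitTermination()

This would be submitted with the matching 0-8 package, e.g. --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0.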
To read from Kafka in a streaming query with Structured Streaming, use SparkSession.readStream. The Kafka server addresses and topic names are required; Spark can subscribe to one or more topics, and a pattern can be used to match multiple topic names.
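A minimal sketch of such a streaming query, assuming the matching spark-sql-kafka-0-10 connector is on the classpath and a broker at localhost:9092 (both placeholders):

from pyspark.sql import SparkSession

# Needs the Kafka SQL connector on the classpath, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 script.py
spark = SparkSession.builder.appName("kafka-structured-read").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "topic1,topic2")  # or .option("subscribePattern", "topic.*")
      .load())

# Kafka delivers key and value as binary columns; cast them before printing
query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())
query.awaitTermination()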
First: as discussed on the developers mailing list, Kafka is not included in the binary distribution. That is why you don't have it on the classpath.
Second: in your --packages
command you must specify the Scala version. Omitting it is possible only in SBT, where %% appends the suffix automatically; spark-submit
resolves packages with Ivy in the background, and Ivy needs the full artifact name.
So, please try:
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 script.py
Extra point: maybe I will create a PR to change the error message's wording, since it is misleading.
Try writing
bin/spark-submit --jars yourjarfile.jar --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3 pythoncode.py
I had the same problem and solved it by submitting like this. I hope that helps.