Spark Streaming Kafka stream

I'm having trouble reading from Kafka with Spark Streaming.

My code is:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("KafkaIngestor")
val ssc = new StreamingContext(sparkConf, Seconds(2))

val kafkaParams = Map[String, String](
  "zookeeper.connect" -> "localhost:2181",
  "group.id" -> "consumergroup",
  "metadata.broker.list" -> "localhost:9092",
  "zookeeper.connection.timeout.ms" -> "10000"
  //"kafka.auto.offset.reset" -> "smallest"
)

val topics = Set("test")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

Before running this, I started ZooKeeper on port 2181 and a Kafka 0.9.0.0 server on port 9092. But I get the following error in the Spark driver:

Exception in thread "main" java.lang.ClassCastException: kafka.cluster.BrokerEndPoint cannot be cast to kafka.cluster.Broker
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6$$anonfun$apply$7.apply(KafkaCluster.scala:90)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:90)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:87)

ZooKeeper log:

[2015-12-08 00:32:08,226] INFO Got user-level KeeperException when processing sessionid:0x1517ec89dfd0000 type:create cxid:0x34 zxid:0x1d3 txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode = NodeExists for /brokers/ids (org.apache.zookeeper.server.PrepRequestProcessor)

Any hint?

Thank you very much

asked Dec 07 '15 23:12

besil

1 Answer

The problem was a version mismatch: the Kafka library on my classpath (0.9.0.0) was not the one my spark-streaming-kafka version expects.

As described in the documentation:

Kafka: Spark Streaming 1.5.2 is compatible with Kafka 0.8.2.1

So, including

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.2.2</version>
</dependency>

in my pom.xml (instead of version 0.9.0.0) solved the issue.
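To confirm which Kafka jar actually ends up on the classpath, a small check like the one below may help. It is only a sketch and was not part of the original fix; `ClasspathCheck` and `jarOf` are made-up names, and the class names it looks up are the ones from the stack trace above.

```scala
import scala.util.Try

object ClasspathCheck {
  // Returns the jar (or directory) a class was loaded from,
  // or None if the class is missing or has no code source.
  def jarOf(className: String): Option[String] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // Class names taken from the exception in the question.
    val names = Seq(
      "kafka.cluster.Broker",
      "kafka.cluster.BrokerEndPoint",
      "org.apache.spark.streaming.kafka.KafkaUtils"
    )
    names.foreach { name =>
      println(s"$name -> ${jarOf(name).getOrElse("not on classpath")}")
    }
  }
}
```

If the printed location for the `kafka.cluster` classes is a 0.9.x jar while the Spark Kafka integration was built against 0.8.2.x, that would be consistent with the `ClassCastException` above.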

Hope this helps

answered Sep 28 '22 08:09

besil


Tags: apache-kafka, apache-spark, spark-streaming, spark-streaming-kafka
