Spark Streaming from Kafka Consumer

I might need to work with Kafka and I am absolutely new to it. I understand that there are Kafka producers which publish logs (called events, messages, or records in Kafka) to Kafka topics.

I will need to read from Kafka topics via a consumer. Do I need to set up the consumer API first and then stream using a Spark StreamingContext (PySpark), or can I use the KafkaUtils module directly to read from Kafka topics?

In case I need to set up a Kafka consumer application, how do I do that? Could you please share links to the right docs?

Thanks in advance!

asked Jul 01 '16 05:07

Puneet Tripathi

2 Answers

Spark provides a built-in Kafka stream, so you don't need to create a custom consumer. There are two approaches to connecting to Kafka: (1) the receiver-based approach and (2) the direct approach. For more detail, go through this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
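To make the two approaches concrete, here is a minimal PySpark sketch. It assumes Spark 1.x/2.x, where `pyspark.streaming.kafka.KafkaUtils` exists (that module was removed in Spark 3.0), and it assumes the matching `spark-streaming-kafka-0-8` package is supplied to `spark-submit`. The topic name `logs`, the broker address `localhost:9092`, the app name, and the helper function names are placeholders I made up for illustration, not anything from the question.

```python
# Sketch only: targets Spark 1.x/2.x, where pyspark.streaming.kafka is available.
# Spark imports are done inside functions so the helpers can be read and
# checked without a Spark installation.

def direct_kafka_params(brokers):
    """Build the kafkaParams dict for the direct approach.

    `brokers` is a list of "host:port" strings (placeholder values here).
    """
    return {"metadata.broker.list": ",".join(brokers)}


def receiver_topic_map(topics, threads_per_topic=1):
    """Build the {topic: consumer thread count} dict used by the receiver approach."""
    return {topic: threads_per_topic for topic in topics}


def create_direct_stream(ssc, topics, brokers):
    """Direct approach: Spark reads from the Kafka brokers itself, no receiver."""
    from pyspark.streaming.kafka import KafkaUtils
    return KafkaUtils.createDirectStream(ssc, topics, direct_kafka_params(brokers))


def create_receiver_stream(ssc, zk_quorum, group_id, topics):
    """Receiver-based approach: a receiver consumes via ZooKeeper and a consumer group."""
    from pyspark.streaming.kafka import KafkaUtils
    return KafkaUtils.createStream(ssc, zk_quorum, group_id, receiver_topic_map(topics))


def main():
    """Hypothetical driver: print the value of each Kafka record every 5 seconds."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="KafkaLogReader")   # placeholder app name
    ssc = StreamingContext(sc, 5)                 # 5-second batches
    stream = create_direct_stream(ssc, ["logs"], ["localhost:9092"])
    stream.map(lambda kv: kv[1]).pprint()         # records arrive as (key, value) pairs
    ssc.start()
    ssc.awaitTermination()
```

In both cases Spark manages the Kafka consumer for you; you would call `main()` from a script submitted with `spark-submit`, and no separate consumer application is involved.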

answered Oct 07 '22 13:10

Sandeep Purohit

There's no need to set up a separate Kafka consumer application; Spark itself creates the consumer, using one of two approaches. One is the receiver-based approach, which uses KafkaUtils.createStream, and the other is the direct approach, which uses KafkaUtils.createDirectStream. If the Spark Streaming job fails, data need not be lost: with checkpointing enabled (plus write-ahead logs for the receiver-based approach), the job resumes from the offset where it left off.

For more details, see this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
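Resuming from the offset where you left off relies on checkpointing, so here is a rough sketch of how that is usually wired up with `StreamingContext.getOrCreate`. The checkpoint directory, app name, and the `build_streams` callback are hypothetical names for illustration. Note that the exact guarantee depends on the approach and on your output operation: end-to-end exactly-once generally also needs idempotent or transactional writes.

```python
# Sketch only: checkpoint-based recovery for a Spark Streaming job.
# Spark imports are done inside functions, so defining them needs no Spark install.

def make_setup(checkpoint_dir, build_streams, batch_seconds=5, app_name="KafkaLogReader"):
    """Return a zero-argument function that builds a fresh StreamingContext.

    `build_streams(ssc)` is a hypothetical callback where you would create the
    Kafka stream and define its transformations and outputs.
    """
    def setup():
        from pyspark import SparkContext
        from pyspark.streaming import StreamingContext

        sc = SparkContext(appName=app_name)
        ssc = StreamingContext(sc, batch_seconds)
        build_streams(ssc)              # define the whole pipeline before checkpointing
        ssc.checkpoint(checkpoint_dir)  # enable metadata checkpointing
        return ssc
    return setup


def run(checkpoint_dir, build_streams):
    """Recover the context from the checkpoint if one exists, else build a new one."""
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext.getOrCreate(
        checkpoint_dir, make_setup(checkpoint_dir, build_streams)
    )
    ssc.start()
    ssc.awaitTermination()
```

On a clean start `setup` runs and creates the context; after a driver restart the context is rebuilt from the checkpoint data instead, which is what lets the stream continue from its saved position. The checkpoint directory should be on fault-tolerant storage such as HDFS.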

answered Oct 07 '22 14:10

Tanvi Garg


Tags:

apache-kafka

apache-spark

kafka-consumer-api

pyspark

spark-streaming
