Spark Streaming from Kafka Consumer

I might need to work with Kafka and I am absolutely new to it. I understand that there are Kafka producers which publish logs (called events, messages, or records in Kafka) to Kafka topics.

I will need to read from Kafka topics via a consumer. Do I need to set up a consumer API first and then stream using the Spark Streaming context (PySpark), or can I directly use the KafkaUtils module to read from Kafka topics?

In case I need to set up a Kafka consumer application, how do I do that? Could you please share links to the right docs?

Thanks in advance!

asked Jul 01 '16 by Puneet Tripathi


People also ask

Can Spark work with Kafka?

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

What is Spark streaming consumer?

Spark Streaming is part of the Apache Spark platform that enables scalable, high-throughput, fault-tolerant processing of data streams. Although written in Scala, Spark offers Java APIs to work with. Apache Cassandra is a distributed, wide-column NoSQL data store.

Is Spark streaming deprecated?

Now that the Direct API of Spark Streaming (we currently have version 2.3.2) is deprecated, and we recently added the Confluent platform (which comes with Kafka 2.2.0) to our project, we plan to migrate these applications.

What is the primary difference between Kafka streams and Spark streaming?

Kafka analyses the events as they unfold. As a result, it employs a continuous (event-at-a-time) processing model. Spark, on the other hand, uses a micro-batch processing approach, which divides incoming streams into small batches for processing.


2 Answers

Spark provides a built-in Kafka stream, so you don't need to create a custom consumer. There are two approaches to connect with Kafka: 1. the receiver-based approach, and 2. the direct approach. For more detail, go through this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
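
Here is a minimal PySpark sketch of both approaches, assuming a local ZooKeeper at localhost:2181, a broker at localhost:9092, and a topic named "logs" (all hypothetical names):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # ships in the spark-streaming-kafka-0-8 package

sc = SparkContext(appName="KafkaConsumerDemo")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Approach 1: receiver-based, connects through ZooKeeper
receiver_stream = KafkaUtils.createStream(
    ssc,
    "localhost:2181",     # ZooKeeper quorum (assumed address)
    "demo-group",         # consumer group id (hypothetical)
    {"logs": 1})          # topic -> number of receiver threads

# Approach 2: direct, talks to the brokers themselves (no receiver)
direct_stream = KafkaUtils.createDirectStream(
    ssc,
    ["logs"],                                    # list of topics
    {"metadata.broker.list": "localhost:9092"})  # broker list (assumed address)

# Both DStreams yield (key, value) pairs; print the message values
direct_stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()
```

Submitting this requires the Kafka integration jar on the classpath, e.g. something like `spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:<spark-version> app.py` (exact coordinates depend on your Spark and Scala versions).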

answered Oct 07 '22 by Sandeep Purohit


There's no need to set up a Kafka consumer application; Spark itself creates a consumer, with two approaches. One is the receiver-based approach, which uses KafkaUtils.createStream, and the other is the direct approach, which uses KafkaUtils.createDirectStream. In any case of failure in Spark Streaming, there's no loss of data: it restarts from the offset where you left off.

For more details, use this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
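
To illustrate the recovery behavior described above, here is a sketch of the direct approach with checkpointing enabled, so a restarted job resumes from the stored offsets (the checkpoint directory, broker address, and topic name are all hypothetical):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

CHECKPOINT_DIR = "/tmp/kafka-checkpoint"  # hypothetical path

def create_context():
    sc = SparkContext(appName="DirectKafkaWithCheckpoint")
    ssc = StreamingContext(sc, 10)
    ssc.checkpoint(CHECKPOINT_DIR)  # offsets and DAG state are saved here

    stream = KafkaUtils.createDirectStream(
        ssc, ["logs"], {"metadata.broker.list": "localhost:9092"})
    stream.map(lambda kv: kv[1]).pprint()
    return ssc

# On a clean start this builds a new context; after a crash it rebuilds
# the context from the checkpoint and resumes from the stored offsets.
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()
```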

answered Oct 07 '22 by Tanvi Garg