How to use KafkaUtils.createDirectStream with the offsets for a particular Topic in PySpark?
You can use transform() instead of foreachRDD() as your first method call in order to access the offsets, and then chain further Spark methods. Be aware, however, that the one-to-one mapping between RDD partitions and Kafka partitions is lost after any method that shuffles or repartitions, e.g. reduceByKey() or window().
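The pattern from the Spark Streaming + Kafka integration guide looks roughly like this (it assumes directKafkaStream is the direct stream created further below):
offsetRanges = []

def store_offset_ranges(rdd):
    # transform() runs before any shuffle, so the KafkaRDD still exposes offsetRanges()
    global offsetRanges
    offsetRanges = rdd.offsetRanges()
    return rdd

def print_offset_ranges(rdd):
    for o in offsetRanges:
        print("%s %s %s %s" % (o.topic, o.partition, o.fromOffset, o.untilOffset))

directKafkaStream \
    .transform(store_offset_ranges) \
    .foreachRDD(print_offset_ranges)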
Since records read from Kafka carry a "value" field, the data you write back to a Kafka topic must likewise live in a column named value. If the result consists of multiple columns, condense them into a JSON document, cast it to a string, and write it to the value column; each column's data should be cast to String first.
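As a sketch using the Spark SQL Kafka sink (this assumes the spark-sql-kafka connector is on the classpath and that df is a hypothetical DataFrame holding your result):
from pyspark.sql.functions import to_json, struct, col

# Pack every column of the hypothetical DataFrame df into one JSON string
# and expose it as the mandatory "value" column, cast to string.
out = df.select(to_json(struct(*[col(c) for c in df.columns])).cast("string").alias("value"))

out.write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:9092") \
    .option("topic", "topic") \
    .save()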
An important configuration parameter is spark.streaming.kafka.maxRatePerPartition, which is the maximum rate (in messages per second) at which each Kafka partition will be read by this direct API. Deploying: this is the same as in the first approach.
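A minimal sketch of setting it, assuming you build the SparkContext yourself (the value 1000 is purely illustrative):
from pyspark import SparkConf, SparkContext

# Cap the read rate at 1000 messages per second per Kafka partition
# (illustrative value) before creating the context.
conf = SparkConf().set("spark.streaming.kafka.maxRatePerPartition", "1000")
sc = SparkContext(conf=conf)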
Kafka processes events as they unfold, employing a continuous (event-at-a-time) processing model. Spark, on the other hand, uses a micro-batch approach, dividing incoming streams into small batches for processing.
If you want to create an RDD from records in a Kafka topic, use a static set of offset ranges, as follows.
First, make the necessary imports:
from pyspark.streaming.kafka import KafkaUtils, OffsetRange
Then you create a dictionary of Kafka brokers:
kafkaParams = {"metadata.broker.list": "host1:9092,host2:9092,host3:9092"}
Then you create your offset ranges:
start = 0        # first offset to read (inclusive)
until = 10       # offset to stop at (exclusive)
partition = 0
topic = 'topic'
offset = OffsetRange(topic, partition, start, until)
offsets = [offset]
Finally you create the RDD:
kafkaRDD = KafkaUtils.createRDD(sc, kafkaParams, offsets)
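With the default decoders, each record comes back as a (key, value) tuple of UTF-8 strings, so, for instance, you can pull out just the values:
values = kafkaRDD.map(lambda kv: kv[1]).collect()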
To create a stream with offsets, you need to do the following:
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition
from pyspark.streaming import StreamingContext
Then you create your StreamingContext using your SparkContext:
ssc = StreamingContext(sc, 1)  # 1-second batch interval
Next we set up all of our parameters:
kafkaParams = {"metadata.broker.list": "host1:9092,host2:9092,host3:9092"}
start = 0
partition = 0
topic = 'topic'
Then we create our fromOffsets dictionary:
topicPartition = TopicAndPartition(topic, partition)
fromOffset = {topicPartition: long(start)}
# note that we must cast the int to long (Python 2)
Finally we create the stream:
directKafkaStream = KafkaUtils.createDirectStream(ssc, [topic], kafkaParams,
                                                  fromOffsets=fromOffset)
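From there you attach your processing and start the context as usual, for example:
directKafkaStream.pprint()  # print the first few records of each batch

ssc.start()
ssc.awaitTermination()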