I'm analyzing the backpressure feature in Spark Structured Streaming. Does anyone know the details? Is it possible to tune the rate of incoming records in code? Thanks
Backpressure refers to the situation where a system receives data at a higher rate than it can process, typically during a temporary load spike. A sudden spike in traffic can create bottlenecks in downstream dependencies, which slows down the stream processing.
In summary, enabling backpressure is an important technique for making your Spark Streaming application production ready. It dynamically sets the message ingestion rate based on the previous batch's performance, keeping your application stable and efficient without the pitfall of a statically capped max rate.
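For the legacy DStream API, backpressure is enabled through Spark configuration rather than in the stream code itself. A minimal sketch using the documented config keys; the app name and rate values are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("BackpressureDemo") // illustrative name
  // Let Spark adjust the ingestion rate from previous batch statistics
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional: initial rate (records/sec per receiver) used before the
  // first batch statistics are available
  .set("spark.streaming.backpressure.initialRate", "1000")
  // Optional hard cap that backpressure will never exceed
  .set("spark.streaming.receiver.maxRate", "10000")

val ssc = new StreamingContext(conf, Seconds(5))
```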
Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches. Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data.
Spark receives real-time data and divides it into smaller batches for the execution engine. In contrast, Structured Streaming is built on the Spark SQL API for data stream processing. In the end, all of these APIs are optimized by Spark's Catalyst optimizer and translated into RDDs for execution under the hood.
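A minimal sketch of the API difference, assuming a local socket source on both sides; host, port, and app name are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession.builder().appName("ApiContrast").getOrCreate()

// DStream API: explicit micro-batches over a StreamingContext
val ssc = new StreamingContext(spark.sparkContext, Seconds(5))
val lines = ssc.socketTextStream("localhost", 9999)

// Structured Streaming: a streaming DataFrame on the Spark SQL API,
// planned by the Catalyst optimizer like any other query
val linesDF = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
```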
If you mean dynamically changing the size of each internal batch in Structured Streaming, then no. There are no receiver-based sources in Structured Streaming, so that is not necessary. From another point of view, Structured Streaming cannot do real backpressure, because Spark cannot, for example, tell other applications to slow down the rate at which they push data into Kafka.
Generally, Structured Streaming will try to process data as fast as possible by default. Each source provides options to control the processing rate, such as maxFilesPerTrigger in the File source and maxOffsetsPerTrigger in the Kafka source. Read the following links for more details:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources
http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
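A minimal sketch of both options, assuming a Kafka broker at localhost:9092 and a JSON drop directory; the topic name, path, schema, and limits are all illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("RateLimits").getOrCreate()

// Kafka source (requires the spark-sql-kafka-0-10 package on the
// classpath): cap the total number of offsets consumed per trigger
val kafkaDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("maxOffsetsPerTrigger", "10000")
  .load()

// File source: cap the number of new files picked up per trigger
val schema = new StructType()
  .add("id", LongType)
  .add("payload", StringType)

val filesDF = spark.readStream
  .schema(schema) // streaming file sources require a user-defined schema
  .format("json")
  .option("maxFilesPerTrigger", "10")
  .load("/data/incoming")
```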
Handling backpressure is needed only in push-based mechanisms. Kafka consumers are pull based: Spark will pull the next batch of records only when the current batch has finished processing and saving. If processing and saving are delayed in Spark, it won't pull a new batch of records, so there is no need for backpressure handling. A sketch of this is shown below.
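A sketch of how this plays out in practice, assuming the same illustrative Kafka setup as above; combining a trigger interval with maxOffsetsPerTrigger bounds how much each pulled batch can contain:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("PullBased").getOrCreate()

val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  // Upper bound on records pulled per micro-batch; offsets beyond this
  // simply wait in Kafka until the next batch
  .option("maxOffsetsPerTrigger", "5000")
  .load()

// The next micro-batch is planned only after this one finishes and its
// offsets are committed to the checkpoint, so a slow batch naturally
// delays the next pull instead of overrunning the application.
val query = input.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/pull-demo")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```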
maxOffsetsPerTrigger can change the number of records processed per Spark batch, and spark.streaming.backpressure.enabled changes the rate of receiving, but that is not the same as real backpressure, where you go and tell the source to slow down.