I have a Kafka-based Spark Streaming application that runs every 5 minutes. Looking at the statistics after 5 days of running, there are a few observations:
- The processing time gradually increases from 30 secs to 50 secs. The snapshot below highlights the processing-time chart:
- A good number of garbage-collection logs are appearing, as shown below:
Questions:
1. Why has the processing time increased substantially, even though the number of events is more or less the same (during the last trough)?
2. GC logs appear at the end of each processing cycle. Is that normal?
3. What can be done so that the processing time remains within acceptable delays?

Answer:

It really depends on the application. The way I'd approach debugging this issue is the following:
- One common cause is cached data accumulating over time: you could set spark.cleaner.ttl, but better make sure you uncache all the resources when they are not needed anymore.
- Keep in mind that the Spark UI itself retains data about completed batches in memory (the number kept is configurable through the spark.streaming.ui.retainedBatches parameter).

I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so there must be some solvable issue.
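As an illustration of the two settings mentioned above, a spark-submit invocation could look like the sketch below. The class name, JAR name, and values are placeholders, not from the original post. Note that spark.cleaner.ttl belongs to the Spark 1.x line and is no longer available in recent Spark versions, where explicitly calling unpersist() on cached RDDs/DataFrames is the way to release them.

```shell
# Sketch: tuning the settings discussed above (all names and values are placeholders).
spark-submit \
  --class com.example.StreamingApp \
  --conf spark.streaming.ui.retainedBatches=100 \
  --conf spark.cleaner.ttl=3600 \
  streaming-app.jar
```

Lowering spark.streaming.ui.retainedBatches reduces how many completed batches the driver keeps in memory for the UI, which bounds that source of slow memory growth on long-running jobs.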