Batch Processing on Kubernetes

Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea? How do you prevent a batch job from processing the same data when using the Kubernetes auto-scaling feature? Thank you.

Asked Mar 30 '20 by Daniel Setiawan


People also ask

Is Kubernetes good for batch processing?

To execute and manage a batch task on your cluster, you can use a Kubernetes Job. You can specify the maximum number of Pods that should run in parallel as well as the number of Pods that should complete their tasks before the Job is finished. A Job can also be used to run multiple Pods at the same time.
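
For illustration, here is a minimal sketch that creates such a Job programmatically, assuming the fabric8 kubernetes-client (6.x); the job name, image, namespace and parallelism values are hypothetical, and the same spec is more commonly written as a plain YAML manifest and applied with kubectl:

    import io.fabric8.kubernetes.api.model.batch.v1.Job;
    import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    public class BatchJobSubmitter {

        public static void main(String[] args) {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                // Hypothetical Job: at most 2 pods run in parallel, and 4 pods
                // must complete successfully before the Job is considered done.
                Job job = new JobBuilder()
                        .withNewMetadata()
                            .withName("demo-batch-job")                // hypothetical name
                        .endMetadata()
                        .withNewSpec()
                            .withParallelism(2)                        // max pods running at once
                            .withCompletions(4)                        // successful pods required
                            .withBackoffLimit(3)                       // retries before the Job fails
                            .withNewTemplate()
                                .withNewSpec()
                                    .addNewContainer()
                                        .withName("worker")
                                        .withImage("example/batch-worker:1.0") // hypothetical image
                                    .endContainer()
                                    .withRestartPolicy("Never")
                                .endSpec()
                            .endTemplate()
                        .endSpec()
                        .build();

                // Submit the Job to the cluster (the exact call differs slightly across client versions).
                client.batch().v1().jobs().inNamespace("default").resource(job).create();
            }
        }
    }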

Can we deploy system processes on Kubernetes?

Once you have a running Kubernetes cluster, you can deploy your containerized applications on top of it. To do so, you create a Kubernetes Deployment configuration. The Deployment instructs Kubernetes how to create and update instances of your application.

Can 2 pods communicate in Kubernetes?

Kubernetes assumes that pods can communicate with other pods, regardless of which host they land on. Kubernetes gives every pod its own cluster-private IP address, so you do not need to explicitly create links between pods or map container ports to host ports.


1 Answer

Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea?

For Spring Batch, we (the Spring Batch team) do have some experience on the matter which we share in the following talks:

  • Cloud Native Batch Processing on Kubernetes, by Michael Minella
  • Spring Batch on Kubernetes, by me.

Running batch jobs on Kubernetes can be tricky:

  • pods may be re-scheduled by k8s onto different nodes in the middle of processing
  • cron jobs might be triggered twice
  • etc.

This requires additional, non-trivial work on the developer's side to make sure the batch application is fault-tolerant (resilient to node failures, pod re-scheduling, etc.) and safe against duplicate job executions in a clustered environment.

Spring Batch takes care of this additional work for you and can be a good choice for running batch workloads on k8s, for several reasons (a sketch of the last two points follows the list):

  • Cost efficiency: Spring Batch jobs maintain their state in an external database, which makes it possible to restart them from the last save point in case of job/node failure or pod re-scheduling
  • Robustness: Safe against duplicate job executions thanks to a centralized job repository
  • Fault-tolerance: Retry/Skip failed items in case of transient errors like a call to a web service that might be temporarily down or being re-scheduled in a cloud environment
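
As a rough sketch of the last two points, this is what a restartable, fault-tolerant chunk-oriented step can look like, assuming Spring Batch 5 style configuration; the reader/writer beans, item type, exception types and limits are hypothetical:

    import java.net.ConnectException;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.step.builder.StepBuilder;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.transaction.PlatformTransactionManager;

    @Configuration
    public class BatchConfig {

        // Chunk-oriented step: progress is stored in the job repository after every
        // committed chunk, so a restarted job resumes from the last save point.
        @Bean
        public Step processFileStep(JobRepository jobRepository,
                                    PlatformTransactionManager transactionManager,
                                    ItemReader<String> reader,    // hypothetical reader bean
                                    ItemWriter<String> writer) {  // hypothetical writer bean
            return new StepBuilder("processFileStep", jobRepository)
                    .<String, String>chunk(100, transactionManager) // commit every 100 items
                    .reader(reader)
                    .writer(writer)
                    .faultTolerant()
                    .retry(ConnectException.class)          // retry transient failures
                    .retryLimit(3)
                    .skip(IllegalArgumentException.class)   // skip bad records up to a limit
                    .skipLimit(10)
                    .build();
        }
    }

The job repository referenced here is the same centralized store that backs the duplicate-execution check and the restart-from-last-save-point behaviour described above.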

I wrote a blog post in which I explain all these aspects in detail with code examples. You can find it here: Spring Batch on Kubernetes: Efficient batch processing at scale

How do you prevent a batch job from processing the same data when using the Kubernetes auto-scaling feature?

Making each job process a different data set is the way to go (a job per file, for example). But there are different patterns that you might be interested in; see Job Patterns in the k8s docs.
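
One way to wire this into Spring Batch is to pass the file name as an identifying job parameter: each Kubernetes Job instance then works on its own input, and the job repository rejects a second launch for a file that has already been processed successfully. A minimal sketch, assuming a Spring Boot entry point and hypothetical bean and parameter names:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
    import org.springframework.boot.CommandLineRunner;
    import org.springframework.stereotype.Component;

    @Component
    public class FileJobRunner implements CommandLineRunner {

        private final JobLauncher jobLauncher;
        private final Job importFileJob; // hypothetical job bean

        public FileJobRunner(JobLauncher jobLauncher, Job importFileJob) {
            this.jobLauncher = jobLauncher;
            this.importFileJob = importFileJob;
        }

        @Override
        public void run(String... args) throws Exception {
            // Each Kubernetes Job/pod is handed its own file, e.g. as a container argument.
            String inputFile = args[0];

            // The file name is an identifying parameter, so it defines the job instance:
            // two launches for the same file map to the same instance in the repository.
            JobParameters params = new JobParametersBuilder()
                    .addString("inputFile", inputFile)
                    .toJobParameters();

            try {
                jobLauncher.run(importFileJob, params);
            } catch (JobInstanceAlreadyCompleteException e) {
                // Another pod already processed this file successfully; do nothing.
                System.out.println("File already processed, skipping: " + inputFile);
            }
        }
    }

Because the parameter is identifying, re-running the same file after a crash restarts the existing failed execution from its last save point rather than starting over, while a file that has already completed is simply rejected.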

Answered Oct 27 '22 by Mahmoud Ben Hassine