 

Long running EMR cluster vs new cluster for each occurrence

I have a use case where a Spark job runs periodically (say, every 30 minutes) on an EMR cluster. What factors should decide whether to create a new cluster for every run or to use a long-running cluster?

What are possible strategies for scaling up the cluster if we decide on a long-running cluster?

Abhay Dubey asked Oct 12 '25 09:10



1 Answer

I generally prefer independent clusters because they make it easier to debug and to spin up test runs when needed. But you should do the math on what each scenario would cost you. Adding more nodes to an existing cluster later is easy, so I wouldn't worry about that.
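As a rough illustration of adding nodes to a running cluster, here is a minimal sketch that resizes one instance group through boto3's EMR `modify_instance_groups` call. It assumes the cluster uses instance groups rather than instance fleets, and the cluster and instance group IDs are hypothetical placeholders, so treat it as one possible approach rather than the recommended one.

```python
def build_resize_request(cluster_id, instance_group_id, new_count):
    """Build the keyword arguments for EMR's ModifyInstanceGroups call."""
    if new_count < 0:
        raise ValueError("new_count must not be negative")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    }


def resize_instance_group(cluster_id, instance_group_id, new_count):
    """Ask EMR to change the node count of one instance group."""
    # Imported here so the request builder works without boto3 installed.
    import boto3

    emr = boto3.client("emr")
    emr.modify_instance_groups(
        **build_resize_request(cluster_id, instance_group_id, new_count)
    )


# Hypothetical IDs, for illustration only; this prints the request
# without contacting AWS.
request = build_resize_request("j-EXAMPLE", "ig-EXAMPLE", 6)
print(request)
```

Calling `resize_instance_group` with real IDs needs AWS credentials and a region configured for boto3.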

Things to know:

  • You are billed for cluster time rounded to the nearest minute
  • EMR clusters take about 10 minutes to start, which is time you are paying for

The things you would want to consider:

  • How long does your job actually take to run?
  • Is a 10-minute delay before your job starts acceptable?
  • If your job takes < 20 minutes: independent clusters will be cheaper, since startup plus run time stays under the 30 minutes a persistent cluster bills for each interval
  • If your job takes > 30 minutes: on a persistent cluster, your next half-hourly job would have to wait
  • Do you want isolation between runs? With separate clusters, you won't have to filter out other jobs when reading logs to debug
  • With a persistent cluster, you can set up any extra dependencies manually, since you only do it once. With new clusters, you would want to script it.
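The timing trade-off in the list above can be sketched as a small calculation. This is only an illustration: the function name is made up, and the 10-minute startup and 30-minute interval are the figures assumed in this answer, not measured values. It also assumes both options use the same cluster size.

```python
def billable_minutes(job_minutes, interval_minutes=30, startup_minutes=10):
    """Compare billed cluster minutes per scheduling interval for both options."""
    # A fresh cluster is billed for its startup time plus the job itself.
    per_run_cluster = startup_minutes + job_minutes
    # A long-running cluster is billed for the whole interval, busy or idle.
    long_running = interval_minutes
    return {
        "per_run_cluster": per_run_cluster,
        "long_running": long_running,
        "per_run_is_cheaper": per_run_cluster < long_running,
        # On a long-running cluster, the next scheduled run would have to wait.
        "job_overruns_interval": job_minutes > interval_minutes,
    }


for minutes in (5, 15, 25, 35):
    print(minutes, billable_minutes(minutes))
```

With these assumptions, a job right at 20 minutes breaks even, which is where the "< 20 minutes" rule of thumb comes from.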

The cost will be based on which EC2 instance type you select for your cluster and how many nodes you run. An easy way to estimate it is AWS's cost calculator:

https://calculator.s3.amazonaws.com/index.html
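For a quick back-of-the-envelope figure before opening the calculator, something like the following sketch may help. The hourly rates and node count are placeholders, not real AWS prices, so look up the actual EC2 price and the EMR surcharge for your instance type and region. The function name is also just illustrative.

```python
def cluster_cost_per_day(billed_minutes_per_run, runs_per_day, node_count,
                         ec2_hourly_rate, emr_hourly_rate):
    """Estimate the daily cost of a cluster from billed minutes and per-node rates."""
    # Each node is charged the EC2 price plus the EMR surcharge.
    per_node_hourly = ec2_hourly_rate + emr_hourly_rate
    node_minutes = billed_minutes_per_run * runs_per_day * node_count
    return node_minutes * per_node_hourly / 60


# Placeholder rates (NOT real prices): $0.20/h EC2 plus $0.05/h EMR per node.
EC2_RATE = 0.20
EMR_RATE = 0.05
NODES = 5
RUNS_PER_DAY = 48  # one run every 30 minutes

# New cluster per run: 10 minutes of startup plus a 5-minute job.
per_run = cluster_cost_per_day(15, RUNS_PER_DAY, NODES, EC2_RATE, EMR_RATE)
# Long-running cluster: billed for the full 30 minutes of every interval.
persistent = cluster_cost_per_day(30, RUNS_PER_DAY, NODES, EC2_RATE, EMR_RATE)

print(f"new cluster per run: ${per_run:.2f}/day")
print(f"long-running cluster: ${persistent:.2f}/day")
```

The absolute numbers mean nothing with placeholder rates. What matters is the ratio between the two options for your job length.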

For your case, it depends on how long your Spark job takes to run. You pay for the cluster in one-minute increments, so if your job only takes a few minutes, it will be cheaper to create a new cluster each time. The other thing to remember is that it usually takes around 10 minutes for an EMR cluster to start, which is time you are paying for, so even if your job only takes 5 minutes you would pay for

Ryan Widmaier answered Oct 15 '25 17:10



