I'm building a web application whose core feature is letting users upload large images and have them processed. The processing takes roughly 3 minutes to complete, and I thought Heroku would be an ideal platform for running these processing jobs on-demand and in a highly scalable way. The processing task itself is fairly computationally expensive, and it needs to run on the high-end PX dyno. I want to maximize parallelization and minimize (effectively eliminate) the time a job spends waiting in a queue. In other words, I want to have N PX dynos for N jobs.
Thankfully, I can accomplish this pretty easily with Heroku's API (or optionally a service like Hirefire). Whenever a new processing request comes in, I can simply increment the worker count and the new worker will grab the job from the queue and start processing immediately.
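For reference, here is a minimal sketch of that scale-up call against Heroku's Platform API, assuming the standard formation endpoint and a "worker" process type; the app name, token variable, and dyno size string are placeholders (the size name depends on your dyno generation, e.g. "PX" at the time, "performance-l" today):

    import os
    import requests

    HEROKU_API = "https://api.heroku.com"
    APP_NAME = "my-image-app"                  # placeholder app name
    TOKEN = os.environ["HEROKU_API_KEY"]       # a Platform API token

    HEADERS = {
        "Accept": "application/vnd.heroku+json; version=3",
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    }

    def scale_workers(quantity, size="performance-l"):
        """Set the 'worker' formation to `quantity` dynos of the given size."""
        resp = requests.patch(
            f"{HEROKU_API}/apps/{APP_NAME}/formation/worker",
            headers=HEADERS,
            json={"quantity": quantity, "size": size},
        )
        resp.raise_for_status()
        return resp.json()

    # On each new processing request: one more job, one more PX-class dyno.
    def on_new_job(current_worker_count):
        return scale_workers(current_worker_count + 1)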
However, while scaling up is painless, scaling down is where the trouble starts. The Heroku API is frustratingly limited. I can only set the number of running workers, not specifically kill idle ones. This means that if I have 20 workers each processing an image, and one completes its task, I cannot safely scale the worker count to 19, because Heroku will kill an arbitrary worker dyno, regardless of whether it's actually in the midst of a job! Leaving all workers running until all jobs complete is simply out of the question, because the cost would be astronomical. Imagine 100 workers created during a spike continue to idle indefinitely as a few new jobs trickle in throughout the day!
I've scoured the web, and the best "solution" people suggest is to have your worker process handle termination gracefully. That's perfectly fine if your worker is just doing mass emailing, but my workers are running drawn-out analytics on images, and as I mentioned above, each job takes about 3 minutes to complete.
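For context, the usual "graceful termination" advice looks roughly like the sketch below (the queue client and the process_image callable are hypothetical). It only helps when the in-flight unit of work can finish or be re-enqueued within the roughly 30-second grace period Heroku gives between SIGTERM and SIGKILL, which a 3-minute image job cannot:

    import signal
    import sys

    shutting_down = False

    def handle_sigterm(signum, frame):
        # Heroku sends SIGTERM on scale-down/restart and SIGKILLs the dyno
        # about 30 seconds later, so a 3-minute job can't simply "finish up".
        global shutting_down
        shutting_down = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    def worker_loop(queue, process_image):
        while not shutting_down:
            job = queue.dequeue(timeout=5)     # hypothetical queue client
            if job is None:
                continue
            process_image(job)                 # ~3 minutes of CPU-bound work
        # Best effort: re-enqueue or checkpoint anything unfinished before exiting.
        sys.exit(0)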
In an ideal world, I could kill a specific worker dyno upon completion of its task. This would make scaling down just as easy as scaling up.
In fact, I've come close to that ideal world by switching from worker dynos to one-off dynos (which terminate upon process termination, i.e. you stop paying for the dyno after its "root program" exits). However, Heroku sets a hard limit of 5 one-off dynos that can be run simultaneously. This I can understand, as I was certainly in a sense abusing one-off dynos...but it is quite frustrating nonetheless.
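That one-off approach amounted to something like the sketch below, assuming the Platform API's dyno-creation endpoint; the command string and size are placeholders. Billing stops when the command exits, but the concurrency cap mentioned above still applies:

    import os
    import requests

    HEADERS = {
        "Accept": "application/vnd.heroku+json; version=3",
        "Authorization": f"Bearer {os.environ['HEROKU_API_KEY']}",
    }

    def run_one_off_job(app, job_id, size="performance-l"):
        """Launch a detached one-off dyno; it (and its billing) stops when the command exits."""
        resp = requests.post(
            f"https://api.heroku.com/apps/{app}/dynos",
            headers=HEADERS,
            json={
                "command": f"python process_image.py {job_id}",  # placeholder worker script
                "size": size,
                "attach": False,  # detached, like `heroku run:detached`
            },
        )
        resp.raise_for_status()
        return resp.json()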
Is there any way I can better scale down my workers? I would prefer not to have to radically re-engineer my processing algorithm by splitting it into a few chunks that each run in 30-40 seconds instead of one 3-minute stretch (so that accidentally killing a running worker wouldn't be catastrophic). That approach would drastically complicate my processing code and introduce several new points of failure. However, if it's my only option, I'll have to do it.
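To make that concrete, the chunked approach I'd like to avoid would look something like the following sketch; the stage names, per-stage function, and state store are all hypothetical. Each stage finishes quickly and persists its output, so a killed worker loses only the stage in progress rather than the whole 3-minute job:

    # Hypothetical stages of the image-analytics pipeline, each well under a minute.
    STAGES = ["decode", "analyze_regions", "aggregate", "render_report"]

    def run_job(job_id, store, run_stage):
        """Resume from the last completed stage recorded in `store`."""
        state = store.load(job_id) or {"next_stage": 0}
        for i in range(state["next_stage"], len(STAGES)):
            result = run_stage(STAGES[i], job_id)       # hypothetical per-stage worker
            store.save(job_id, {"next_stage": i + 1, "last_result": result})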
Any ideas or thoughts are appreciated!
From the Heroku Dashboard, select the app you want to scale from your apps list. Navigate to the Resources tab. Above your list of dynos, click "Change Dyno Type". Select the Professional (Standard/Performance) dyno type.
ps is a command which prefixes many commands affecting dynos (roughly, virtual machine instances); I'm assuming it's related to the Linux ps command, which stands for "process status." ps:scale sets the number of dynos running a given process type. For example, heroku ps:scale web=1 tells Heroku to run the web process on 1 dyno.
Both Heroku and AWS have auto-scaling solutions, but whereas Heroku has a fairly flat learning curve (that is what you are paying for), AWS can get broad and steep fairly quickly. A Udemy AWS course or any of a hundred other online resources will get you started on building a robust AWS architecture.
This is what Heroku's support answered about this:
I'm afraid this isn't possible at the moment. When scaling down your workers, we will stop the one with the highest number, so we don't have to change the public name for those dynos, and you don't get numbering holes.
I found this comment interesting in this context, although it did not really solve this issue.
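For what it's worth, the ordering described in that reply can at least be observed from inside a dyno: Heroku exposes each dyno's name (e.g. worker.3) in the DYNO environment variable. A minimal sketch, with the idle flag and current worker count supplied by the caller:

    import os

    def dyno_index(default="worker.0"):
        """Parse the numeric suffix of Heroku's DYNO env var, e.g. 'worker.3' -> 3."""
        return int(os.environ.get("DYNO", default).rsplit(".", 1)[1])

    def safe_to_remove_me(current_worker_count, i_am_idle):
        # Per the support reply, scaling down stops the highest-numbered dyno,
        # so only the highest-numbered worker can go away without killing a job.
        return i_am_idle and dyno_index() == current_worker_count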