 

Heroku queue times

I did a load test of my Rails application yesterday, running 8 dynos with 3 concurrent Unicorn processes on each. This is the New Relic output:

[New Relic chart: response-time breakdown from the load test, showing low app/DB time but very high request queuing]

As you can see, my Rails stack itself has a pretty good response time (DB, Web, etc), but the queue time is super terrible.

What can I do about this? Is this inherent in Heroku performance, or does it just mean I need to add more dynos?

Any advice appreciated.

asked May 24 '13 by Ronze



2 Answers

Basically, break the problem down into its parts and test each part. Simply throwing a bunch of requests at a cluster of unicorns isn't necessarily a good way to measure throughput. You have to consider many variables (side note: check out "Programmers Need To Learn Statistics Or I Will Kill Them All" by Zed Shaw).

Also, your question leaves out critical information that's needed to solve the mystery.

  • How many requests is each unicorn handling per second?
  • How long is the total test, and are you allowing time for whatever caches you have to warm up?
  • How many total requests were handled by the collection?
  • I see in the chart that queuing time drops significantly from the initial spike at the left-hand side of the chart. Any idea why? Is this startup time? Is this cache warming? Is it a flood of requests coming disproportionately at the beginning of the test?

You're the only person who can answer these questions.

Queuing time, if I understand Heroku's setup correctly, is essentially the time new requests sit waiting for an available unicorn (or, to be more accurate with unicorn, how long requests sit before they are grabbed by a unicorn worker). If you're load testing and feeding the system more than it can handle, then, while your app itself may serve the requests it's ready to handle very quickly, there will still be a backlog of requests waiting for an available unicorn to process them.
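
If you want to watch that number directly rather than only in New Relic, a minimal Rack middleware sketch is below. It assumes Heroku's X-Request-Start header, which marks when the router accepted the request (the same header New Relic derives its queuing metric from); the class name and log format are made up for illustration.

    # Sketch only: logs roughly how long each request sat in the router queue
    # before a Unicorn worker picked it up.
    class QueueTimeLogger
      def initialize(app)
        @app = app
      end

      def call(env)
        header = env['HTTP_X_REQUEST_START']
        if header
          started = header.sub(/\At=/, '').to_f
          # The header has been sent in seconds, milliseconds, or microseconds
          # over the years; scale down until it looks like a Unix timestamp.
          started /= 1000.0 while started > Time.now.to_f
          queue_ms = ((Time.now.to_f - started) * 1000).round(1)
          puts "queue_time=#{queue_ms}ms"
        end
        @app.call(env)
      end
    end

Install it near the top of the middleware stack (e.g. config.middleware.insert_before 0, QueueTimeLogger in a Rails initializer) so the measurement happens before the rest of your app code runs.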

Depending on your original setup, try the following variables in your test (a rough load-test harness is sketched after the list):

  • Same number of total requests, but run it longer to see if caches warm up more and speed up response times (i.e. unicorns handle more requests per second)
  • Adjust the number of requests per second to the total collection of unicorns available, both up and down, and observe at what thresholds the queuing times get better and worse
  • Simplify the test. First, just test a single unicorn process and figure out how long it takes to warm up, how many requests per second it can handle, and at what point queuing times start to increase due to backlogs. Then, add unicorn processes and repeat the tests, trying to find out if, with 3 unicorns, you get 3x performance, or if there's some % overhead in adding more unicorns (e.g. the overhead of load balancing the incoming requests), and whether that overhead is negligible or not, etc.
  • Make sure the requests are all very similar. If you have some requests that are just returning a front page with 100% cached, non-dynamic content, your processing times will be much shorter than requests that need to generate a variable amount of dynamic content, which is going to throw off your test results considerably.
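
If you don't already have a harness for sweeping these variables, something as small as the sketch below is enough; the URL, rate, and duration are placeholder knobs, not values taken from the question.

    #!/usr/bin/env ruby
    # Rough load-test sketch: fires RATE requests per second at TARGET_URL for
    # DURATION seconds, then prints latency percentiles. Not a replacement for
    # a real tool (ab, siege, etc.), just enough to sweep one variable at a time.
    require 'net/http'
    require 'uri'
    require 'thread'

    TARGET_URL = URI(ENV.fetch('TARGET_URL', 'https://example.herokuapp.com/'))
    RATE       = Integer(ENV.fetch('RATE', 10))      # requests per second
    DURATION   = Integer(ENV.fetch('DURATION', 60))  # seconds

    latencies = Queue.new
    threads   = []

    (RATE * DURATION).times do
      sleep(1.0 / RATE)                              # pace requests evenly
      threads << Thread.new do
        started = Time.now
        Net::HTTP.get_response(TARGET_URL)
        latencies << (Time.now - started) * 1000.0   # milliseconds
      end
    end

    threads.each(&:join)
    samples = []
    samples << latencies.pop until latencies.empty?
    samples.sort!

    puts "requests: #{samples.size}"
    puts "median:   #{samples[samples.size / 2].round(1)}ms"
    puts "p95:      #{samples[(samples.size * 0.95).floor].round(1)}ms"

Re-run it while changing only RATE and you can see where the knee in the queuing-time curve sits for your current number of unicorns.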

Also, find out if the test results chart you're showing above is an average, or a 95th percentile with standard deviations, or some other measurement.

Only after you've broken the problem down into its component parts will you know with any predictability whether or not adding more unicorns will help. Looking at this basic chart and asking, "Should I just add more unicorns?" is like having a slow computer and asking, "Should I just add more RAM to my machine?". While it may help, you're skipping the step of actually understanding why something is slow, and adding more of something won't give you any deeper understanding of why it's slow. Because of this (and especially on Heroku), you might wind up overpaying for more dynos when you don't need them; if you can get to the root of what is causing the longer-than-expected queuing times, you'll be in much better shape.

This approach, of course, isn't unique to Heroku. Trying experiments, tweaking variables, and recording the outcome measurements will allow you to pick apart what's going on inside those performance numbers. Understanding the "why" will enable you to take specific, educated steps that should have mostly predictable effects on overall performance.

After all of that you may find that yes, the best way to improve performance in your specific case is to add more unicorns, but at least you'll know why and when to do so, and you'll have a really solid guess as to how many to add.

answered by jefflunt


I essentially wrote out another question, then sat back and realized I had edited this exact question a week before, and knew the answer to both.

What jefflunt said is basically 100% true, but since I'm here, I'll spell it out.

There are two solutions:

  1. Add more Unicorn workers (a typical config sketch appears at the end of this answer).
  2. Reduce the total transaction time of requests.

They basically boil down to the same exact concept, but:

  • If you have 15k transactions per minute, you'd have 250 transactions per second.
  • If your average transaction time is 100 ms, each worker can execute 10 transactions per second (1000 ms ÷ 100 ms per transaction).
  • If you have 8 dynos with 3 workers each, you'd have 24 workers.
  • 24 workers at 10 transactions per second means your current setup can handle around 240 transactions per second, which is just under the 250 per second coming in, so the excess piles up as queuing time (the arithmetic is sketched below the list).
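
To make the same arithmetic concrete (these are the illustrative numbers from the list above, not measurements from the app in question):

    # Back-of-the-envelope capacity check
    requests_per_minute = 15_000
    avg_transaction_ms  = 100.0
    dynos               = 8
    workers_per_dyno    = 3

    incoming_per_second = requests_per_minute / 60.0                     # => 250.0
    per_worker_per_sec  = 1000.0 / avg_transaction_ms                    # => 10.0
    capacity_per_second = dynos * workers_per_dyno * per_worker_per_sec  # => 240.0

    puts format('demand %.0f req/s vs. capacity %.0f req/s',
                incoming_per_second, capacity_per_second)
    # Every request per second beyond capacity has nowhere to go but the queue,
    # which is exactly what shows up as queuing time in New Relic.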

Granted, this is just the roughest of frameworks for gauging the problem, especially because traffic is always weighted somehow, and taking an average (rather than the median) is usually a better gauge because it gives more weight to the slow 95th-percentile requests, but it will get you close enough to the right number to understand what kind of capacity you need.
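
If the numbers say you need more workers, the usual Heroku-style setup (sketched below; WEB_CONCURRENCY and the timeout are conventions you would tune to your own dyno's memory, not settings taken from the question) drives worker_processes from an environment variable, so option 1 becomes a config change rather than a code change:

    # config/unicorn.rb -- minimal sketch
    worker_processes Integer(ENV['WEB_CONCURRENCY'] || 3)
    timeout 30
    preload_app true

    before_fork do |server, worker|
      # Close the master's DB connection so each forked worker opens its own.
      defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
    end

Then heroku config:set WEB_CONCURRENCY=4 raises the workers per dyno, heroku ps:scale web=10 raises the dyno count, and you can re-run the capacity math for each combination.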

answered by FullOnFlatWhite