How to profile inconsistent H12 timeouts on Heroku

Question

My users are seeing occasional request timeouts on Heroku. Unfortunately I can not consistently reproduce them which makes them really hard to debug. There's plenty of opportunity to improve performance - e.g. by reducing the huge number of database queries per request and by adding more caching - but without profiling that's a shot in the dark.

According to our New Relic analytics, many requests take between 1 and 5 seconds on the server. I know that's too slow, but it nowhere near the 30 seconds needed for the timeout.

The error tab on New Relic shows me several different database queries where the timeout occurs, but these aren't particularly slow queries and it can be different queries for each crash. Also for the same URL it sometimes does and sometimes does not show a database query.

How do I find out what's going on in these particular cases? E.g. how do I see how much time it was spending in the database when the timeout occurred, as opposed to the time it spends in the database when there's no error?

One hypothesis I have is that the database gets locked in some cases; perhaps a combination of reading and writing.

grumbler · Accepted Answer

You may have already seen it, but Heroku has a doc with some good background about request timeouts.

If your requests are taking a long time, and the processes servicing them are not being killed before the requests complete, then they should be generating transaction traces that will provide details about individual transactions that took too long.

If you're using Unicorn, it's possible that this is not happening because the requests are taking long enough that they're hitting up against Unicorn's timeout (after which the workers servicing those requests will be forcibly killed, not giving the New Relic agent enough time to report back in).

I'd recommend a two-step approach:

Configure the rack-timeout middleware to have a timeout below Heroku's 30s timeout. If this works, it will terminate requests taking longer than the timeout by raising a Timeout::Error, and such requests should generate transaction traces in New Relic.
If that yields nothing (which it might, because rack-timeout relies on Ruby's stdlib Timeout class, which has some limitations), you can try bumping the Unicorn request handling timeout up from its default of 60s (assuming you're using Unicorn). Be aware that long-running requests will tie up a Unicorn worker for a longer period in this case, which may further slow down your site, so use this as a last resort.

grokpot · Answer

Two years late here. I have minimal experience with Ruby, but for Django the issue with Gunicorn is that it does not properly handle slow clients on Heroku because requests are not pre-buffered, meaning a server connection could be left waiting (blocking). This might be a helpful article to you, although it applies primarily to Gunicorn and Python.

How to profile inconsistent H12 timeouts on Heroku

Tags:

performance

ruby-on-rails

heroku

newrelic

Sjors Provoost

2 Answers

grumbler

grokpot

Recent Activity

Donate For Us

How to profile inconsistent H12 timeouts on Heroku

Tags:

performance

ruby-on-rails

heroku

newrelic

Sjors Provoost

2 Answers

grumbler

grokpot

Related questions

Recent Activity

Donate For Us