Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random timeout exceptions in Rails app on Heroku

I host a Rails 3.2 app on Heroku and get 2-3 timeouts within the Rails application each day. These are not H12 request timeouts, but rather timeouts that occur somewhere within the Rails stack. Thus, they actually generate exceptions on the site and appear in my Airbrake logs.

It seems to be completely random where the timeout occurs; sometimes it's within a gem like Formtastic, or within a HAML view, or within the ActiveRecord code. You can see examples of some of the backtraces here: https://gist.github.com/dpmccabe/5238273

This site doesn't get much traffic and runs well on two dynos (though they automatically scale up thanks to the Adept Scale add-on). The HTTP_X_HEROKU_QUEUE_WAIT_TIME header is usually low or zero, so I don't think it's a routing issue. I even tried switching from Thin to Unicorn with no effect (my unicorn.rb is shown in the above gist).

The fact that these timeout exceptions seem to occur randomly throughout the application doesn't give me much to go on. I do have New Relic, but I'm not sure how to go about debugging this. Any ideas?

like image 472
Devin Avatar asked Mar 25 '13 16:03

Devin


1 Answers

I also have been running into the same issue. While I haven't solved it yet, I thought I would chime in with what I've looked at so far. I'm using the rack-timeout gem (based on your backtraces, it looks like you are as well) and have the timeout set at 15 seconds. Looking at new relic, my average app server response time for any request is well under 200ms. Yet, like you, I get 2-3 errors a day that look like this:

undefined method `result' for #<Timeout::Error: execution expired>

The errors occur on a wide range of actions, with no action seeming to be particularly likely to generate one. The error even occurs on simple CRUD DELETE actions. I'm running a rails 3.2 app on Heroku's Cedar stack. I run two web dynos, each with 3 unicorn workers. They each consistently stay below the 512mb limit.

The only clue I've found so far is that I often see something like the following near my timeouts in my logs:

[AMBER] LOG: process 21289 acquired ShareLock on transaction 105259 after 32366.132 ms

Do you see something similar? It might be possible that a DB action locking a record is causing the timeout, I'm not quite sure.

like image 158
middkidd Avatar answered Oct 11 '22 23:10

middkidd