Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"could not receive data from server: Connection timed out" or "connection not open" errors when connecting to PostGres DB on a separate AWS instance

I am using Ruby 1.9.3 in my app server that is running on an AWS EC2 instance. I have my Postgres DB running on a separate EC2 instance but both instances are in the same security group. When m Ruby code connects to the DB, it uses the Sequel ORM gem ( http://sequel.rubyforge.org/ ) .

Now, I have configured that Postgres 9.1.4 DB to be able to accept connections properly from the app server instance.

However, every now and then, I noticed on the app server's logs that it will have trouble connecting to the Postgres DB instance and I would see error messages like these :

PG::Error: could not receive data from server: Connection timed out

or

PG::Error: connection not open

So I went to the Postgres DB instance and looked in /var/log/postgresql/postgresql-9.1-main.log and I see a bunch of messages like these:

2012-11-07 08:15:17 UTC LOG:  could not receive data from client: Connection timed out
2012-11-07 08:15:17 UTC LOG:  unexpected EOF on client connection

I did searching around the web including stack overflow, and have made sure that my PostgreSQL does not have SSL enabled ( I have "ssl=off" inside my postgresql.conf file )

At this point, I am not exactly sure what exactly is the issue in the Postgres configuration. I do not what to mess with the maximum # of connections or maximum timeout values on my production server without a good proven reason.

The app server can connect to the DB most of the time, and this problem only shows up intermittently.

On the Ruby-side, this is the error trace for "connection not open" when making the Postgres call :

PG::Error: connection not open
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `async_exec'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `block in execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/logging.rb:33:in `log_yield'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:111:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

Similarly, this is the trace for "could not receive data from server" :

    PG::Error: could not receive data from server: Connection timed out
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `block'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `ensure in check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

I observed that if I have both the App server and the Postgres DB running on the same instance, then there are no connectivity issues, at least not yet. Maybe the Postgres is less lenient towards non-local DB connections?

Please let me know what I might have missed, I appreciate it!

IS

like image 738
user1805458 Avatar asked Nov 07 '12 08:11

user1805458


1 Answers

The usual explanation for this would be a connectivity problem.

Alternately, if it's not connectivity, it could be a protocol synchronisation issue. It looks like the two ends could be trying to read from the socket, with neither trying to write. So maybe the client is expecting the server to send a response, while the server is expecting the client to send data.

This will be very hard to debug if it's intermittent and occasional, as you can't really just tcpdump it and analyse it.

I'd add more logging on the server side - log_statement = 'all', and a log_line_prefix that shows the client IP, backend start time & backend pid. That way you can start trying to match these failures back to the session activity that happened before the failure, work out if it's particular clients, particular jobs, or really just random.

Is this Sequel ORM gem using libpq at the bottom layer, or its own PostgreSQL protocol implementation? If the latter, that's probably going to turn out to be the culprit.

Update: It looks like it can use the pg gem (libpq-based), the postgres gem, or possibly postgres-pr (whatever that is). It'll prefer pg if it's installed.

Since it seems you're already using the pg gem, you're probably going to need to do some diagnostic work to track down where the problem arises - particular queries, particular clients, etc - and try to find a way to reproduce the issue. PostgreSQL's csvlog might be useful so you can load and analyse the logs more easily.

like image 114
Craig Ringer Avatar answered Nov 03 '22 00:11

Craig Ringer