Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Having "Timeout acquiring a connection" after upgrading Knex

In my company, our application runs on NodeJS over multiple EC2 instances and one RDS database.

Our application needed some upgrades as some dependencies were already quite old, and one of the upgrades we did that called our attention was updating our database libraries: mysql (from 2.16.0 to 2.17.0), knex (from 0.12.2 to 0.19.1) and bookshelf (0.10.2 to 0.15.1).

After checking the changelog, no code changes were required, so we quickly managed to upload it to our staging server.

Suddenly our application went too slow. All data took several seconds to load, and our main user's dashboard, which loads in a few miliseconds on the same server, took about 30 seconds. After a few minutes, the whole application was completely unresponsive.

In order to check if the problem was only involved with the dependency upgrade, we've managed to downgrade these to the working versions, and the application went back to the normal speed. Upgraded again, slow again.

We've started analyzing through New Relic if there was something wrong on the RDS side. Nothing at all. No peaks, no high usage of CPU, no slow queries or anything else. Then we came to check the connection pooling and discovered that the knex version that works for us uses "generic-pool", whereas the new version uses "tarn".

So we started debugging the pool and found out that it gets filled of a specified query, freezes completely for some time then starts throwing the "TimeoutError: Knex: Timeout acquiring a connection. The pool is probably full" error.

But what's most interesting about the query that fills all the pools and freezes then is that it should not be generated at all (and is not generated when using the outdated version that does not face this problem).

enter image description here

In our application, we only do SELECT requests against the contacts table in two ocasions:

First, obviously, is when the users want to list their contacts:

let contacts = await Contacts.forge({ 'list_owner': udata.id }).fetchAll()

And secondly, when checking for contact match to tell whether some information should be visible to an specified user, depending on the information owner's privacy settings:

let checkContact = await Contacts.where({
        list_owner: target_user,
        contact: udata.id
}).fetch()

After several grepping, I can guarantee that there is no other place in our codebase that SELECTS from the contacts table. In our debugging, we found no undefined values, and our investigation showed that the query runs when the former code runs. But as you can see in the screenshot, the query knex runs has no conditions:

select `contacts`.* from `contacts`

We believe this is the reason why it fills the pools (as requesting every user's contact is quite a job), but at the same time, we cannot see why knex is running such query, as:

  • No code changes has been done after knex upgrade
  • The problem does not exist when using the old knex version (our production server is up and running with the outdated knex version)
  • We do a LOT of caching using Redis (but whatever, the DB is not overloaded and the old Knex version works)
  • If the issue was really that the conditions were missing, we could have discovered it before, as every user would see the same contacts list.

What could be causing such problem?

like image 992
Jefrey Sobreira Santos Avatar asked Jul 30 '19 01:07

Jefrey Sobreira Santos


1 Answers

For some that may fall here!

If it make no sense and you upgraded nodejs to v14 recently! It can be the cause!

I had the problem and pulled my hair for quiet a time last time! And i was using nvm so somehow after trying to track what i made different! Cause it was working before! It came to my mind and tested it with v13! And it worked again!

So be aware! it may be the thing and that may save you tremendous time and stress!

Problem

Nodejs v14 made breaking changes! And pg module get affected! pg started to have the process exit at connect() call!

fix for node v14+

If you are using postgres! With nodejs v14 and above ! Make sure to use the driver module pg at version >=8.0.3! And better upgrade to the latest

npm install pg@latest --save

If you are not using postgres! Try to update your DB driver! It may be the same! Also try with nodejs V13. To confirm it's the same problem!

What did happen in v14

If like me you like to know the details and what did happen !?

With node V14! Some breaking changes happened on the api! Also many things were changed! Including Openssl version!

For postgres! And pg module! The problem was as described in this comment per this thread:

The initial readyState (a private/undocumented API that

pg uses) of net.Socket seems to have changed from 'closed' to 'open' in Node 14.

It’s hard to fix with perfect backwards compatibility, but I think I have a patch that’s close enough.

And as per this PR!

You can see the changes in this diffing

In short as mentioned! The api for onReady changed for a net.Socket ! And the implemented solution was to not use onReady at all!

And as per this

Connection now always calls connect on its stream when connect is called on it.

In the older version the connect was called only if the socket is on closed state! readyState usage is eliminated!

Check this line

You can understand!

Depending on the implementation! Many things may or not be affected by those core changes!

Nodejs v14 relevant change

And because i wanted to see where the change happen! Here you go

https://github.com/nodejs/node/pull/32272

One can check the log of changes too:

https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V14.md

Detailed Why + exit and no logging error

Also to mention the breaking changes! Made pg make the process exit at connect() call. And that's what made it exit! And logging was to be seen! In more detail for this! Here how it happened! Sequelize have the postgres dialect implementation! Which use pg! And pg client! create a connection! The connection have a connect event! When it connect it emit it! And because node v14 change the behavior of a stream to starting with open! The stream connection is skipped! Because of the readyState check (expected as close but it became open instead!)! And the stream is taken as connected (else block)! Where it is not! And the connect event is emitted directly! When that happen! The client either will call requestSsl() or startup() method of the connection object! And both will call this._stream.write. because the stream is not connected! An error happen! This error is not catch! Then the promise in sequelize driver! Will stay unresolved! And then the event loop get empty! Nodejs by default behavior just exit!

You can see the step through lines of code:

  • Sequelize pg adapter will call pg client to create a connection and the promise
  • pg client call connect on a connection object
  • pg connection connect() call and emit connect! Thinking the stream is connected because of V14 change
  • pg client connect event catched and callback run! requestSsl() or startup() will be run
  • One of the method get run and stream.write will be called (requestSsl(), startup())
  • Stream Error (not catched)
  • Promise in sequelize postgres adapter! Still unresolved!
  • event loop empty => Nodejs => Exit

Why nodejs exit (unresolved promises)

https://github.com/nodejs/node/issues/22088

Node exits without error and doesn't await promise (Event callback)

what happens when a Promise never resolves?

like image 105
Mohamed Allal Avatar answered Nov 06 '22 06:11

Mohamed Allal