In my company, our application runs on NodeJS over multiple EC2 instances and one RDS database.
Our application needed some upgrades as some dependencies were already quite old, and one of the upgrades we did that called our attention was updating our database libraries: mysql (from 2.16.0 to 2.17.0), knex (from 0.12.2 to 0.19.1) and bookshelf (0.10.2 to 0.15.1).
After checking the changelog, no code changes were required, so we quickly managed to upload it to our staging server.
Suddenly our application went too slow. All data took several seconds to load, and our main user's dashboard, which loads in a few miliseconds on the same server, took about 30 seconds. After a few minutes, the whole application was completely unresponsive.
In order to check if the problem was only involved with the dependency upgrade, we've managed to downgrade these to the working versions, and the application went back to the normal speed. Upgraded again, slow again.
We've started analyzing through New Relic if there was something wrong on the RDS side. Nothing at all. No peaks, no high usage of CPU, no slow queries or anything else. Then we came to check the connection pooling and discovered that the knex version that works for us uses "generic-pool", whereas the new version uses "tarn".
So we started debugging the pool and found out that it gets filled of a specified query, freezes completely for some time then starts throwing the "TimeoutError: Knex: Timeout acquiring a connection. The pool is probably full" error.
But what's most interesting about the query that fills all the pools and freezes then is that it should not be generated at all (and is not generated when using the outdated version that does not face this problem).
In our application, we only do SELECT requests against the contacts table in two ocasions:
First, obviously, is when the users want to list their contacts:
let contacts = await Contacts.forge({ 'list_owner': udata.id }).fetchAll()
And secondly, when checking for contact match to tell whether some information should be visible to an specified user, depending on the information owner's privacy settings:
let checkContact = await Contacts.where({
list_owner: target_user,
contact: udata.id
}).fetch()
After several grepping, I can guarantee that there is no other place in our codebase that SELECTS from the contacts table. In our debugging, we found no undefined values, and our investigation showed that the query runs when the former code runs. But as you can see in the screenshot, the query knex runs has no conditions:
select `contacts`.* from `contacts`
We believe this is the reason why it fills the pools (as requesting every user's contact is quite a job), but at the same time, we cannot see why knex is running such query, as:
What could be causing such problem?
For some that may fall here!
If it make no sense and you upgraded nodejs to v14 recently! It can be the cause!
I had the problem and pulled my hair for quiet a time last time! And i was using nvm so somehow after trying to track what i made different! Cause it was working before! It came to my mind and tested it with v13! And it worked again!
So be aware! it may be the thing and that may save you tremendous time and stress!
Nodejs v14 made breaking changes! And pg
module get affected! pg started to have the process exit at connect() call
!
If you are using postgres
! With nodejs v14 and above ! Make sure to use the driver module pg
at version >=8.0.3
! And better upgrade to the latest
npm install pg@latest --save
If you are not using postgres
! Try to update your DB driver! It may be the same! Also try with nodejs V13
. To confirm it's the same problem!
If like me you like to know the details and what did happen !?
With node V14! Some breaking changes happened on the api! Also many things were changed! Including Openssl version!
For postgres! And pg
module! The problem was as described in this comment per this thread:
The initial readyState (a private/undocumented API that
pg uses) of net.Socket seems to have changed from 'closed' to 'open' in Node 14.
It’s hard to fix with perfect backwards compatibility, but I think I have a patch that’s close enough.
And as per this PR!
You can see the changes in this diffing
In short as mentioned! The api for onReady changed for a net.Socket
!
And the implemented solution was to not use onReady at all!
And as per this
Connection now always calls connect on its stream when connect is called on it.
In the older version the connect was called only if the socket is on closed
state! readyState
usage is eliminated!
Check this line
You can understand!
Depending on the implementation! Many things may or not be affected by those core changes!
And because i wanted to see where the change happen! Here you go
https://github.com/nodejs/node/pull/32272
One can check the log of changes too:
https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V14.md
Also to mention the breaking changes! Made pg
make the process exit at connect() call
. And that's what made it exit! And logging was to be seen!
In more detail for this! Here how it happened! Sequelize have the postgres dialect implementation! Which use pg! And pg client! create a connection! The connection have a connect
event! When it connect it emit it! And because node v14 change the behavior of a stream to starting with open! The stream connection is skipped! Because of the readyState
check (expected as close but it became open instead!)! And the stream is taken as connected (else block)! Where it is not! And the connect
event is emitted directly! When that happen! The client either will call requestSsl()
or startup()
method of the connection object! And both will call this._stream.write
. because the stream is not connected! An error happen! This error is not catch! Then the promise in sequelize driver! Will stay unresolved! And then the event loop get empty! Nodejs by default behavior just exit!
You can see the step through lines of code:
connect()
call and emit connect
! Thinking the stream is connected because of V14 changeconnect
event catched and callback run! requestSsl()
or startup()
will be runstream.write
will be called (requestSsl(), startup())https://github.com/nodejs/node/issues/22088
Node exits without error and doesn't await promise (Event callback)
what happens when a Promise never resolves?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With