I have two servers connecting to a PostgresSQL 9.6
db hosted on Azure. The servers are doing one thing - hitting the Postgres db with a SELECT 1
query every 5 seconds.
Typical time to connect to db and get data:
25 MS
500 MS
My problem is that my .NET Core app is 20x slower than Node in getting data. I believe .NET Core is not pooling connections for some reason. This slowness occurs with both running the app locally and while running it on Azure App Services - no difference. I want to solve the .NET --> Postgres slowness.
Please only skim the relevant details and don't read the whole thing past this point - I believe only the .NET Core
code is relevant.
A PsPing
to the db from my machine (on which both the Node
and the .NET Core
apps are running:
Connecting to foobarPostGres:5432 (warmup): from someIp: 19.98ms
Connecting to foobarPostGres:5432: from someIp: 1.65ms
Connecting to foobarPostGres:5432 from someIp: 1.18ms
Connecting to foobarPostGres:5432: from someIp: 1.23ms
Connecting to foobarPostGres:5432: from someIp: 1.06ms
For sake of completeness, a sample of NODE
times look like this (note that the first time it establishes a connection, it is also "slow"):
Attempting to establish a connection...
Elapsed ms: 644.1334999799728
RESP: { '?column?': 1 }
Elapsed ms: 22.76109904050827
RESP: { '?column?': 1 }
Elapsed ms: 21.984400033950806
RESP: { '?column?': 1 }
Elapsed ms: 26.043799996376038
RESP: { '?column?': 1 }
Elapsed ms: 22.538798987865448
RESP: { '?column?': 1 }
Connection times for .NET Core
look like this:
5:13:32 PM: SLOW QUERY, CONN TIME: 4153, QUERY TIME: 18
5:13:53 PM: SLOW QUERY, CONN TIME: 707, QUERY TIME: 17
5:14:14 PM: SLOW QUERY, CONN TIME: 589, QUERY TIME: 16
5:14:35 PM: SLOW QUERY, CONN TIME: 663, QUERY TIME: 18
5:14:56 PM: SLOW QUERY, CONN TIME: 705, QUERY TIME: 16
Note the super-slow initial connection time and a long time to establish a connection on subsequent requests.
Anyway, because I am desperate, I am going to dump all my code now, with explanations. The connection string looks like this:
public static string CONNECTION_STRING {
get {
return $"Server={HOST}; User Id={USER}; Database={DB_NAME}; Port={PORT}; Password={PWD}; SSLMode=Prefer";
}
}
It is my understanding that I should get connection pooling out of the box if I use this connection string. Note that I have tried turning of SSL
on both the db and taking that line out - it did not help.
My health check controller looks like this:
// GET api/health/getdbhealthselectone
[HttpGet]
[Route("getdbhealthselectone")]
public async Task<IActionResult> GetDbHealthSelectOne()
{
int testData = await _healthCheckRepo.RunHealthCheckSelectOne();
return Ok(testData);
}
My health check repo method looks like this:
public async Task<int> RunHealthCheckSelectOne()
{
await using var conn = new NpgsqlConnection(AzureDbConnectionInfo.CONNECTION_STRING);
var connTimer = System.Diagnostics.Stopwatch.StartNew(); // TODO: Remove this testing line
await conn.OpenAsync();
connTimer.Stop(); // TODO: Remove this testing line
var msToConnect = connTimer.ElapsedMilliseconds; // TODO: Remove this testing line
int testData = 999;
var jobsQueryTimer = System.Diagnostics.Stopwatch.StartNew(); // TODO: Remove this testing line0
await using (var cmd = new NpgsqlCommand("SELECT 1", conn))
await using (var reader = await cmd.ExecuteReaderAsync())
while (await reader.ReadAsync()) {
testData = reader.GetInt32(0);
};
jobsQueryTimer.Stop(); // TODO: Remove this testing line
var msToQuery = jobsQueryTimer.ElapsedMilliseconds; // TODO: Remove this testing line
LogQueryIfSlow(msToConnect, msToQuery, _logger); // TODO: Remove this testing line
return testData;
}
Note the timers here - await conn.OpenAsync();
is what takes the bulk of the time by far, the queries themselves are fast. Also, for sake of saving time - I have run this code WITHOUT async
before, no difference.
Finally, in case there are dependency injection concerns, the repository is in a class library, the API project references it, and:
services.AddSingleton<IHealthCheckRepository, HealthCheckRepository>();
This is how it sees it.
I believe this is all the relevant information - I have been on the phone with Azure support and they found no issues with the db config. The .NET Core app is super light, so it's not like it's overloaded and it's in testing, so no traffic besides my tests.
Extra: For the sake of completeness, here is my WHOLE node app which hits the db and gets the performance posted (conn data taken out).
const { Pool, Client } = require('pg');
const { performance } = require('perf_hooks');
const pool = new Pool({
user: 'SECRET',
host: 'SECRET',
database: 'SECRET',
password: 'SECRET',
port: 5432,
})
function runQuery(pool) {
var t0 = performance.now();
pool.query('SELECT 1', (err, res) => {
if (err) {
console.log('ERROR: ', err.stack)
} else {
console.log('RESP: ', res.rows[0])
}
var t1 = performance.now();
console.log('Elapsed ms: ', t1-t0);
//pool.end()
});
}
setInterval(() => {runQuery(pool)}, 5000);
EDIT: For posterity, here are the times in .NET Core after fixing the connection pool timeout - it's faster than node, except on that initial connection, which seems to take a while, but I haven't checked some defaults:
CONN: 1710 QUERY: 18
CONN: 0 QUERY: 16
CONN: 0 QUERY: 16
CONN: 0 QUERY: 17
CONN: 0 QUERY: 16
CONN: 0 QUERY: 23
CONN: 0 QUERY: 16
CONN: 0 QUERY: 16
CONN: 0 QUERY: 23
CONN: 0 QUERY: 16
CONN: 0 QUERY: 16
PostgreSQL Connection Limits 15 connections are reserved for the superuser to maintain the state and integrity of your database, and 100 connections are available for you and your applications. If the number of connections to the database exceeds the 100-connection limit, new connections fail and return an error.
You need to set a min pool size. Doing so ensures that this amount of connections remains open to the DB regardless of the pool usage.
By default (at least for NPGSQL), the min size is 0, so if the connection is not used for a while, it will be closed.
In your test, you do one call every 5 seconds, which is not much, and the pool might decide to close the unused connection. According to the doc it should keep it open for 300 seconds though, not just 15
The first call is almost exactly 5 seconds longer than the rest. This looks like an IP address resolution issue to me. It first picks a method which is defective for the given server, then after 5 seconds it times-out and picks a different method, which works. Then it is cached for a while, and so continues to work well until the cached entry expires.
To see if this is the problem, hardcode the IP address for the database host into your "hosts" file, and see if that fixes the problem. If so, then the root cause becomes a question for your network engineers.
On the database side, you can turn on slow query logging, either log_min_duration_statement
or better yet auto_explain.log_min_duration
. But if my theory is correct, this won't show anything. The database doesn't know how long you spent trying to look up its IP address.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With