 

PostgreSQL DB performance issues with thousands of connections and distributed transactions

We are evaluating PostgreSQL as an alternative to the Oracle database in our application. We use PostgreSQL 9.5, installed on a Linux machine with 128 GB of memory, 32 CPU cores and SSD storage. Connection pools and distributed transactions are managed by the JBoss 7 application server, and the SQL queries are generated and executed by Hibernate 4. Most tables have tens of millions of rows, and one of them has hundreds of millions of rows. In total, around 3,000 database connections (pooled by the application server) are active and used concurrently.

We have modified some queries, created indexes for the slow ones, tuned DB and OS settings based on the documentation, and so on. Nevertheless, throughput is a few times lower than with Oracle, and eventually the DB response time increases 10-20 times.

I've done some googling and couldn't find information about anyone else (ab)using PostgreSQL in the same way:

  • using thousands of active database connections
  • using such a high number of distributed transactions (PREPARED TRANSACTIONS)
  • storing billions of rows in one table

Oracle has no problem handling even higher load. I would appreciate it if you could share your experience, suggestions, links, etc.
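For context, the connection and prepared-transaction load can be watched directly on the PostgreSQL side using the standard statistics views; a minimal sketch, run from any sufficiently privileged psql session:

    -- total client connections currently open (including pooled, idle ones)
    SELECT count(*) AS total_connections FROM pg_stat_activity;

    -- distributed (two-phase) transactions that were prepared but not yet
    -- committed or rolled back; long-lived entries hold locks and block vacuum
    SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
    ORDER BY prepared;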

Thanks

asked Sep 22 '16 by Moose on the Loose


People also ask

How many connections can PostgreSQL handle?

PostgreSQL allows the database to be used by multiple users concurrently, but there is a default limit: max_connections is 100 in a stock installation, of which a few slots (superuser_reserved_connections, 3 by default) are reserved for superusers and the rest are available to normal users. The limit can be raised, but every connection costs memory and a dedicated backend process.
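The actual limits on a given server can be checked directly; a quick sketch from psql (the defaults shown in the comments are for a stock installation):

    SHOW max_connections;                  -- 100 by default
    SHOW superuser_reserved_connections;   -- 3 by default, reserved for superusers

    -- how many of those slots are currently in use
    SELECT count(*) AS in_use FROM pg_stat_activity;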

Can Postgres handle multiple connections?

Yes. With one or two application servers opening 5-10 connections each, even the smallest PostgreSQL server can handle the load comfortably.

Does Postgres support distributed transactions?

PostgreSQL does not itself include a distributed transaction coordinator, but it does support two-phase commit (PREPARE TRANSACTION / COMMIT PREPARED), which lets an external transaction manager, such as a Java EE application server, coordinate distributed transactions across multiple resources. True nested transactions are not supported, although SAVEPOINTs provide sub-transaction rollback. Transactions are atomic: the results of a transaction are seen entirely or not at all by other transactions.
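A minimal two-phase-commit sketch; the accounts table is hypothetical, and max_prepared_transactions must be set above 0 in postgresql.conf before PREPARE TRANSACTION is allowed:

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- hypothetical table
    PREPARE TRANSACTION 'tx-001';  -- phase one: the transaction now survives crashes/disconnects

    -- phase two, normally driven by the external transaction manager,
    -- possibly from a completely different session:
    COMMIT PREPARED 'tx-001';
    -- or: ROLLBACK PREPARED 'tx-001';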

Why is PostgreSQL so slow?

PostgreSQL tries to do as much of its work as possible in memory and to spread writes to disk out over time to avoid bottlenecks, but on an overloaded system with heavy write activity it is easy for reads and writes to pile up and slow the whole system down while it catches up with demand.
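Whether a slowdown is caused by write/checkpoint pressure can be checked from the background-writer statistics; a sketch against the standard pg_stat_bgwriter view:

    -- frequent requested (non-timed) checkpoints and large buffers_backend
    -- counts usually mean the server is struggling to keep up with writes
    SELECT checkpoints_timed,
           checkpoints_req,
           buffers_checkpoint,
           buffers_backend,
           maxwritten_clean
    FROM pg_stat_bgwriter;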


1 Answer

The solution was to upgrade the Linux kernel and to reduce the number of DB connections in our Java connection pools from 3,000 to 300. After this change we could handle the same traffic as with Oracle.
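A quick way to sanity-check whether a much smaller pool is sufficient is to look at how many of the pooled connections are actually doing work at any instant; a sketch using pg_stat_activity (the state column is available from PostgreSQL 9.2 on):

    -- with a 3,000-connection pool most backends are usually idle; the
    -- breakdown by state shows how many are actively executing right now
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;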

By accident I discovered the crucial piece of information that led to the resolution, in the comments section of the post Did I Say 32 Cores? How about 64? written by Robert Haas (VP, Chief Architect, Database Server @ EnterpriseDB, PostgreSQL Major Contributor and Committer):

No, I'm saying that to get good performance on a 64-core server, you're going to need PostgreSQL >= 9.2 and Linux >= 3.2. Most of the changes are actually on the PostgreSQL side, but the lseek scaling stuff in the Linux kernel was important, too.

answered Oct 07 '22 by Moose on the Loose