Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Connection Pool Strategy: Good, Bad or Ugly? [closed]

I'm in charge of developing and maintaining a group of Web Applications that are centered around similar data. The architecture I decided on at the time was that each application would have their own database and web-root application. Each application maintains a connection pool to its own database and a central database for shared data (logins, etc.)

A co-worker has been positing that this strategy will not scale because having so many different connection pools will not be scalable and that we should refactor the database so that all of the different applications use a single central database and that any modifications that may be unique to a system will need to be reflected from that one database and then use a single pool powered by Tomcat. He has posited that there is a lot of "meta data" that goes back and forth across the network to maintain a connection pool.

My understanding is that with proper tuning to use only as many connections as necessary across the different pools (low volume apps getting less connections, high volume apps getting more, etc.) that the number of pools doesn't matter compared to the number of connections or more formally that the difference in overhead required to maintain 3 pools of 10 connections is negligible compared to 1 pool of 30 connections.

The reasoning behind initially breaking the systems into a one-app-one-database design was that there are likely going to be differences between the apps and that each system could make modifications on the schema as needed. Similarly, it eliminated the possibility of system data bleeding through to other apps.

Unfortunately there is not strong leadership in the company to make a hard decision. Although my co-worker is backing up his worries only with vagueness, I want to make sure I understand the ramifications of multiple small databases/connections versus one large database/connection pool.

like image 871
Drew Avatar asked Sep 09 '25 21:09

Drew


2 Answers

Your original design is based on sound principles. If it helps your case, this strategy is known as horizontal partitioning or sharding. It provides:

1) Greater scalability - because each shard can live on separate hardware if need be.

2) Greater availability - because the failure of a single shard doesn't impact the other shards

3) Greater performance - because the tables being searched have fewer rows and therefore smaller indexes which yields faster searches.

Your colleague's suggestion moves you to a single point of failure setup.

As for your question about 3 connection pools of size 10 vs 1 connection pool of size 30, the best way to settle that debate is with a benchmark. Configure your app each way, then do some stress testing with ab (Apache Benchmark) and see which way performs better. I suspect there won't be a significant difference but do the benchmark to prove it.

like image 178
Asaph Avatar answered Sep 12 '25 10:09

Asaph


If you have a single database, and two connection pools, with 5 connections each, you have 10 connections to the database. If you have 5 connection pools with 2 connections each, you have 10 connections to the database. In the end, you have 10 connections to the database. The database has no idea that your pool exists, no awareness.

Any meta data exchanged between the pool and the DB is going to happen on each connection. When the connection is started, when the connection is torn down, etc. So, if you have 10 connections, this traffic will happen 10 times (at a minimum, assuming they all stay healthy for the life of the pool). This will happen whether you have 1 pool or 10 pools.

As for "1 DB per app", if you're not talking to an separate instance of the database for each DB, then it basically doesn't matter.

If you have a DB server hosting 5 databases, and you have connections to each database (say, 2 connection per), this will consume more overhead and memory than the same DB hosting a single database. But that overhead is marginal at best, and utterly insignificant on modern machines with GB sized data buffers. Beyond a certain point, all the database cares about is mapping and copying pages of data from disk to RAM and back again.

If you had a large redundant table in duplicated across of the DBs, then that could be potentially wasteful.

Finally, when I use the word "database", I mean the logical entity the server uses to coalesce tables. For example, Oracle really likes to have one "database" per server, broken up in to "schemas". Postgres has several DBs, each of which can have schemas. But in any case, all of the modern servers have logical boundaries of data that they can use. I'm just using the word "database" here.

So, as long as you're hitting a single instance of the DB server for all of your apps, the connection pools et al don't really matter in the big picture as the server will share all of the memory and resources across the clients as necessary.

like image 32
Will Hartung Avatar answered Sep 12 '25 11:09

Will Hartung