
Performance of PostgreSQL with large numbers of tables (e.g., 1 million tables)?

What is the largest number of tables that a single PostgreSQL database can hold while still retaining good performance, given that PostgreSQL stores one file per table on the filesystem and searches pg_catalog for every query during query planning?

E.g., can PostgreSQL deal with 1 million tables within a single database? Assume the filesystem is ext4 and each table contains very little data, so overall disk storage size isn't an issue. The issue really comes from (1) the impact of having 1 million files on the filesystem and (2) the impact of having 1 million entries in pg_catalog.
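For a sense of scale, the catalog's current footprint on an existing cluster can be checked with standard catalog queries, for example:

    -- One row per relation (tables, indexes, sequences, TOAST tables, ...) in this database.
    SELECT count(*) FROM pg_catalog.pg_class;

    -- On-disk size of the main catalog tables the planner consults.
    SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_class'))     AS pg_class_size,
           pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')) AS pg_attribute_size,
           pg_size_pretty(pg_total_relation_size('pg_catalog.pg_statistic')) AS pg_statistic_size;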

From this 2005 thread, http://postgresql.1045698.n5.nabble.com/GENERAL-Maximum-number-of-tables-per-database-and-slowness-td1853836.html, the following was said (though I do not know how much of it is still applicable these days):

Benjamin Arai wrote:

What is the current maximum number of tables per database? Also, does having more tables slow down performance in any way?

For most cases, the answer is no. However, once you get near 6 figure table counts, pg_catalog ends up being pretty massive. The problem is that the query planner must check pg_catalog for every query to see what indexes are available, what the statistics & value distributions are, etc. in order to build the optimal plan. At some point, a really large pg_catalog can begin to bog down your system.

...

William Yu <[hidden email]> writes:

Benjamin Arai wrote:

What is the current maximum number of tables per database? Also, does having more tables slow down performance in any way?

For most cases, the answer is no. However, once you get near 6 figure table counts, pg_catalog ends up being pretty massive.

You also have to think about the performance implications of having tens of thousands of files in your database directory. While some newer filesystems aren't fazed by that particularly, a lot of 'em bog down on lookups when there are more than a few thousand entries in a directory.

Asked Oct 23 '11 by archmeta

2 Answers

You don't have to keep a million files in a single directory. You can use CREATE TABLESPACE to arrange space in a different directory or on a different disk. I don't know anything about pg_catalog internals, but I can imagine how it might narrow the search by tablespace first, which could significantly reduce search time.
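A minimal sketch of that approach (the tablespace name, table definition, and path here are placeholders; the directory must already exist, be empty, and be owned by the postgres OS user, and CREATE TABLESPACE requires superuser):

    -- Hypothetical location on a second disk.
    CREATE TABLESPACE extra_space LOCATION '/mnt/disk2/pgdata';

    -- Tables created in this tablespace get their files under that directory
    -- rather than in the main data directory.
    CREATE TABLE t_example (id bigint, payload text) TABLESPACE extra_space;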

But that's different from the possible problems of having a million files in the filesystem in general, or with the actual (not imagined) issues with pg_catalog.

It should be easy to run a simple (and possibly misleading) test: use your favorite scripting language to create a million tables, each with five or six columns.
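For example, a rough sketch of such a test in plain PL/pgSQL (table names and column types are arbitrary; since a DO block runs as a single transaction, you would likely need to run it in smaller batches or raise max_locks_per_transaction to reach a full million):

    -- Creates test_t_1 .. test_t_N with five throwaway columns each.
    -- Adjust the range, or loop over batches from a shell script, for very large N.
    DO $$
    BEGIN
        FOR i IN 1..100000 LOOP
            EXECUTE format(
                'CREATE TABLE test_t_%s (a int, b int, c text, d timestamptz, e numeric)', i);
        END LOOP;
    END
    $$;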

Answered by Mike Sherrill 'Cat Recall'

In general, based on what I've heard from people who have used very large numbers of tables (many thousands), planning overhead goes up as the number of tables in the database goes up. The people I know who ran into this as a problem had to work around it, though they never told me exactly how. What happens is that, in order to decide the best way to execute a query, the planner must look up information about the tables and columns involved, which means searching system catalogs that grow larger and larger. That cost is paid by every query at plan time.
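One way to observe this directly is to watch planning time as the catalogs grow; recent PostgreSQL versions (9.4 and later) report it separately in EXPLAIN ANALYZE output (the table name below is just a placeholder):

    -- "Planning Time" is reported separately from "Execution Time",
    -- so its growth can be tracked as tables are added.
    EXPLAIN (ANALYZE) SELECT * FROM some_table WHERE id = 42;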

The basic issue is that planning has to take data about both tables and columns into account, which means lookups in the system catalogs (pg_class, pg_attribute, pg_statistic, and so on). The indexes on those catalogs are fixed, built-in ones, such as pg_class's unique indexes on oid and on (relname, relnamespace), and you can't easily add your own. So short of altering the system catalogs themselves (at the source level, or being given permission to modify them), I don't see a way to attack the lookup cost directly.
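You can check for yourself which indexes exist on pg_class in your own installation via the standard pg_indexes view:

    -- Lists the built-in indexes defined on the pg_class catalog.
    SELECT indexname, indexdef
    FROM   pg_indexes
    WHERE  schemaname = 'pg_catalog' AND tablename = 'pg_class';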

I would also expect performance to degrade gradually rather than hit a hard wall, so you can't set a simple hard limit; it comes down to what performance is acceptable for a given workload.

If you have that many tables, I would look at seeing how many of them could be broken off into other databases first.

tl;dr: Expect performance issues with very large numbers of tables. Expect to have to be creative to resolve them.

Answered by Chris Travers