I'm partitioning a very large table that contains temporal data, and considering what granularity to use for the partitions. The Postgres partitioning documentation claims that "large numbers of partitions are likely to increase query planning time considerably" and recommends that partitioning be used with "up to perhaps a hundred" partitions.
Assuming my table holds ten years of data, if I partitioned by week I would end up with over 500 partitions. Before I rule this out, I'd like to better understand what impact partition quantity has on query planning time. Has anyone benchmarked this, or does anyone have an understanding of how this works internally?
And, will you be purging 'old' data? Rule of Thumb: Don't even consider PARTITION unless there are more than a million rows. Rule of Thumb: Have 20-50 partitions; no more.
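That purge step is a big part of why time-based partitioning pays off: retiring old data becomes a metadata operation instead of a huge DELETE. A minimal sketch, with a hypothetical table and partition name:

    -- With declarative partitioning (PostgreSQL 10+), retire the oldest child:
    ALTER TABLE measurements DETACH PARTITION measurements_2004_01;
    DROP TABLE measurements_2004_01;
    -- With inheritance-style partitioning, the child is an ordinary table and a
    -- plain DROP TABLE on it is enough. Either way: no bulk DELETE, no VACUUM debt.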
PostgreSQL normally stores its table data in 8KB blocks. The number of blocks per table is limited to a 32-bit block number (roughly four billion), which with the default block size gives a maximum table size of 32TB.
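The numbers are easy to verify: the block size is a compile-time setting you can read from a running server, and the ceiling follows from multiplying it by the number of addressable blocks.

    SHOW block_size;   -- 8192 on a default build
    -- roughly 2^32 addressable blocks x 8 KB per block = 32 TB per table
    -- (a non-default block size raises or lowers that ceiling accordingly)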
Partitioning can improve database performance because queries that hit only a few partitions read, process, and return far fewer rows. Each partition also carries its own indexes, so the indexes are divided along with the table and stay smaller.
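As an illustration, a minimal declarative-partitioning sketch (PostgreSQL 10+ syntax; the table and column names are invented) where each partition gets its own index:

    -- Parent table partitioned by range on the timestamp column.
    CREATE TABLE measurements (
        id          bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        value       numeric
    ) PARTITION BY RANGE (recorded_at);

    -- One partition per month; the planner only touches partitions whose
    -- bounds overlap the query's WHERE clause.
    CREATE TABLE measurements_2013_01 PARTITION OF measurements
        FOR VALUES FROM ('2013-01-01') TO ('2013-02-01');

    -- An index created on the parent is cascaded to every partition
    -- (PostgreSQL 11+), so each partition keeps its own smaller index.
    CREATE INDEX ON measurements (recorded_at);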
If you're simply filtering the data and the data fits in memory, Postgres can scan roughly 5-10 million rows per second (assuming a reasonable row size of, say, 100 bytes). If you're aggregating, you're down to about 1-2 million rows per second.
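Those rates are easy to sanity-check on your own hardware; a rough psql sketch against a synthetic table (treat the numbers as ballpark only):

    -- Build a synthetic ~10M-row table, then time a pure filter vs. an aggregate.
    CREATE TABLE bench AS
    SELECT g AS id, (g % 1000) AS grp, random() AS val
    FROM   generate_series(1, 10000000) AS g;

    \timing on                                       -- psql timing display
    SELECT count(*) FROM bench WHERE grp = 42;       -- filter: limited by scan speed
    SELECT grp, sum(val) FROM bench GROUP BY grp;    -- aggregate: noticeably slower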
The query planner has to do a linear search of the constraint information for every partition of the tables used in the query, to figure out which partitions are actually involved, i.e. the ones that could contain rows the query needs. The number of query plans the planner considers grows exponentially as you join more tables, so the exact point where that linear search adds up to enough time to be troubling really depends on query complexity. The more joins, the worse you get hit by this. The "up to a hundred" figure came from noting that query planning time was adding up to a non-trivial amount even on simpler queries around that point. On web applications in particular, where response latency matters, that's a problem; thus the warning.
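To make that concrete, here is a hedged sketch of the inheritance-style setup the older documentation describes, where each child carries a CHECK constraint the planner has to examine (table and column names are invented):

    -- Parent plus one child per period; each child's CHECK constraint states
    -- the range of data it can hold.
    CREATE TABLE events (occurred_at timestamptz NOT NULL, payload text);

    CREATE TABLE events_2013_01 (
        CHECK (occurred_at >= '2013-01-01' AND occurred_at < '2013-02-01')
    ) INHERITS (events);

    -- With constraint_exclusion = partition (the default), the planner compares
    -- every child's CHECK constraint against the WHERE clause and drops the
    -- children that cannot match. That per-child comparison is the linear
    -- planning cost described above; the "Planning time" line printed by
    -- EXPLAIN ANALYZE (9.4+) shows how much it adds up to.
    SET constraint_exclusion = partition;
    EXPLAIN SELECT * FROM events
    WHERE  occurred_at >= '2013-01-05' AND occurred_at < '2013-01-20';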
Can you support 500? Sure. But the optimizer is going to check all 500 check constraints for every query plan involving that table. If query planning time isn't a concern for you, then maybe you don't care. But most sites end up disliking the proportion of time spent on query planning with that many partitions, which is one reason monthly partitioning is the standard for most data sets. You can easily store 10 years of data, partitioned monthly, before you reach the point where planning overhead becomes noticeable.
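If you want to find the crossover point on your own workload, it is straightforward to generate the partitions mechanically and watch planning time grow; a sketch with a hypothetical parent table (declarative syntax, PostgreSQL 10+):

    -- One child per month over ten years (120 partitions). Rebuild with weekly
    -- bounds (~520 children) and compare the "Planning time" reported by
    -- EXPLAIN ANALYZE for the same query.
    CREATE TABLE readings (
        reading_time timestamptz NOT NULL,
        value        numeric
    ) PARTITION BY RANGE (reading_time);

    DO $$
    DECLARE
        m date := date '2004-01-01';
    BEGIN
        WHILE m < date '2014-01-01' LOOP
            EXECUTE format(
                'CREATE TABLE readings_%s PARTITION OF readings
                   FOR VALUES FROM (%L) TO (%L)',
                to_char(m, 'YYYY_MM'), m, (m + interval '1 month')::date);
            m := (m + interval '1 month')::date;
        END LOOP;
    END $$;

    EXPLAIN ANALYZE
    SELECT * FROM readings
    WHERE  reading_time >= '2010-06-01' AND reading_time < '2010-06-08';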
"large numbers of partitions are likely to increase query planning time considerably" and recommends that partitioning be used with "up to perhaps a hundred" partitions.
Because every extra partition usually comes with its own check constraints, and the planner has to work out which of the partitions actually need to be queried. In the best-case scenario, the planner identifies that you're only hitting a single partition and gets rid of the Append step altogether.
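You can watch that happen with EXPLAIN; a sketch against the hypothetical range-partitioned table from earlier:

    -- A predicate that falls entirely inside one partition's bounds.
    EXPLAIN SELECT * FROM measurements
    WHERE  recorded_at >= '2013-01-10' AND recorded_at < '2013-01-11';
    -- Only measurements_2013_01 should appear in the plan; partitions whose
    -- bounds cannot overlap the predicate are pruned, and when a single
    -- partition survives, recent releases drop the Append node entirely.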
In terms of rows, and as DNS and Seth have pointed out, your mileage will vary with the hardware. Generally speaking, though, there's no significant difference between querying a 1M row table and a 10M row table, especially if your hard drives allow for fast random access and if the table is clustered (see the CLUSTER statement) on the index you hit most frequently.
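For reference, a minimal clustering sketch using the hypothetical bench table from above; note that CLUSTER is a one-time physical rewrite, not something maintained automatically, so it needs re-running after heavy churn:

    -- Physically reorder the table to match the index used by most range scans.
    CREATE INDEX bench_grp_idx ON bench (grp);
    CLUSTER bench USING bench_grp_idx;
    ANALYZE bench;   -- refresh planner statistics (ordering correlation) afterwards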