I have a situation where running a query that filters by an indexed column in a partitioned table performs a full table scan.
Apparently, this is a known issue in PostgreSQL, and it's explained in detail here.
Is there a more elegant way around this other than performing a query on each partition, and then performing a UNION on all of the results?
We can run CREATE INDEX on a partitioned table in PostgreSQL, which automatically creates a matching index on each existing child table. The index is also created on any partition added later. Above, we created partitioned tables up to 3 levels deep and created indexes on the partitioned tables at different levels.
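As a minimal sketch of that behavior, assuming PostgreSQL 11 or later with declarative partitioning: the table `events`, its columns, and the partition names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical partitioned table; all names here are illustrative.
CREATE TABLE events (
    id         bigint NOT NULL,
    created_at date   NOT NULL
) PARTITION BY RANGE (created_at);

-- A partition that exists before the index is created.
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Creating the index on the parent also creates one on events_2024.
CREATE INDEX events_created_at_idx ON events (created_at);

-- A partition added afterwards should get its own matching index.
CREATE TABLE events_2025 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- List the indexes that now exist on the parent and on each partition.
SELECT tablename, indexname
FROM pg_indexes
WHERE tablename LIKE 'events%'
ORDER BY tablename;
```

The final query is only a convenience for checking the result; `\d events_2025` in psql shows the same information.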
Partitioned tables can have indexes that are nonpartitioned (existing in a single table space within a database partition), indexes that are themselves partitioned across one or more table spaces within a database partition, or a combination of the two.
A partitioned index is made up of a set of index partitions, each of which contains the index entries for a single data partition. Each index partition contains references only to data in its corresponding data partition. Both system- and user-generated indexes can be partitioned.
Indexes work just fine for scanning only the relevant partitions in PostgreSQL. But you have to set everything up properly for it to work, and it's easy to miss a step in the long list of things documented at http://www.postgresql.org/docs/current/static/ddl-partitioning.html
The main thing to realize is that in order to avoid a sequential scan, you have to provide enough information to PostgreSQL so it can prove some partitions cannot have the data you're looking for; then they are skipped as potential sources for the query results. The article you link to points this out as a solution to the seq scan problem: "If you add range constraints to the date field of each partition, this query can be optimized into a loop where you query the “latest” partition first and work backwards until you find a single value that is higher than the range of all the remaining partitions." However, it doesn't show the improved plan you'd see after that change.
Some common mistakes you might have made:
- The constraint_exclusion parameter in the postgresql.conf file is off by default. With that default, the planner won't skip any partitions, so you won't get the plan you expect.
- You didn't create non-overlapping partitions using CHECK constraints, which keeps the planner from knowing what's inside each of them. It's possible to miss this step and still get your data into the right partitions; the planner just won't know that.
- You put an index only on the master table and not on each partition. This gives you a sequential scan of just the relevant partition, which is not as bad as the above but still not good.
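Put together, a setup that avoids all three mistakes might look like the sketch below. It assumes inheritance-based partitioning as described in the documentation linked above; the table `measurement`, its columns, and the monthly ranges are hypothetical.

```sql
-- Hypothetical master table; it holds no rows itself.
CREATE TABLE measurement (
    id      bigint  NOT NULL,
    logdate date    NOT NULL,
    reading integer
);

-- Child tables with non-overlapping CHECK constraints, so the planner
-- can prove which partitions cannot contain matching rows.
CREATE TABLE measurement_2009_01 (
    CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-02-01')
) INHERITS (measurement);

CREATE TABLE measurement_2009_02 (
    CHECK (logdate >= DATE '2009-02-01' AND logdate < DATE '2009-03-01')
) INHERITS (measurement);

-- One index per partition, not just on the master table.
CREATE INDEX measurement_2009_01_logdate_idx ON measurement_2009_01 (logdate);
CREATE INDEX measurement_2009_02_logdate_idx ON measurement_2009_02 (logdate);

-- Enable constraint exclusion for this session (or set it in postgresql.conf).
SET constraint_exclusion = on;

-- The plan should mention the master table and measurement_2009_02 only;
-- measurement_2009_01 should be excluded by its CHECK constraint.
EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2009-02-10';
```

Whether the planner picks an index scan or a sequential scan within the remaining partition depends on table size and statistics, so on a tiny test table the plan may still show a seq scan of that one partition.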
There's some work to make this all easier in upcoming PostgreSQL releases (setting constraint_exclusion is fairly automatic in 8.4, and some sort of partition setup automation is being worked on). Right now, if you follow the instructions carefully and avoid all these problems, it should work.