how cross column correlation is handled by PostgreSQL Optimizer

Question

There is often a relationship or correlation between the data stored in different columns of the same table. For example, in the customers table, the values in the c_state column are influenced by the values in the country_id column, as the state of XYZ is only going to be found in the country ABC.

I think PostgreSQL assumes predicates are mutually independent and selectivities for each predicate on the same relation are multiplied together. Hence selectivity estimates will be much smaller than actuals and non-optimal access paths may be chosen when data is highly dependent and skewed. How to avoid this in PostgreSQL?

Can we create some kind of multi-column statistics on a group of columns in PostgreSQL 9.3.5. Is there any support for Multi-dimensional histograms?

Craig Ringer · Accepted Answer

You're correct, Pg assumes independence, and this can be a problem when there's a correlation. The analyser doesn't know how to find correlations and the optimiser doesn't know how to use any such information even if the analyser could find it. There's no support for multi-dimensional histograms or storing cross-column correlation information.

There have been many discussions about this on the pgsql-hackers and pgsql-general lists, but no firm conclusions about how to deal with it have been reached. Additionally, almost nobody who has issues with this is willing to invest the time (or funding) into actually solving the problem.

Here's a relevant recent article. The optimizer hints wiki page also covers some correlation issues.

Some mailing list discussions (an extremely non-exhaustive list) include:

how to implement selectivity injection in postgresql - recent
Cross column correlation revisited
How to specify/mock the statistic data of tables in PostgreSQL

how cross column correlation is handled by PostgreSQL Optimizer

Tags:

sql

database

postgresql

postgresql-9.3

CRM

1 Answers

Craig Ringer

Recent Activity

Donate For Us

how cross column correlation is handled by PostgreSQL Optimizer

Tags:

sql

database

postgresql

postgresql-9.3

CRM

1 Answers

Craig Ringer

Related questions

Recent Activity

Donate For Us