
Query plan on indexed partitioned table. Avoid sequential scan

Tags: sql, postgresql

On a PostgreSQL 10.1 server I have a very big table partitioned by list, and a view that only filters the table by the partition column.

When I use the view, the planner does not give me the best possible plan, that is, one that scans only the selected child tables. Instead it always scans all partitions of the parent table.

I have created an index on the partition column and a check constraint too. The table definitions:


                                  Table "parted_mob_matrix"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 id           | integer               |           | not null |         | plain    |              | 
 delivery_id  | integer               |           |          |         | plain    |              | 
Partition key: LIST (delivery_id)
Partitions: parted_mob_matrix_delivery_0 FOR VALUES IN (0),
            parted_mob_matrix_delivery_1 FOR VALUES IN (1),
            parted_mob_matrix_delivery_10 FOR VALUES IN (10),
            ....
            parted_mob_matrix_delivery_620 FOR VALUES IN (620)


                            Table "parted_mob_matrix_delivery_620"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 id           | integer               |           | not null |         | plain    |              | 
 delivery_id  | integer               |           |          |         | plain    |              | 
Partition of: parted_mob_matrix FOR VALUES IN (620)
Partition constraint: ((delivery_id IS NOT NULL) AND (delivery_id = ANY (ARRAY[620])))
Indexes:
    "parted_mob_matrix_delivery_620_delivery_id_idx" btree (delivery_id)
Check constraints:
    "parted_mob_matrix_delivery_620_check_delivery" CHECK (delivery_id = 620)
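
For readers who want to reproduce the setup, here is a minimal sketch of DDL consistent with the output above. It is an assumption reconstructed from the listing, not the original script: only the two columns shown are included, and only partition 620 is created.

```sql
-- Sketch only: the real table has more columns than the two shown above.
CREATE TABLE parted_mob_matrix (
    id          integer NOT NULL,
    delivery_id integer
) PARTITION BY LIST (delivery_id);

-- One child table per delivery_id value; repeat for every other value.
CREATE TABLE parted_mob_matrix_delivery_620
    PARTITION OF parted_mob_matrix FOR VALUES IN (620);

-- In PostgreSQL 10 an index has to be created on each partition separately.
CREATE INDEX parted_mob_matrix_delivery_620_delivery_id_idx
    ON parted_mob_matrix_delivery_620 (delivery_id);

-- Explicit check constraint, in addition to the implicit partition constraint.
ALTER TABLE parted_mob_matrix_delivery_620
    ADD CONSTRAINT parted_mob_matrix_delivery_620_check_delivery
    CHECK (delivery_id = 620);
```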

My view code:

EXPLAIN SELECT
  parted_mob_matrix.*
FROM
  parted_mob_matrix
1) where parted_mob_matrix.delivery_id in (620)
2) where parted_mob_matrix.delivery_id in (select 620)

I need to use version 2, shown here in simplified form (the real subquery reads from another very small table), but it gets a very different and much worse plan.

QUERY PLAN 1 (efficient):

Append  (cost=0.00..78308.11 rows=758031 width=738)
  ->  Seq Scan on parted_mob_matrix_delivery_620  (cost=0.00..78308.11 rows=758031 width=738)
        Filter: (delivery_id = 620)

QUERY PLAN 2 (rowset, slow):


Hash Semi Join  (cost=0.01..25077311.20 rows=7539693 width=860)
  Hash Cond: (parted_mob_matrix_delivery_0.delivery_id = (620))
  ->  Append  (cost=0.00..24942162.20 rows=211111399 width=859)
        ->  Seq Scan on parted_mob_matrix_delivery_0  (cost=0.00..10.75 rows=250 width=294)
        ->  Seq Scan on parted_mob_matrix_delivery_1  (cost=0.00..10.75 rows=250 width=294)
 -- All the child tables
        ->  Seq Scan on parted_mob_matrix_delivery_620  (cost=0.00..77929.09 rows=758031 width=738)
 -- All the child tables are scanned

How can I get plan 1 for a query whose WHERE clause looks like version 2?

Pablo Caro asked Jul 17 '19 17:07



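1 Answer

In PostgreSQL v10 you can solve this by wrapping the input of the WHERE condition in an IMMUTABLE plpgsql function that returns an ARRAY of integers. According to the documentation, the IMMUTABLE category "(...) allows the optimizer to pre-evaluate the function when a query calls it with constant arguments (...)" (https://www.postgresql.org/docs/10/xfunc-volatility.html).

Because the function call is replaced by a constant array at plan time, the planner should be able to exclude the partitions that cannot match. Keep in mind that a function that reads a table is, strictly speaking, STABLE rather than IMMUTABLE: declaring it IMMUTABLE is a deliberate trade-off, and a cached plan can keep using an outdated array if the lookup table changes.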
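
To see why pre-evaluation matters, it helps to look at the query the planner effectively works on once the function call has been folded into a constant. This is a minimal sketch, assuming the function returns the single value 620; the real array depends on your lookup table.

```sql
-- Assumption: the IMMUTABLE function call evaluates to ARRAY[620].
-- After constant folding, the planner handles a query equivalent to:
EXPLAIN
SELECT parted_mob_matrix.*
FROM parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY (ARRAY[620]);
```

Since the array is a plan-time constant, it can be checked against each partition's constraint. The result of a subquery such as `IN (SELECT ...)` is only known at execution time, so in v10 it cannot be used that way.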

Example:

SELECT
  parted_mob_matrix.*
FROM
  parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY(get_deliveries('cod_011'))

The function you could use:

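CREATE OR REPLACE FUNCTION get_deliveries(
    high_level_id TEXT
)
RETURNS INTEGER[]
AS $BODY$
DECLARE
    _delivery_ids INTEGER[];
BEGIN
  -- Dynamic SQL keeps the column high_level_id from clashing with the
  -- parameter of the same name; %L quotes the value as a SQL literal.
  EXECUTE format(
    $$
    SELECT ARRAY_AGG(delivery_id)
    FROM
        your_table_with_all_delivery_ids
    WHERE
        high_level_id = %L
    $$, high_level_id
  ) INTO _delivery_ids;
  RETURN _delivery_ids;
END;
$BODY$
-- IMMUTABLE lets the planner pre-evaluate calls with constant arguments.
LANGUAGE plpgsql IMMUTABLE;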
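
A quick way to check the effect is to run EXPLAIN on the rewritten query. The sketch below assumes a hypothetical lookup table whose name and columns simply mirror the placeholders used in the function body; adapt them to your schema.

```sql
-- Hypothetical lookup table: name and columns are placeholders taken from
-- the function body above, not the asker's real schema.
CREATE TABLE your_table_with_all_delivery_ids (
    high_level_id text    NOT NULL,
    delivery_id   integer NOT NULL
);
INSERT INTO your_table_with_all_delivery_ids VALUES ('cod_011', 620);

-- Partition exclusion in v10 needs constraint_exclusion set to
-- 'partition' (the default) or 'on'.
SHOW constraint_exclusion;

-- The function call uses a constant argument, so it can be pre-evaluated.
EXPLAIN
SELECT parted_mob_matrix.*
FROM parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY (get_deliveries('cod_011'));
```

If exclusion takes place, the Append node should list only the partitions whose values appear in the returned array, as in query plan 1 of the question, instead of every child table.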
cayetano benavent answered Oct 21 '22 11:10