On a PostgreSQL 10.1 server I have a very large table partitioned by list, and a view that only filters the table by the partition column.
When I use the view, the planner does not give me the best possible plan, that is, one that scans only the selected child tables. Instead it always scans all partitions of the parent table.
I have created an index on the partition column and a check constraint too. The DDL:
Table "parted_mob_matrix"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | not null | | plain | |
delivery_id | integer | | | | plain | |
Partition key: LIST (delivery_id)
Partitions: parted_mob_matrix_delivery_0 FOR VALUES IN (0),
parted_mob_matrix_delivery_1 FOR VALUES IN (1),
parted_mob_matrix_delivery_10 FOR VALUES IN (10),
....
parted_mob_matrix_delivery_620 FOR VALUES IN (620)
Table "parted_mob_matrix_delivery_620"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | not null | | plain | |
delivery_id | integer | | | | plain | |
Partition of: parted_mob_matrix FOR VALUES IN (620)
Partition constraint: ((delivery_id IS NOT NULL) AND (delivery_id = ANY (ARRAY[620])))
Indexes:
"parted_mob_matrix_delivery_620_delivery_id_idx" btree (delivery_id)
Check constraints:
"parted_mob_matrix_delivery_620_check_delivery" CHECK (delivery_id = 620)
My view code:
EXPLAIN SELECT
parted_mob_matrix.*
FROM
parted_mob_matrix
followed by one of these two WHERE clauses:
1) WHERE parted_mob_matrix.delivery_id IN (620)
2) WHERE parted_mob_matrix.delivery_id IN (SELECT 620)
I need to use version 2, which is simplified here (the real subquery reads from another, very small table), but it gets a very different and much worse plan.
QUERY PLAN 1 (efficient):
Append (cost=0.00..78308.11 rows=758031 width=738)
-> Seq Scan on parted_mob_matrix_delivery_620 (cost=0.00..78308.11 rows=758031 width=738)
Filter: (delivery_id = 620)
QUERY PLAN 2 (row set from a subquery, slow):
Hash Semi Join (cost=0.01..25077311.20 rows=7539693 width=860)
Hash Cond: (parted_mob_matrix_delivery_0.delivery_id = (620))
-> Append (cost=0.00..24942162.20 rows=211111399 width=859)
-> Seq Scan on parted_mob_matrix_delivery_0 (cost=0.00..10.75 rows=250 width=294)
-> Seq Scan on parted_mob_matrix_delivery_1 (cost=0.00..10.75 rows=250 width=294)
-- All the child tables
-> Seq Scan on parted_mob_matrix_delivery_620 (cost=0.00..77929.09 rows=758031 width=738)
-- All the child tables are scanned
How can I get plan 1 for a query whose WHERE clause looks like version 2?
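For anyone who wants to reproduce this, here is a minimal sketch of DDL that should be consistent with the \d+ output above. It assumes only the two columns shown there (the real table evidently has more) and creates a single partition; the other partitions would follow the same pattern.

```sql
-- Parent table, list-partitioned on delivery_id (PostgreSQL 10 syntax).
CREATE TABLE parted_mob_matrix (
    id          integer NOT NULL,
    delivery_id integer
) PARTITION BY LIST (delivery_id);

-- One child partition; repeat for each delivery_id value.
CREATE TABLE parted_mob_matrix_delivery_620
    PARTITION OF parted_mob_matrix FOR VALUES IN (620);

-- In PostgreSQL 10 indexes are created per partition, not on the parent.
CREATE INDEX parted_mob_matrix_delivery_620_delivery_id_idx
    ON parted_mob_matrix_delivery_620 (delivery_id);

-- Explicit check constraint, in addition to the implicit partition constraint.
ALTER TABLE parted_mob_matrix_delivery_620
    ADD CONSTRAINT parted_mob_matrix_delivery_620_check_delivery
    CHECK (delivery_id = 620);
```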
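You can solve this in PostgreSQL 10 by wrapping the input of the WHERE condition in an IMMUTABLE PL/pgSQL function that returns an array of integers. By definition, an IMMUTABLE function "(...) allows the optimizer to pre-evaluate the function when a query calls it with constant arguments (...)" (https://www.postgresql.org/docs/10/xfunc-volatility.html). The planner then sees a constant array instead of a subquery, so it can exclude the partitions that cannot match.
Be aware that the function reads from a table, so labelling it IMMUTABLE is a promise PostgreSQL does not verify: a cached plan (for example, from a prepared statement) can keep using an old array after the lookup table changes. With that caveat, this solution should work.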
Example:
SELECT
parted_mob_matrix.*
FROM
parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY(get_deliveries('cod_011'))
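Assuming, purely for illustration, that get_deliveries('cod_011') returns {620}, the planner should end up treating the query above much like one with the array written inline, which is a form that constraint exclusion can work with:

```sql
-- Hypothetical equivalent after the IMMUTABLE call is folded to a constant;
-- the value 620 is an assumption for illustration, not real data.
EXPLAIN
SELECT parted_mob_matrix.*
FROM parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY (ARRAY[620]);
```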
The function you could use:
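CREATE OR REPLACE FUNCTION get_deliveries(
    high_level_id TEXT
)
RETURNS INTEGER[]
AS $BODY$
DECLARE
    _delivery_ids INTEGER[];
BEGIN
    -- Dynamic SQL keeps the column high_level_id from clashing with the
    -- parameter of the same name; %1$L quotes the argument as a SQL literal,
    -- which avoids the injection risk of a hand-quoted '%1$s'.
    EXECUTE format(
        $$
        SELECT ARRAY_AGG(delivery_id)
        FROM
            your_table_with_all_delivery_ids
        WHERE
            high_level_id = %1$L
        ;
        $$, high_level_id
    ) INTO _delivery_ids;
    RETURN _delivery_ids;
END;
$BODY$
-- IMMUTABLE lets the planner evaluate the call once, at plan time.
LANGUAGE plpgsql IMMUTABLE;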
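To check that the partitions are actually being excluded, something along these lines could be run once the function exists. The lookup table definition is an assumption: only its name and the two column names come from the function body, while the column types and the sample row are made up for the test.

```sql
-- Hypothetical lookup table; types are assumed, names match the function body.
CREATE TABLE IF NOT EXISTS your_table_with_all_delivery_ids (
    high_level_id text    NOT NULL,
    delivery_id   integer NOT NULL
);

-- Sample mapping (made up): 'cod_011' resolves to delivery 620.
INSERT INTO your_table_with_all_delivery_ids (high_level_id, delivery_id)
VALUES ('cod_011', 620);

-- If the call is folded to a constant array, the plan should list only the
-- matching partition(s) under Append instead of every child table.
EXPLAIN
SELECT parted_mob_matrix.*
FROM parted_mob_matrix
WHERE parted_mob_matrix.delivery_id = ANY (get_deliveries('cod_011'));
```

If the plan still shows every child table, it is worth checking that constraint_exclusion is not set to off and that the function is really declared IMMUTABLE.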