If I have a really big table will this query load the whole table in memory before it filters the resets:
with parent as
(
select * from a101
)
select * from parent
where value1 = 159
As you can see the parent query reference the whole table. Will this loaded in memory. This is a very simplified version of the query. The real query has a few joins to other tables. I am evaluating sql server 2012 and postgrsql.
SQL can be configured to only use a specific amount of RAM if necessary. Otherwise, it will consume the maximum amount of resources. This is the standard procedure after thoroughly investigating the server and confirming that SQL Server consumes 9GB of the server's total 10GB of RAM capacity.
However, many SQL Server workloads can fit their entire working set in available memory. Many in-memory database systems can persist data to disk and may not always be able to fit the entire data set in available memory.
In PostgreSQL (true as of 9.4, at least) CTEs act as optimisation fences.
The query optimiser will not flatten CTE terms into the outer query, push down qualifiers, or pull up qualifiers, even in trivial cases. So an unqualified SELECT
inside a CTE term will always do a full table scan (or an index-only scan if there's a suitable index).
Thus, in PostgreSQL, these two things are very different indeed, as a simple EXPLAIN
would show:
with parent as
(
select * from a101
)
select * from parent
where value1 = 159
and
SELECT *
FROM
(
SELECT * FROM a101
) AS parent
WHERE value1 = 159;
However, that "will scan the whole table" doesn't necessarily mean "will load the whole table in memory". PostgreSQL will use a TupleStore, which will transparently spill to a tempfile on disk as it gets larger.
The original justification was that DML in CTE terms was planned (and later implemented). If there's DML in a CTE term it's vital that its execution be predictable and complete. This may also be true if the CTE calls data-modifying functions.
Unfortunately, nobody seems to have thought "... but what if it's just a SELECT and we want to inline it?".
Many in the community appear to see this as a feature and regularly promulgate it as a workaround for optimiser issues. I find this attitude utterly perplexing. As a result, it's going to be really hard to fix this later, because people are intentionally using CTEs when they want to prevent the optimiser from altering a query.
In other words, PostgreSQL abuses CTEs as pseudo-query-hints (along with the OFFSET 0
hack), because project policy says real query hints aren't desired or supported.
AFAIK MS SQL Server may optimise CTE barriers, but may also choose to materialise a result set.
I just made EXPLAIN
for this query in PostgreSQL. Surprisingly it does sequence scan instead of index lookup:
CTE Scan on parent (cost=123.30..132.97 rows=2 width=1711)
Filter: (value1 = 159)
CTE parent
-> Seq Scan on a101 (cost=0.00..123.30 rows=430 width=2060)
I have a primary key index on value1
and it is used for simple select * from a101 where value1 = 159
query.
So, the answer is it will scan the whole table. I am surprised, I thought it will work as a view or subquery, but it does not. You can use this to use index:
select * from (select * from a101) parent
where value1 = 159`
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With