Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will this query load the whole table in memory

If I have a really big table will this query load the whole table in memory before it filters the resets:

with parent as
(
    select * from a101
)
select * from parent 
where value1 = 159

As you can see the parent query reference the whole table. Will this loaded in memory. This is a very simplified version of the query. The real query has a few joins to other tables. I am evaluating sql server 2012 and postgrsql.

like image 574
Luke101 Avatar asked Mar 25 '14 05:03

Luke101


People also ask

How much memory does a SQL query use?

SQL can be configured to only use a specific amount of RAM if necessary. Otherwise, it will consume the maximum amount of resources. This is the standard procedure after thoroughly investigating the server and confirming that SQL Server consumes 9GB of the server's total 10GB of RAM capacity.

Does SQL Run in memory?

However, many SQL Server workloads can fit their entire working set in available memory. Many in-memory database systems can persist data to disk and may not always be able to fit the entire data set in available memory.


2 Answers

In PostgreSQL (true as of 9.4, at least) CTEs act as optimisation fences.

The query optimiser will not flatten CTE terms into the outer query, push down qualifiers, or pull up qualifiers, even in trivial cases. So an unqualified SELECT inside a CTE term will always do a full table scan (or an index-only scan if there's a suitable index).

Thus, in PostgreSQL, these two things are very different indeed, as a simple EXPLAIN would show:

with parent as
(
    select * from a101
)
select * from parent 
where value1 = 159

and

SELECT *
FROM 
(
   SELECT * FROM a101
) AS parent
WHERE value1 = 159;

However, that "will scan the whole table" doesn't necessarily mean "will load the whole table in memory". PostgreSQL will use a TupleStore, which will transparently spill to a tempfile on disk as it gets larger.

The original justification was that DML in CTE terms was planned (and later implemented). If there's DML in a CTE term it's vital that its execution be predictable and complete. This may also be true if the CTE calls data-modifying functions.

Unfortunately, nobody seems to have thought "... but what if it's just a SELECT and we want to inline it?".

Many in the community appear to see this as a feature and regularly promulgate it as a workaround for optimiser issues. I find this attitude utterly perplexing. As a result, it's going to be really hard to fix this later, because people are intentionally using CTEs when they want to prevent the optimiser from altering a query.

In other words, PostgreSQL abuses CTEs as pseudo-query-hints (along with the OFFSET 0 hack), because project policy says real query hints aren't desired or supported.

AFAIK MS SQL Server may optimise CTE barriers, but may also choose to materialise a result set.

like image 185
Craig Ringer Avatar answered Sep 16 '22 23:09

Craig Ringer


I just made EXPLAIN for this query in PostgreSQL. Surprisingly it does sequence scan instead of index lookup:

 CTE Scan on parent  (cost=123.30..132.97 rows=2 width=1711)
   Filter: (value1 = 159)
   CTE parent
     ->  Seq Scan on a101  (cost=0.00..123.30 rows=430 width=2060)

I have a primary key index on value1 and it is used for simple select * from a101 where value1 = 159 query.

So, the answer is it will scan the whole table. I am surprised, I thought it will work as a view or subquery, but it does not. You can use this to use index:

select * from (select * from a101) parent 
where value1 = 159`
like image 35
Suor Avatar answered Sep 19 '22 23:09

Suor