Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Snowflake query pruning by Column

in the Snowflake Docs it says:

  1. First, prune micro-partitions that are not needed for the query.
  2. Then, prune by column within the remaining micro-partitions.

What is meant with the second step?

Let's take the example table t1 shown in the link. In this example table I use the following query:

SELECT * FROM t1
WHERE
Date = ‚11/3‘ AND
Name = ‚C‘ 

Because of the Date = ‚11/3‘ it would only scan micro partitions 2, 3 and 4. Because of the Name = 'C' it can prune even more and only scan micro-partions 2 and 4.

So in the end only micro-partitions 2 and 4 would be scanned.

But where does the second step come into play? What is meant with prune by column within the remaining micro partitions?

Does it mean, that only rows 4, 5 and 6 on micro-partition 2 and row 1 on micro-partition 4 are scanned, because date is my clustering key and is sorted so you can prune even further with the date?

So in the end only 4 rows would be scanned?

like image 601
Schnurres Avatar asked Jan 25 '23 05:01

Schnurres


2 Answers

But where does the second step come into play? What is meant with prune by column within the remaining micro partitions?

Benefits of Micro-partitioning:

Columns are stored independently within micro-partitions, often referred to as columnar storage.

This enables efficient scanning of individual columns; only the columns referenced by a query are scanned.

It is recommended to avoid SELECT * and specify required columns explicitly.

like image 106
Lukasz Szozda Avatar answered Mar 27 '23 10:03

Lukasz Szozda


It simply means to only select the columns that are required for the query. So in your example it would be:

SELECT col_1, col_2 FROM t1
WHERE
Date = ‚11/3‘ AND
Name = ‚C‘ 
like image 31
Simon D Avatar answered Mar 27 '23 10:03

Simon D