
Does Snowflake support indexes?

I could not find any reference to indexes in the Snowflake documentation.

Does Snowflake support indexes and, if not, what is the alternative approach to performance tuning in Snowflake?

asked Oct 21 '19 18:10 by Koustav Ponda


People also ask

Does Snowflake support table partitioning?

Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the ordering of the data as it is inserted/loaded.
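To make "partitioned using the ordering of the data as it is inserted" concrete, here is a minimal Python sketch that chunks rows in arrival order and records a per-column min/max for each chunk. It is a toy model only: the MicroPartition class, partition_in_load_order, and the row-count-based rows_per_partition parameter are hypothetical names for illustration, whereas Snowflake sizes its micro-partitions by data volume and manages them internally.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    rows: list          # the rows stored in this (toy) partition
    stats: dict         # column name -> (min, max) over this partition


def partition_in_load_order(rows, rows_per_partition):
    """Split rows into consecutive chunks in arrival order (hypothetical
    row-count sizing) and record per-column min/max for each chunk."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        stats = {}
        for col in chunk[0]:
            values = [r[col] for r in chunk]
            stats[col] = (min(values), max(values))
        partitions.append(MicroPartition(chunk, stats))
    return partitions


# Rows arrive already ordered by "day", so each chunk covers a narrow range.
rows = [{"id": i, "day": i // 10} for i in range(30)]
parts = partition_in_load_order(rows, 10)
print([p.stats["day"] for p in parts])  # [(0, 0), (1, 1), (2, 2)]
```

Because the rows in this example happen to arrive sorted by day, each chunk's min/max range is tight, which is what later lets a query skip chunks.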

Is Snowflake a structured database?

All data in Snowflake is stored in database tables, logically structured as collections of columns and rows.

How do you drop a Snowflake index?

Snowflake does not support the concept of indexes, so there is no index to drop. In particular situations, you can use a clustering key as a substitute. A clustering key is added with ALTER TABLE <table> CLUSTER BY (<columns>) and removed with ALTER TABLE <table> DROP CLUSTERING KEY.
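Why a clustering key can stand in for an index is easiest to see by simulation. The Python sketch below assumes one filter column with 100 distinct "days" and compares how many fixed-size chunks a range query must touch when the data is loaded in random order versus sorted by that column. Sorting is a simplification: Snowflake's clustering reorganizes micro-partitions incrementally rather than fully sorting the table, and the function names here are hypothetical.

```python
import random


def partition_ranges(values, size):
    """Return the (min, max) of each consecutive chunk of `size` values."""
    return [(min(values[i:i + size]), max(values[i:i + size]))
            for i in range(0, len(values), size)]


def partitions_scanned(ranges, lo, hi):
    """Count chunks whose [min, max] overlaps the query range [lo, hi]."""
    return sum(1 for mn, mx in ranges if mx >= lo and mn <= hi)


random.seed(0)
days = [d for d in range(100) for _ in range(10)]  # 1000 rows, 10 per day
shuffled = days[:]
random.shuffle(shuffled)  # data loaded in no useful order

unclustered = partition_ranges(shuffled, 100)
clustered = partition_ranges(sorted(shuffled), 100)  # ordered by the key

print("unclustered:", partitions_scanned(unclustered, 40, 49))
print("clustered:  ", partitions_scanned(clustered, 40, 49))  # 1
```

In the shuffled layout nearly every chunk spans most of the 0 to 99 range, so almost none can be skipped; once the data is ordered by the key, the query for days 40 to 49 touches a single chunk.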

Is Snowflake column oriented?

Yes. Snowflake, like Amazon Redshift and Google BigQuery, is a column-oriented database.


1 Answer

Snowflake does not use indexes. This is one of the things that makes Snowflake scale so well for arbitrary queries. Instead, Snowflake calculates statistics about the columns and records in the files you load, and uses those statistics to decide which parts of which tables it actually needs to read to execute a query. It also uses a columnar storage format, which lets it read only the columns a query actually references and so avoid I/O on the columns you don't use.
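The columnar point can be illustrated with a small Python sketch: the table is stored as one list per column, and a filtered aggregate touches only the two columns it names, so a wide "notes" column is never read. The helpers to_columnar and sum_where and the example table are hypothetical; this models the idea of column projection, not Snowflake's actual file format.

```python
def to_columnar(rows):
    """Convert a list of row dicts into a dict of column name -> values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_where(columns, value_col, filter_col, predicate):
    """Sum value_col over rows where predicate(filter_col value) is true.
    Only the two referenced columns are accessed; returns
    (total, number_of_values_read)."""
    filt = columns[filter_col]
    vals = columns[value_col]
    total = sum(v for f, v in zip(filt, vals) if predicate(f))
    return total, len(filt) + len(vals)


rows = [{"id": i,
         "region": "EU" if i % 2 else "US",
         "amount": i,
         "notes": "x" * 100}  # a wide column the query never needs
        for i in range(10)]
table = to_columnar(rows)
print(sum_where(table, "amount", "region", lambda r: r == "EU"))  # (25, 20)
```

Here the query reads 20 of the table's 40 stored values; in a row-oriented layout every row, including its wide notes field, would have to be read to get at the two needed fields.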

Snowflake slices big tables (gigabytes, terabytes or larger) into smaller "micro-partitions." For each micro-partition, it collects statistics about the range of values each column contains. It then loads only the micro-partitions whose ranges overlap the values your query needs. For example, say you have a column of timestamps. If your query asks for data between June 1 and July 1, then, based on the date statistics stored for each micro-partition, partitions that contain no data in this range are not loaded or processed.
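The June 1 to July 1 example can be sketched as a range-overlap check over per-partition min/max dates. The partition IDs and dates below are made up for illustration, and the query bounds are assumed to be inclusive; the logic is the general min/max pruning idea the answer describes, not Snowflake's internal code.

```python
from datetime import date

# Hypothetical per-partition statistics: (partition_id, min_date, max_date)
partition_stats = [
    ("p0", date(2019, 4, 1), date(2019, 4, 30)),
    ("p1", date(2019, 5, 1), date(2019, 5, 31)),
    ("p2", date(2019, 5, 20), date(2019, 6, 10)),
    ("p3", date(2019, 6, 11), date(2019, 6, 30)),
    ("p4", date(2019, 7, 2), date(2019, 7, 31)),
]


def prune(stats, lo, hi):
    """Keep only partitions whose [min, max] range overlaps [lo, hi];
    every other partition can be skipped without reading it."""
    return [pid for pid, mn, mx in stats if mx >= lo and mn <= hi]


print(prune(partition_stats, date(2019, 6, 1), date(2019, 7, 1)))  # ['p2', 'p3']
```

Note that p2 is still scanned even though only part of its range falls in June: pruning can rule partitions out, but any partition whose range overlaps the filter must still be read and filtered row by row.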

Indexes are mostly used for online transaction processing, because they speed up operations that touch one or a few records. When you run analytics queries on large datasets, however, your joins and aggregates almost always touch large subsets of each table. Snowflake's storage mechanism, with its automatically collected statistics, accelerates such large queries without requiring you to define an index or tune any parameters.
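A bit of arithmetic shows why an index buys little once a query selects more than a tiny fraction of rows. Assuming, purely for illustration, that the selected rows are scattered independently at random with probability p per row, a partition holding k rows contains at least one selected row with probability 1 - (1 - p)^k, so that is the expected fraction of partitions that must be read anyway. The function name and the numbers below are hypothetical.

```python
def fraction_of_partitions_touched(selectivity, rows_per_partition):
    """Expected fraction of partitions holding at least one matching row,
    assuming matches are scattered independently with probability
    `selectivity` per row."""
    return 1 - (1 - selectivity) ** rows_per_partition


# A point lookup: about 1 row in 10 million matches.
print(fraction_of_partitions_touched(1e-7, 1_000))
# An analytic filter keeping 0.1% of rows, with 10,000 rows per partition.
print(fraction_of_partitions_touched(0.001, 10_000))
```

For the point lookup only a tiny fraction of partitions is touched, which is where an index pays off; for the 0.1% filter the result is very close to 1.0, so nearly every partition must be read regardless, and a sequential columnar scan with pruning is the cheaper strategy.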

answered Dec 21 '22 22:12 by Jon Watte