
Schema versioning within the Snowflake data warehouse

I am interested in ways in which users of a Snowflake database can be insulated from change through schema versioning. The approach I have been investigating uses the connection syntax to define a schema: for each release, a new schema would be created holding views over the core tables. Any views that are unchanged would simply be copied forward, while those that were amended would be made backwards compatible. As users connect, they would ideally be given the correct connection syntax for the version they require.
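
To make the idea concrete, a release schema might be built roughly like this with the Python connector (all database, schema and view names below are purely illustrative, not our real objects):

import snowflake.connector

# Connection details are placeholders.
conn = snowflake.connector.connect(
    account='my_account',
    user='my_user',
    password='my_password',
    database='ANALYTICS',
)
cur = conn.cursor()

# One schema per release, containing only views over the core tables.
cur.execute("CREATE SCHEMA IF NOT EXISTS ANALYTICS.SALES_V2")

# Views that are unchanged are copied forward as simple pass-throughs.
cur.execute("""
    CREATE OR REPLACE VIEW ANALYTICS.SALES_V2.ORDERS AS
    SELECT * FROM ANALYTICS.SALES_CORE.ORDERS
""")

# Views over amended tables keep the old shape for backwards compatibility,
# e.g. exposing a renamed column under its previous name.
cur.execute("""
    CREATE OR REPLACE VIEW ANALYTICS.SALES_V2.CUSTOMERS AS
    SELECT id, name, country_code AS country
    FROM ANALYTICS.SALES_CORE.CUSTOMERS
""")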

The problem I have is that there are multiple teams, each owning the schemas associated with a core business area, and I don't think it is possible to define multiple schemas in the connection syntax.
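
For example, with the SQLAlchemy connector the URL only takes a single database and schema, so anything owned by another team would have to be fully qualified in every query (names are illustrative; requires the snowflake-sqlalchemy package):

from sqlalchemy import create_engine, text

# Only one default schema fits in the connection URL.
engine = create_engine(
    "snowflake://my_user:my_password@my_account/ANALYTICS/SALES_V2"
    "?warehouse=REPORTING_WH&role=ANALYST"
)

with engine.connect() as conn:
    conn.execute(text("SELECT * FROM ORDERS"))               # resolves in SALES_V2
    conn.execute(text("SELECT * FROM FINANCE_V3.INVOICES"))  # other teams' schemas must be qualified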

Has anyone achieved this in an environment with multiple users, schemas and development teams?

Regards,

Luke

asked Dec 10 '19 by LukeM


1 Answer

We use Alembic for database version control with Snowflake. Alembic is a "migration" tool that lets you apply incremental changes (migrations) to your data warehouse. It's essentially an add-on to the SQLAlchemy library in Python.
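
Pointing Alembic at Snowflake is mostly just the SQLAlchemy connection; the core of env.py ends up looking something like this (a sketch with placeholder credentials and database, not our actual setup):

from alembic import context
from sqlalchemy import create_engine

# Placeholder URL; in practice this usually comes from alembic.ini or environment variables.
engine = create_engine(
    "snowflake://deploy_user:secret@my_account/ANALYTICS"
    "?warehouse=DEPLOY_WH&role=SYSADMIN"
)

def run_migrations_online():
    with engine.connect() as connection:
        context.configure(connection=connection, target_metadata=None)
        with context.begin_transaction():
            context.run_migrations()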

When developing locally, we create a clone of our database and test our migration changes against the clone. Once we know it works, we push it to GitLab and get it approved, and a CI/CD pipeline with accountadmin credentials then makes the change in production.
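
The local loop is roughly: zero-copy clone the database, point Alembic at the clone, and run the migration. A sketch (names, credentials, and warehouse are placeholders):

import snowflake.connector
from alembic import command
from alembic.config import Config

# Zero-copy clone of the production database to test against.
conn = snowflake.connector.connect(
    account='my_account', user='my_user', password='my_password', role='SYSADMIN'
)
conn.cursor().execute("CREATE OR REPLACE DATABASE ANALYTICS_DEV CLONE ANALYTICS")

# Point Alembic at the clone and apply the pending migrations.
cfg = Config("alembic.ini")
cfg.set_main_option(
    "sqlalchemy.url",
    "snowflake://my_user:my_password@my_account/ANALYTICS_DEV?warehouse=DEV_WH&role=SYSADMIN",
)
command.upgrade(cfg, "head")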

Since it's written in Python, you can connect this to your Git tool (like GitHub or GitLab) and submit changes in a Merge Request and get approval before running this in your Production database.

Here's the documentation: https://alembic.sqlalchemy.org/en/latest/

This is also officially supported according to Snowflake documentation: https://docs.snowflake.net/manuals/user-guide/sqlalchemy.html#alembic-support

An example Alembic migration might look like:


"""
Revision ID: 78a3acc7fbb2
Revises: 3f2ee8d809a6
Create Date: 2019-11-06 11:40:38.438468

"""

# revision identifiers, used by Alembic.
revision = '78a3acc7fbb2'
down_revision = '3f2ee8d809a6'
branch_labels = None
depends_on = None

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.create_table(
        'test_table',
        sa.Column('op', sa.String(length=255), nullable=True),
        sa.Column('id', sa.String(length=255), nullable=False),
        sa.Column('amount', sa.BigInteger(), nullable=True),
        sa.Column('reason', sa.String(length=255), nullable=True),
        sa.Column('deleted', sa.Boolean(), nullable=True),
        sa.Column('user_id', sa.Integer(), nullable=True),
        sa.Column('company_id', sa.Integer(), nullable=True),
        sa.Column('inserted_at', sa.DateTime(), nullable=True),
        sa.Column('updated_at', sa.DateTime(), nullable=True),
        sa.Column('dw_import_filename', sa.String(length=255), nullable=True),
        sa.Column('dw_import_file_row_number', sa.Integer(), nullable=True),
        sa.Column('dw_import_timestamp', sa.TIMESTAMP(), nullable=True),
        sa.PrimaryKeyConstraint('id'),
        schema='test_schema',
    )

def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('test_table', schema='test_schema')

As you can see, you supply an upgrade and also have the ability to downgrade, which reverses the upgrade. If you have any other questions about Alembic, or if this interests you, I'd be happy to explain more.
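
For what it's worth, you would normally run alembic upgrade head and alembic downgrade -1 from the command line; driving the same rollback from Python looks roughly like this (the config file path is whatever your project uses):

from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")
command.downgrade(cfg, "-1")   # undo the most recently applied revision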

answered Dec 25 '22 by Brock