Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Materialized View vs Table Using dbt

I'm just onboarding dbt and having gone through the tutorial docs I'm wondering if there's a difference between materializing my transformations as views or tables? I'm using Snowflake as the data warehouse. There's some documentation here that shows the differences between a table and a materialized view but if I'm using dbt to update the tables regularly, do they more or less become the same thing?

Thanks!

like image 750
Kyle Cheung Avatar asked Oct 22 '20 20:10

Kyle Cheung


2 Answers

dbt doesn't support materialized views, as far as I'm aware, but as Felipe commented, there is an open issue to discuss it. If it were possible to use materialized views on Snowflake, you're right that they somewhat become the same thing. The materialized view would update even if you haven't run dbt. As Drew mentions in the ticket though, there are a lot of caveats that make using tables with dbt preferable in most use cases: "no window functions, no unions, limited aggregates, can't query views, etc etc etc".

That said, dbt does support views and tables.

Even when you're using dbt, there's still a difference between a view and a table. A table will always need to be refreshed by dbt in order to be updated. A view will always be as up-to-date as the underlying tables it is referencing.

For example, let's say you have a dbt model called fct_orders which references a table that is loaded by Fivetran/Stitch called shopify.order. If your model is materialized as a view, it will always return the most up-to-date data in the Shopify table. If it is materialized as a table, and new data has arrived in the Shopify table since you last run dbt, the model will be 'stale'.

That said, the benefit of materializing it as a table is that it will run more quickly, given it's not having to do the SQL 'transformation' each time.

The advice I have seen given most often is something like this:

  • If using a view isn't too slow for your end-users, use a view.
  • If a view gets too slow for your end-users, use a table.
  • If building a table with dbt gets too slow, use incremental models in dbt.
like image 86
dylanbaker Avatar answered Sep 27 '22 19:09

dylanbaker


If you use DBT there's little need for materialized views: a materialized view is in fact a table which is based on a query - same as "create table as select". If you have a DBT model you can materialize as a table and you'll get the same result. Now the difference between a table and a materialized view is the fact that the materialized view automatically updates, while the table does not. But if you're using DBT you can schedule a refresh of the table by scheduling DBT.

This will only give you updated data after your scheduled DBT will complete, which is not the same as a materialized view if the underlying table changes frequently, but most people refrain from using materialized views on top of tables that change frequently because the running cost can get out of control.

Materialized views in Snowflake can only query one table, while with DBT there are more options - e.g. join two tables and materialize as a table will give you something you can't do with a materialized view.

Finally, if you really want to deploy materialized views with DBT there are two ways:

  1. Use the pre-hook or the post-hook, which executes any piece of SQL after running the DBT model. That can work but the maintenance is not great.
  2. There is a way to create your own materialization - see https://docs.getdbt.com/docs/guides/creating-new-materializations - this is not an easy task, but that will give you want you want. There's also a GitHub page called dbt-hack which gives interesting techniques on non-standard materializations.
like image 39
JeromeE Avatar answered Sep 27 '22 19:09

JeromeE