Rewriting a legacy SQL script to DBT structures, I stumbled by UPDATE queries, which ran right after the table gets created and loaded with data.
Example: Within the SQL file create_dev_env.sql
...
CREATE TABLE A1 as
SELECT DISTINCT id, name FROM A
UPDATE A1
SET status = true
...
How would I run such process in a row: table creation, data loading and update data, dynamically on DBT.
I was told about 2 different approaches: 1. Post hooks and 2. Secondary stage levels.
Option 1 looks the one to go.
How would I allow dynamic SQL in a post-hook to update data in a table loaded previously?
You can do this with an update in a post-hook, and I'll get to how below. But I'd caution you against this approach, as it breaks the dbt paradigm. You need your transformations to be idempotent, so that you can easily build dbt models in many environments and get the same results when you run them over and over again (you really only get all the benefits of dbt when you embrace this paradigm). Most updates can be achieved by just editing the select statement that creates the model in the first place, and if that is possible, it's going to be much easier for you to maintain in the future.
I realize you simplified your example for this question, but to take your trivial example, your model could just look like this:
-- models/a1.sql
select distinct
id,
name,
true as status
from a
Ok, putting that aside, if you absolutely must run an update, you can use a macro to execute arbitrary sql and call that macro from a post-hook:
-- macros/update_a1.sql
{% macro update_a1() %}
update a1
set status = true
{% endmacro %}
Then in your model file:
-- models/a1.sql
{{ config(
materialized="table",
post_hook="{{ update_a1() }}"
) }}
select distinct id, name, status from a
Or if your example is really as simple as this, you don't need a macro and you can just inline the sql statement instead:
-- models/a1.sql
{{ config(
materialized="table",
post_hook="update a1 set status = true"
) }}
select distinct id, name, status from a
Note that this pattern is already broken -- we've hard-coded the identifier a1 into the update statement, so this will update prod even when run in dev. You could use this as a band-aid, since you want to update the current model. That will work in a macro or inline statement:
-- models/a1.sql
{{ config(
materialized="table",
post_hook="update {{ this }} set status = true"
) }}
select distinct id, name, status from a
As a final note, depending on your RDBMS, you may have to be careful about transaction handling, to make sure that your model is fully committed before your update runs, and that the update is committed so the changes don't get rolled back. More info here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With