Suppose I have 2 tables A
and B
. I create a MV
(materialized view) with a join
query of two tables, psuedo like:
create materialized view a_b engine = Memory as
select * from(
select * from A
) all inner join (
select * from B
) using some_col;
I known that a_b
is only updated when inserting data into table A
and nothing else happen when push data to B
. I want my MV have to update when both table are updated.
My workaround is to create another MV
that change postition of A
, B
and point to a_b
like
create materialized view a_b_2 to a_b as
select * from(
select * from B
) all inner join (
select * from A
) using same_col;
I have some questions about this approach:
1. Are there any more legal way
to archive same effect in clickhouse
?
2. Suppose I have 2 incoming batches data BD_A
and BD_B
are going to insert to A
and B
simultaneously. Some data of 2 batches themself (BD_A_B
) is fit join
condition . Is there any chance that the MV
lost those BD_A_B
because MV a_b
processes BD_A
with before-inserted B
and MV a_b_2
processes BD_B
with before-inserted A
.
One of the most common follow-on questions we receive is whether materialized views can support joins. The answer is emphatically yes. This blog article shows how.
To update the data in a materialized view, you can use the REFRESH MATERIALIZED VIEW statement at any time. When you use this statement, Amazon Redshift identifies changes that have taken place in the base table or tables, and then applies those changes to the materialized view.
To refresh the Oracle materialized view, call one of the procedures in DBMS_MVIEW. The DBMS_MVIEW package provides three types of refresh operations: DBMS_MVIEW. REFRESH: Refreshes one or more Oracle materialized views.
Unlike indexes, materialized views are not automatically updated with every data change. They must explicitly be refreshed, either on every commit, on a periodically time schedule or – typically in data warehouses – at the end of an ETL job.
As far as I understand, you are trying to have a workaround of a limitation.
Clickhouse does not support multiple source tables for a MV and they have quite good reasons for this. I actually asked this to devs and got this answer:
In ClickHouse materialized view behaves more like BEFORE INSERT TRIGGER, each time processing new block arrived with insert.
So that is quite natural limitation as inserts to 2 different table will come asynchronously and you usually expect to see in JOINs whole table not only newly arrived blocks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With