Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Table vs Materialized View

This is related to the previous question I asked, Saved View with a timestamp expression, about storing information in a (non-materialized) View. How would the data be stored and then retrieved when a user does:

CREATED MATERIALIZED VIEW mv AS SELECT person_id, name, NOW() as now FROM table
# is this more-or-less the same as:
# CREATED TABLE tb AS SELECT person_id, name, NOW() as now FROM table
#  "AND UPDATE EVERY..."

Is the NOW() expression saved as a value to storage, or are any functions evaluated at query-time for a materialized view? Is a materialized view the same thing as a table, which has some sort of optimizations/refreshing done at the storage level, or am I missing the boat on that?

This post here suggests that (functionally speaking at least) a Materialized View can be emulated as a table with triggers: https://www.materialized.info/.

like image 945
David542 Avatar asked Oct 18 '20 22:10

David542


People also ask

What is difference between materialized view and table?

Materialized views are updated periodically based upon the query definition, table can not do this. Show activity on this post. A materialized view can be set up to refresh automatically on a periodic basis. A table may need additional code to truncate/reload data.

Is materialized view faster than table?

A materialized view pre-computes, stores, and maintains its data in a dedicated SQL pool just like a table. There's no recomputation needed each time a materialized view is used. That's why queries that use all or a subset of the data in materialized views can get faster performance.

Why do we use the materialized view instead of a table or views?

Materialized views are basically used to increase query performance since it contains results of a query. They should be used for reporting instead of a table for a faster execution.

When should I use a materialized view?

You can use materialized views to achieve one or more of the following goals: Ease Network Loads. Create a Mass Deployment Environment. Enable Data Subsetting.


4 Answers

This will depend on the database platform you are using. Seeing as you tagged this with 'oracle', here is what happens in Oracle.

The "now" value (in this case 'sysdate') is evaluated at materialized view instantiation time. This is easy to demonstrate

SQL> create materialized view MV as select e.*, sysdate d from emp e;

Materialized view created.

SQL> select * from mv;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO D
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------------
      7369 SMITH      CLERK           7902 17/12/1980 00:00:00        800                    20 21/10/2020 12:18:26
      7499 ALLEN      SALESMAN        7698 20/02/1981 00:00:00       1600        300         30 21/10/2020 12:18:26
      7521 WARD       SALESMAN        7698 22/02/1981 00:00:00       1250        500         30 21/10/2020 12:18:26
      7566 JONES      MANAGER         7839 02/04/1981 00:00:00       2975                    20 21/10/2020 12:18:26
      7654 MARTIN     SALESMAN        7698 28/09/1981 00:00:00       1250       1400         30 21/10/2020 12:18:26
      7698 BLAKE      MANAGER         7839 01/05/1981 00:00:00       2850                    30 21/10/2020 12:18:26
      7782 CLARK      MANAGER         7839 09/06/1981 00:00:00       2450                    10 21/10/2020 12:18:26
      7788 SCOTT      ANALYST         7566 09/12/1982 00:00:00       3000                    20 21/10/2020 12:18:26
      7839 KING       PRESIDENT            17/11/1981 00:00:00       5000                    10 21/10/2020 12:18:26
      7844 TURNER     SALESMAN        7698 08/09/1981 00:00:00       1500                    30 21/10/2020 12:18:26
      7876 ADAMS      CLERK           7788 12/01/1983 00:00:00       1100                    20 21/10/2020 12:18:26
      7900 JAMES      CLERK           7698 03/12/1981 00:00:00        950                    30 21/10/2020 12:18:26
      7902 FORD       ANALYST         7566 03/12/1981 00:00:00       3000                    20 21/10/2020 12:18:26
      7934 MILLER     CLERK           7782 23/01/1982 00:00:00       1300                    10 21/10/2020 12:18:26

[wait 10 seconds]

14 rows selected.

SQL> select * from mv;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO D
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------------
      7369 SMITH      CLERK           7902 17/12/1980 00:00:00        800                    20 21/10/2020 12:18:26
      7499 ALLEN      SALESMAN        7698 20/02/1981 00:00:00       1600        300         30 21/10/2020 12:18:26
      7521 WARD       SALESMAN        7698 22/02/1981 00:00:00       1250        500         30 21/10/2020 12:18:26
      7566 JONES      MANAGER         7839 02/04/1981 00:00:00       2975                    20 21/10/2020 12:18:26
      7654 MARTIN     SALESMAN        7698 28/09/1981 00:00:00       1250       1400         30 21/10/2020 12:18:26
      7698 BLAKE      MANAGER         7839 01/05/1981 00:00:00       2850                    30 21/10/2020 12:18:26
      7782 CLARK      MANAGER         7839 09/06/1981 00:00:00       2450                    10 21/10/2020 12:18:26
      7788 SCOTT      ANALYST         7566 09/12/1982 00:00:00       3000                    20 21/10/2020 12:18:26
      7839 KING       PRESIDENT            17/11/1981 00:00:00       5000                    10 21/10/2020 12:18:26
      7844 TURNER     SALESMAN        7698 08/09/1981 00:00:00       1500                    30 21/10/2020 12:18:26
      7876 ADAMS      CLERK           7788 12/01/1983 00:00:00       1100                    20 21/10/2020 12:18:26
      7900 JAMES      CLERK           7698 03/12/1981 00:00:00        950                    30 21/10/2020 12:18:26
      7902 FORD       ANALYST         7566 03/12/1981 00:00:00       3000                    20 21/10/2020 12:18:26
      7934 MILLER     CLERK           7782 23/01/1982 00:00:00       1300                    10 21/10/2020 12:18:26

14 rows selected.

The "D" column is unchanged no matter how times you query the materialized view. If I now issue a refresh command for the materialized view, we are in effect re-running the defining query, hence sysdate (and hence column "D") will be picked up as the moment of refresh.

SQL> exec dbms_mview.refresh('MV')

PL/SQL procedure successfully completed.

SQL> select * from mv;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO D
---------- ---------- --------- ---------- ------------------- ---------- ---------- ---------- -------------------
      7369 SMITH      CLERK           7902 17/12/1980 00:00:00        800                    20 21/10/2020 12:19:12
      7499 ALLEN      SALESMAN        7698 20/02/1981 00:00:00       1600        300         30 21/10/2020 12:19:12
      7521 WARD       SALESMAN        7698 22/02/1981 00:00:00       1250        500         30 21/10/2020 12:19:12
      7566 JONES      MANAGER         7839 02/04/1981 00:00:00       2975                    20 21/10/2020 12:19:12
      7654 MARTIN     SALESMAN        7698 28/09/1981 00:00:00       1250       1400         30 21/10/2020 12:19:12
      7698 BLAKE      MANAGER         7839 01/05/1981 00:00:00       2850                    30 21/10/2020 12:19:12
      7782 CLARK      MANAGER         7839 09/06/1981 00:00:00       2450                    10 21/10/2020 12:19:12
      7788 SCOTT      ANALYST         7566 09/12/1982 00:00:00       3000                    20 21/10/2020 12:19:12
      7839 KING       PRESIDENT            17/11/1981 00:00:00       5000                    10 21/10/2020 12:19:12
      7844 TURNER     SALESMAN        7698 08/09/1981 00:00:00       1500                    30 21/10/2020 12:19:12
      7876 ADAMS      CLERK           7788 12/01/1983 00:00:00       1100                    20 21/10/2020 12:19:12
      7900 JAMES      CLERK           7698 03/12/1981 00:00:00        950                    30 21/10/2020 12:19:12
      7902 FORD       ANALYST         7566 03/12/1981 00:00:00       3000                    20 21/10/2020 12:19:12
      7934 MILLER     CLERK           7782 23/01/1982 00:00:00       1300                    10 21/10/2020 12:19:12

14 rows selected.

SQL>

But each platform may have its own characteristics.

Treatment of Oracle materialised views including definition and materialised view logs is covered here

https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/basic-materialized-views.html#GUID-A7AE8E5D-68A5-4519-81EB-252EAAF0ADFF

and some of the more advanced topics (partitioning, indexing, etc) of materialized views and how they can pertain to automatic query rewrite is here

https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/advanced-materialized-views.html#GUID-F7394DFE-7CF6-401C-A312-C36603BEB01B

like image 136
Connor McDonald Avatar answered Nov 14 '22 07:11

Connor McDonald


I have used Materialized views extensively in Oracle and I can answer from that point of view. I would imagine the general principle would be same for other databases as well with tiny variations.

First off, Materialized views are physical storage compared to normal views. Normal views stores only the query and execute the queries at run time. So, if you stored a current timestamp in Mat view it would show you the old value unless you refresh the view. In oracle, there are several refresh strategies for Materialized views as described below -

1. Manual Refresh: MView can be refreshed on demand by using the standard package dbms_snapshot.refresh_mview

2. Automatic Refresh: MView can be refreshed as soon as any changes are made in the table underlying the MView using “On Commit”

Automatic refreshes can be of various types -

2.1. Complete(Full) – Whenever the base table is modified, MView will be truncated first and will be loaded with the data. As the name suggests, it is refreshed completely.

2.2. Fast – Whenever the base table is modified, only the updated records are updated/inserted in the MView. We need “mvlog” file to make “Fast” refresh.

2.3. Force – It will first try to do the “Fast” refresh. If for some reasons, “Fast” refresh fails then it will do the “Complete” refresh.

So, structurally Materialized view is exactly similar to a physical table. Then the question is why do we create MV? & what's the advantage of creating a Materialized view after all? Well, here comes the interesting part.

Typically, when base tables contain large amount of data, it is expensive and time-consuming to compute the required aggregates or to compute joins between these tables. In such cases, queries can take minutes or even hours. So, in materialized views one can store precomputed aggregates and joins and when one tries to execute aggregate SQL queries or use those same joins in the queries, the database engine would actually not execute the query at runtime instead it will fetch the pre-computed results stored on Mat view and get back those results to the client, which is much faster as it prevents those computations to happen at runtime. So, how does the engine able to do that? - Oracle Database employs an extremely powerful process called query rewrite to quickly answer the query using materialized views.

A query undergoes several checks to determine whether it is a candidate for query rewrite. If the query fails any of the checks, then the query is applied to the detail tables rather than the materialized view. This can be costly in terms of response time and processing power.

The optimizer uses two different methods to recognize when to rewrite a query in terms of a materialized view. The first method is based on matching the SQL text of the query with the SQL text of the materialized view definition. If the first method fails, the optimizer uses the more general method in which it compares joins, selections, data columns, grouping columns, and aggregate functions between the query and materialized views.

This infact is a very useful application of Materialized view and can speed up queries 5X -100X times. I can give you a real life example, on one of the projects that I worked on, there were several aggregate reports which needed to be prepared weekly and if we used aggregate queries on top of data warehouse tables, it used to take hours for us to build those reports. Instead what we did later on was after every weekly batch of data warehouse load we used to do a full refresh of the Mat views (using the automatic refresh feature) say every Friday night and enable query re-write on those materialized views (having pre-aggregated/pre-computed results). Then our reporting queries used to be kick-off over the weekend through an automated process and due to this optimization of pre-aggerated results in Mat views with query re-write feature, our reports build time got a lot more faster and we were able to validate and deliver all the reports to the stakeholders by Monday morning without any hiccups whereas we were always on our toes when we had the previous process in place without Mat views, since the build used to take hours/sometimes days depending on the report. We similarly used the same process for non-aggregated reports as well which involved lots of joins with dimension and facts tables. So, in those Materialized views we used to store the data in a de-normalized format so that for building reports the joins need not be performed at run time. The pre-aggregation optimization yielded us 50x- 100x gains while the de-normalized materialized views used to give us anywhere between 5x - 30x depending on several factors.

You can read up the implementation & syntax level details in Oracle docs. Would be happy to provide more details if needed but just wanted to share a real-life 1st-hand Use-case to get the point across how useful it can be if we are able to make use of it in the right scenario. Below is a sample syntax in Oracle for reference.

CREATE MATERIALIZED VIEW department_mv
refresh complete on commit --Automatic Complete refresh (as described above)
enable query rewrite -- this feature enables optimization (as described above)
as
SELECT deptno, dept_name, SUM(salary)
FROM department
GROUP BY deptno, dept_name;
like image 39
Somy Avatar answered Nov 14 '22 06:11

Somy


In simple words: Materialized view in SQL is a physical constructs which is stored physically on the disc. But views are just logical constructs which are created where it is required in the query..

When you create a Materialized view, Oracle Database creates one internal table and at least one index, and may create one view, all in the schema of the materialized views.

Following is the syntax of materialized view:

Create materialized view View_Name

Build [Immediate/Deffered]

Refresh [Fast/Complete/Force]

on [Commit/Demand]

as Select ..........;

when a DML is executed in the Master table ,then oracle stores rows defining changes in the MV log and uses it to refresh the MV .Its called fast refresh.You can check the behaviour in mlog$_tablename. MV's are also called snapshots. using select name,table_name,refresh_method from user_snapshots, you can check the statuses.

To reflect the changes of master table in MV: execute DBMS_MVIEW.REFRESH('name_of the view');

for single snapshot: execute DBMS_SNAPSHOT.REFRESH( 'v_materialized_foo_tbl','f');

REFRESH procedure Below procedure refreshes a list of snapshots.

Syntax

DBMS_SNAPSHOT.REFRESH (
   { list                 IN     VARCHAR2,
   | tab                  IN OUT DBMS_UTILITY.UNCL_ARRAY,},...);

list: Comma-separated list of snapshots that you want to refresh. tab :A string of refresh methods indicating how to refresh the listed snapshots. F' or f' indicates fast refresh, ?' indicates force refresh, C' or c' indicates complete refresh, and A' or `a' indicates always refresh,

The picture below shows general idea: enter image description here

So, NOW()'s value is called when you do refreshes on it ,through execute DBMS_MVIEW.REFRESH() or DBMS_SNAPSHOT.REFRESH

Moreover using triggers on a MV is not a good practice though mysql can .Be it on read only MV. Oracle does not support a Execute command(to refresh MV) inside a trigger because trigger is itself a single transaction and exceute command commit in it makes it two commits inside one transaction.

Does an insert trigger need a commit statement

Triggers on a read only materialized view might appear to work - but you cannot rely on them working.

An update of a row in a materialized view might be done as a DELETE+INSERT.

A refresh of a materialized view might involve a delete + insert of every row.

A refresh might involve a truncate plus direct path load of ever row.

(the last two will probably happen sometime, you would lose anything your trigger had done in the past)"--- https://asktom.oracle.com/pls/apex/asktom.search%3Ftag%3Dtriggers-on-materialized-views#:~:text=Triggers%20on%20a%20read%20only,delete%20%2B%20insert%20of%20every%20row.

like image 24
nikhil sugandh Avatar answered Nov 14 '22 08:11

nikhil sugandh


The indexed view (or so called materialized view) can be changed with normal table and triggers.

You know that in PostgreSQL one needs to refresh it if the data of underlying table is changed. In Microsoft SQL Server an indexed view is refresh automatically, but there are many requirements to fulfill to create index on a view.

If you have two tables referred in a indexed view you will need to triggers on each table in order to use the alternative solution. Also, it's common that people write triggers which do not perform fast (for example a common mistake is one to use Row By Agonizing Row instead batch processing).

So, it seems that the indexed view is there for making our live easier but clearly it can be replace with a table and triggers to others table as the last solution is not facing the limitations of the first but it is more complicated.

For me, the PostgreSQL implementation is useless us I want to my statistics to be correct once a transaction is committed, not when a routine is called to refresh them.

I have used various indexed views in the context of SQL Server in order to optimize different cases. I have used triggers to precalculated data for the same purposes, too. I doubt there is definite answer which one to use. For me, if you can use an indexed view - use it and leave the engine to handle the difficult stuff. If you are facing some RDMS limitations - you have no choice but to use a triggers to precalculated the data.

In the context of SQL Server, the indexed view maintenance cost is similar to the cost of having an index.

like image 30
gotqn Avatar answered Nov 14 '22 06:11

gotqn