I'll need to invoke REFRESH MATERIALIZED VIEW on each change to the tables involved, right? I'm surprised to not find much discussion of this on the web.

How should I go about doing this?

I think the top half of the answer here is what I'm looking for: https://stackoverflow.com/a/23963969/168143

Are there any dangers to this? If updating the view fails, will the transaction on the invoking update, insert, etc. be rolled back? (this is what I want... I think)


How can I ensure that a materialized view is always up to date?


I'll need to invoke REFRESH MATERIALIZED VIEW on each change to the tables involved, right?

Yes. PostgreSQL never refreshes a materialized view automatically; you have to trigger the refresh yourself somehow.

How should I go about doing this?

There are many ways to achieve this. Before giving some examples, keep in mind that the REFRESH MATERIALIZED VIEW command locks the view in ACCESS EXCLUSIVE mode, so while it is running you can't even SELECT from the view.

However, if you are on version 9.4 or newer, you can give it the CONCURRENTLY option:

REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;

This acquires an EXCLUSIVE lock instead, which does not block SELECT queries, but it may have a bigger overhead (it depends on the amount of data changed; if only a few rows have changed, it might even be faster). Still, you can't run two REFRESH commands on the same view concurrently.
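The CONCURRENTLY option has prerequisites worth knowing before relying on it: the materialized view must already be populated, and it must have at least one UNIQUE index that uses only column names (no expressions) and has no WHERE clause; otherwise the refresh fails with an error. A minimal sketch for a scratch database, where demo_orders and demo_mv are hypothetical stand-ins for your own tables and my_mv:

```sql
-- Scratch-database sketch: demo_orders and demo_mv are hypothetical names
-- standing in for your own tables and my_mv.
DROP TABLE IF EXISTS demo_orders CASCADE;  -- CASCADE also drops demo_mv

CREATE TABLE demo_orders (customer_id int NOT NULL, amount numeric NOT NULL);

-- Populated (still empty) at creation time, since WITH NO DATA is not used.
CREATE MATERIALIZED VIEW demo_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM demo_orders
    GROUP BY customer_id;

-- customer_id is unique in the view, so it can carry the index that
-- CONCURRENTLY requires: plain column(s), no expression, no WHERE clause.
CREATE UNIQUE INDEX demo_mv_customer_id_idx ON demo_mv (customer_id);

INSERT INTO demo_orders VALUES (1, 10), (1, 5), (2, 7);

-- SELECTs on demo_mv keep working while this runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY demo_mv;
```

The view's grouping key is usually the natural candidate for that index.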

Refresh manually

This is an option to consider, especially for data loading or batch updates (e.g. a system that only loads large amounts of data after long periods of time). Such jobs commonly end with steps that modify or process the data, so you can simply include a REFRESH operation at the end.
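A minimal sketch of such a batch job, assuming it runs as a single transaction in a scratch database (demo_orders and demo_mv are hypothetical names, and the INSERT ... SELECT stands in for your real load, e.g. COPY). Putting REFRESH last means the view is refreshed only if the whole batch succeeds, and the ACCESS EXCLUSIVE lock it takes is held only from the REFRESH until COMMIT:

```sql
-- Scratch-database sketch: demo_orders and demo_mv are hypothetical names.
DROP TABLE IF EXISTS demo_orders CASCADE;  -- CASCADE also drops demo_mv
CREATE TABLE demo_orders (customer_id int NOT NULL, amount numeric NOT NULL);
CREATE MATERIALIZED VIEW demo_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM demo_orders
    GROUP BY customer_id;

BEGIN;
-- Load step (stands in for COPY or an ETL insert): 30 rows, customers 0, 1, 2.
INSERT INTO demo_orders (customer_id, amount)
    SELECT g % 3, 1 FROM generate_series(1, 30) AS g;
-- Some post-processing of the loaded data.
DELETE FROM demo_orders WHERE customer_id = 0;
-- Last step: readers of demo_mv block from here until COMMIT.
REFRESH MATERIALIZED VIEW demo_mv;
COMMIT;
```

If any step fails and the transaction rolls back, the refresh is rolled back with it and the view keeps its previous contents.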

Scheduling the REFRESH operation

The first and most widely used option is to have a scheduling system invoke the refresh. For instance, you could configure it as a cron job like this:

*/30 * * * * psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv"

Your materialized view will then be refreshed every 30 minutes.

Considerations

This option is really good, especially with the CONCURRENTLY option, but only if you can accept the data not being 100% up to date all the time. Keep in mind that, with or without CONCURRENTLY, the REFRESH command has to run the entire query, so take into account how long the view's query takes to run before choosing the REFRESH schedule.
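Since each scheduled run executes the whole query, it is worth measuring how long one refresh takes before choosing the interval. A minimal sketch, assuming a scratch database where 100,000 generated rows in the hypothetical demo_orders table stand in for real data:

```sql
-- Scratch-database sketch: demo_orders and demo_mv are hypothetical names.
DROP TABLE IF EXISTS demo_orders CASCADE;  -- CASCADE also drops demo_mv
CREATE TABLE demo_orders (customer_id int NOT NULL, amount numeric NOT NULL);
INSERT INTO demo_orders (customer_id, amount)
    SELECT g % 1000, g FROM generate_series(1, 100000) AS g;
CREATE MATERIALIZED VIEW demo_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM demo_orders
    GROUP BY customer_id;

-- Time one full refresh. clock_timestamp() advances while the block runs,
-- unlike now(), which is fixed at the start of the transaction.
DO $$
DECLARE
    started timestamptz := clock_timestamp();
BEGIN
    REFRESH MATERIALIZED VIEW demo_mv;
    RAISE NOTICE 'REFRESH of demo_mv took %', clock_timestamp() - started;
END $$;
```

If the measured time gets close to the cron interval, runs may overlap; a second REFRESH of the same view simply waits for the lock held by the first one.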

Refreshing with a trigger

Another option is to call REFRESH MATERIALIZED VIEW in a trigger function, like this:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
    RETURN NULL;
END;
$$;

Then, on every table whose changes affect the view, do:

CREATE TRIGGER tg_refresh_my_mv
AFTER INSERT OR UPDATE OR DELETE ON table_name
FOR EACH STATEMENT EXECUTE PROCEDURE tg_refresh_my_mv();
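With this approach, the answer to the rollback question is yes: the trigger runs inside the transaction of the statement that fired it, so if the REFRESH fails, that INSERT/UPDATE/DELETE fails with it and, unless the error is caught, the whole transaction is rolled back. A minimal sketch for a scratch database (demo_orders, demo_mv and tg_refresh_demo_mv are hypothetical names): the first INSERT shows the view being current right after the write; then the unique index that CONCURRENTLY requires is dropped to force the next refresh to fail. The DO block catches the error only so the script can continue; its INSERT is undone.

```sql
-- Scratch-database sketch: all object names are hypothetical.
DROP TABLE IF EXISTS demo_orders CASCADE;  -- CASCADE also drops demo_mv
CREATE TABLE demo_orders (customer_id int NOT NULL, amount numeric NOT NULL);
CREATE MATERIALIZED VIEW demo_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM demo_orders
    GROUP BY customer_id;
CREATE UNIQUE INDEX demo_mv_customer_id_idx ON demo_mv (customer_id);

CREATE OR REPLACE FUNCTION tg_refresh_demo_mv() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY demo_mv;
    RETURN NULL;  -- ignored for statement-level AFTER triggers
END;
$$;

CREATE TRIGGER tg_refresh_demo_mv
AFTER INSERT OR UPDATE OR DELETE ON demo_orders
FOR EACH STATEMENT EXECUTE PROCEDURE tg_refresh_demo_mv();

-- demo_mv is up to date as soon as this statement finishes.
INSERT INTO demo_orders VALUES (1, 10), (2, 5);

-- Without the unique index, REFRESH ... CONCURRENTLY raises an error.
DROP INDEX demo_mv_customer_id_idx;
DO $$
BEGIN
    INSERT INTO demo_orders VALUES (3, 99);  -- the trigger's REFRESH fails here
EXCEPTION WHEN others THEN
    -- The error rolled back this block's INSERT along with the failed refresh.
    RAISE NOTICE 'insert undone: %', SQLERRM;
END $$;
```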

Considerations

It has some critical pitfalls for performance and concurrency:

1. Every INSERT/UPDATE/DELETE has to run the view's entire query (which is likely slow, since you are considering an MV in the first place);
2. Even with CONCURRENTLY, one REFRESH still blocks another, and the lock is held until the writing transaction ends, so all INSERT/UPDATE/DELETE operations on the involved tables will effectively be serialized.

The only situation in which I think this is a good idea is when changes are really rare.

Refresh using LISTEN/NOTIFY

The problem with the previous option is that it is synchronous and imposes a big overhead on each operation. To ameliorate that, you can use a trigger like before, but one that only issues a NOTIFY:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, 'my_mv';
    RETURN NULL;
END;
$$;
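A variation on this trigger, sketched under the assumption that one listener should serve several views: pg_notify(), the function form of NOTIFY, lets the view name come from a trigger argument, so a single function can be attached to every base table. NOTIFY is also transactional: notifications are delivered only if the transaction commits, and identical channel/payload pairs raised within one transaction are delivered once, so a multi-statement write produces a single refresh request. All names below are hypothetical:

```sql
-- Scratch-database sketch: demo_orders, demo_mv, tg_notify_refresh and the
-- trigger name are hypothetical.
DROP TABLE IF EXISTS demo_orders CASCADE;  -- CASCADE also drops demo_mv
CREATE TABLE demo_orders (customer_id int NOT NULL, amount numeric NOT NULL);
CREATE MATERIALIZED VIEW demo_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM demo_orders
    GROUP BY customer_id;
CREATE UNIQUE INDEX demo_mv_customer_id_idx ON demo_mv (customer_id);

-- Generic function: the view to refresh is the trigger's first argument.
CREATE OR REPLACE FUNCTION tg_notify_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_notify('refresh_mv', TG_ARGV[0]);
    RETURN NULL;
END;
$$;

CREATE TRIGGER tg_notify_refresh_demo_mv
AFTER INSERT OR UPDATE OR DELETE ON demo_orders
FOR EACH STATEMENT EXECUTE PROCEDURE tg_notify_refresh('demo_mv');

-- Listening in this same session (e.g. in psql) makes the notification visible.
LISTEN refresh_mv;
BEGIN;
INSERT INTO demo_orders VALUES (1, 10);
UPDATE demo_orders SET amount = 20 WHERE customer_id = 1;
COMMIT;  -- only now is the notification sent, once, with payload 'demo_mv'
```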

You can then build an application that stays connected and uses LISTEN to find out when it needs to call REFRESH. One nice project you can use to test this is pgsidekick, which lets you do the LISTEN from a shell script, so you can schedule the REFRESH as:

pglisten --listen=refresh_mv --print0 | xargs -0 -n1 -I? psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY ?;"

Or use pglater (also part of pgsidekick) to make sure you don't call REFRESH too often. For example, you can use the following trigger to request a REFRESH that runs within 1 minute (60 seconds) instead of immediately:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, '60 REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv';
    RETURN NULL;
END;
$$;

This way REFRESH is never called less than 60 seconds apart, and if you NOTIFY many times within 60 seconds, the REFRESH is triggered only once.

Considerations

Like the cron option, this one is only good if you can bear a little stale data, but it has the advantage that REFRESH is called only when really needed, so you have less overhead, and the data is updated closer to when it changes.

Note: I haven't actually tried the code and examples yet, so if someone finds a mistake or typo, or tries them and they work (or not), please let me know.


answered Oct 13 '22 07:10

MatheusOl


Tags: postgresql, materialized-views

John Bachir
