Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I prevent materialized views from refreshing during pg_restore?

Tags:

I have created a dump of the database using pg_dump in "custom" format (-Fc). This format allows for pg_restore to be invoked with the "jobs" option (-j8). The jobs options starts 8 processes, and restores the vast majority of relations in my database within 10 minutes.

I'm left with 4 processes. One of them is the refresh of a materialized view, and the other 3 are indexes to be applied to 3 tables that the materialized view uses as data sources. The indexes are "waiting" according to pg_stat_activity, presumably because the REFRESH of the materialized view is still accessing the source tables.

When the indexes are in place, the refresh of the view only takes a couple of minutes. Because the indexes are not in place during the REFRESH, I cut the REFRESH process off at 17 hours, which made pg_restore fail.

How can I

  1. Force the order of items so the indexes get created first
  2. Turn off the refresh of the materialized view and do it manually later
  3. Manipulate the dump file in custom format to say "WITH NO DATA"
  4. Intercept the REFRESH MATERIALIZED VIEW statement and throw it in the trash

Or any other solution that gets the job done?

like image 321
Kirk Roybal Avatar asked Jun 25 '14 15:06

Kirk Roybal


People also ask

Are materialized views automatically refreshed?

Unlike indexes, materialized views are not automatically updated with every data change. They must explicitly be refreshed, either on every commit, on a periodically time schedule or – typically in data warehouses – at the end of an ETL job.

What happens when materialized view is refreshed?

Refreshes a materialized view. When you create a materialized view, its contents reflect the state of the underlying database table or tables at that time. The data in the materialized view remains unchanged, even when applications make changes to the data in the underlying tables.

How do materialized views get refreshed?

Materialized views can be refreshed in two ways: fast or complete. A fast refresh requires having a materialized view log on the source tables that keeps track of all changes since the last refresh, so any new refresh only has changed (updated, new, deleted) data applied to the MV.

Why would you refresh a materialized view in a PostgreSQL database before you use it?

Refresh the materialized view without locking out concurrent selects on the materialized view. Without this option a refresh which affects a lot of rows will tend to use fewer resources and complete more quickly, but could block other connections which are trying to read from the materialized view.


2 Answers

David G Johnston posted an answer for me on the pgsql-hackers mailing list.

"Have/can you try the '-l (el) & -L' options to pg_restore?

http://www.postgresql.org/docs/9.3/static/app-pgrestore.html

(example of usage is toward the bottom of the page)

Basically re-order the command sequence so that the materialize runs as late as possible, or just disable it altogether.

pg_dump/pg_restore should be taught to handle this better, which is the main reason why Craig had you post here ASAP, but to get it functional for now manual intervention will be necessary. In theory the 'listing' capabilities should allow you to do what you need."

I think this (pg_restore -l | pg_restore -L) will get me where I need to go for now by inserting a small shell script in between that pushes the materialized views to the end of the list, but then I will also have to manage my own dependencies for the items that I re-sort (MatViews of MatViews). This pretty seriously limits the usefulness of materialized views for me. For version 9.3.x, I'm likely to require MatView dependencies no more than 1 deep.

Edit: To stop materializing the data on restore, I started doing this:

pg_dump mydatabase -Fd backup_dir pg_restore -l  -Fd backup_dir | sed '/MATERIALIZED VIEW DATA/d' > ordered.lst pg_restore -L ordered.lst -Fd backup_dir mydatabase 

This removes the REFRESH MATERIALIZED VIEW statements from the restore. Thanks to David G Johnston for the tips.

like image 94
Kirk Roybal Avatar answered Sep 23 '22 15:09

Kirk Roybal


As an addendum to the accepted answer, once all the indexes have finished and/or you've run ANALYZE, you can refresh the materialized views in the correct (dependency) order using:

pg_restore -l -Fd backup_dir | grep 'MATERIALIZED VIEW DATA' > refresh.lst pg_restore -L refresh.lst -Fd backup_dir mydatabase 
like image 38
Jiri Baum Avatar answered Sep 23 '22 15:09

Jiri Baum