I have created a dump of the database using pg_dump in "custom" format (-Fc). This format allows for pg_restore to be invoked with the "jobs" option (-j8). The jobs options starts 8 processes, and restores the vast majority of relations in my database within 10 minutes.
I'm left with 4 processes. One of them is the refresh of a materialized view, and the other 3 are indexes to be applied to 3 tables that the materialized view uses as data sources. The indexes are "waiting" according to pg_stat_activity, presumably because the REFRESH
of the materialized view is still accessing the source tables.
When the indexes are in place, the refresh of the view only takes a couple of minutes. Because the indexes are not in place during the REFRESH
, I cut the REFRESH
process off at 17 hours, which made pg_restore fail.
How can I
REFRESH MATERIALIZED VIEW
statement and throw it in the trashOr any other solution that gets the job done?
Unlike indexes, materialized views are not automatically updated with every data change. They must explicitly be refreshed, either on every commit, on a periodically time schedule or – typically in data warehouses – at the end of an ETL job.
Refreshes a materialized view. When you create a materialized view, its contents reflect the state of the underlying database table or tables at that time. The data in the materialized view remains unchanged, even when applications make changes to the data in the underlying tables.
Materialized views can be refreshed in two ways: fast or complete. A fast refresh requires having a materialized view log on the source tables that keeps track of all changes since the last refresh, so any new refresh only has changed (updated, new, deleted) data applied to the MV.
Refresh the materialized view without locking out concurrent selects on the materialized view. Without this option a refresh which affects a lot of rows will tend to use fewer resources and complete more quickly, but could block other connections which are trying to read from the materialized view.
David G Johnston posted an answer for me on the pgsql-hackers mailing list.
"Have/can you try the '-l (el) & -L' options to pg_restore?
http://www.postgresql.org/docs/9.3/static/app-pgrestore.html
(example of usage is toward the bottom of the page)
Basically re-order the command sequence so that the materialize runs as late as possible, or just disable it altogether.
pg_dump/pg_restore should be taught to handle this better, which is the main reason why Craig had you post here ASAP, but to get it functional for now manual intervention will be necessary. In theory the 'listing' capabilities should allow you to do what you need."
I think this (pg_restore -l | pg_restore -L) will get me where I need to go for now by inserting a small shell script in between that pushes the materialized views to the end of the list, but then I will also have to manage my own dependencies for the items that I re-sort (MatViews of MatViews). This pretty seriously limits the usefulness of materialized views for me. For version 9.3.x, I'm likely to require MatView dependencies no more than 1 deep.
Edit: To stop materializing the data on restore, I started doing this:
pg_dump mydatabase -Fd backup_dir pg_restore -l -Fd backup_dir | sed '/MATERIALIZED VIEW DATA/d' > ordered.lst pg_restore -L ordered.lst -Fd backup_dir mydatabase
This removes the REFRESH MATERIALIZED VIEW
statements from the restore. Thanks to David G Johnston for the tips.
As an addendum to the accepted answer, once all the indexes have finished and/or you've run ANALYZE, you can refresh the materialized views in the correct (dependency) order using:
pg_restore -l -Fd backup_dir | grep 'MATERIALIZED VIEW DATA' > refresh.lst pg_restore -L refresh.lst -Fd backup_dir mydatabase
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With