
I want "live materialized views", with the latest info for any row

I saw this solution as an alternative to materialized views:

  • I want a "materialized view" of the latest records

But it relies on scheduled queries, which run at most once every 3 hours. My users expect live data. What can I do?

Felipe Hoffa asked Oct 17 '22 11:10



1 Answer

As of 2018-10, BigQuery doesn't support materialized views, but you can use this approach:

  • Use the previous solution to "materialize" a summary of the latest data, current up to the time the scheduled query last ran.
  • Create a view that combines the materialized data with a live view of the latest data in the append-only table.

Code would look like this:

CREATE OR REPLACE VIEW `wikipedia_vt.just_latest_rows_live` AS
SELECT latest_row.*
FROM (
  SELECT ARRAY_AGG(a ORDER BY datehour DESC LIMIT 1)[OFFSET(0)] latest_row
  FROM (
    # previously "materialized" results
    SELECT * FROM `fh-bigquery.wikipedia_vt.just_latest_rows`
    UNION ALL
    # append-only table, source of truth
    SELECT * FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
    WHERE datehour > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
  ) a
  GROUP BY title
)

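Note that BigQuery is able to use the filter on TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY) to prune partitions effectively, so the live half of the view reads only the most recent partitions of the append-only table.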
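
If it helps to see why the combined view gives the same answer as recomputing over the whole table, here is a small in-memory sketch of the same logic in Python. It is only an illustration, not BigQuery code: the function name latest_rows_live, the views column, and the sample rows are all made up for the example, and the 2-day window mirrors the INTERVAL 2 DAY in the query.

```python
from datetime import datetime, timedelta


def latest_rows_live(materialized, append_only, now, window=timedelta(days=2)):
    """Union the snapshot with recent rows, then keep the newest row per title."""
    cutoff = now - window
    # Equivalent of: WHERE datehour > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
    recent = [row for row in append_only if row["datehour"] > cutoff]
    latest = {}
    # Equivalent of: UNION ALL ... GROUP BY title, picking the max datehour
    for row in materialized + recent:
        current = latest.get(row["title"])
        if current is None or row["datehour"] > current["datehour"]:
            latest[row["title"]] = row
    return latest


# Hypothetical append-only table (source of truth).
append_only = [
    {"title": "A", "datehour": datetime(2018, 10, 18, 10), "views": 5},
    {"title": "B", "datehour": datetime(2018, 10, 19, 3), "views": 2},
    {"title": "C", "datehour": datetime(2018, 10, 20, 23), "views": 4},
    {"title": "A", "datehour": datetime(2018, 10, 21, 7), "views": 9},
]

# Hypothetical snapshot produced by a scheduled query that ran on 2018-10-20 00:00,
# so it has not seen the C row or the newest A row.
materialized = [append_only[0], append_only[1]]

now = datetime(2018, 10, 21, 8)
live = latest_rows_live(materialized, append_only, now)

# Recomputing from scratch over the whole table (infinite window, no snapshot).
full = latest_rows_live([], append_only, now, window=timedelta(days=10_000))

for title in sorted(live):
    print(title, live[title]["datehour"], live[title]["views"])
```

In this sketch B comes only from the snapshot (its row is older than the 2-day cutoff), C comes only from the live part, and A is overridden by its newer live row. One thing the sketch makes visible: the window has to be at least as long as the time since the scheduled query last ran, otherwise rows that fall in the gap would be missed.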

Felipe Hoffa answered Oct 21 '22 08:10
