Cannot query over table without a filter that can be used for partition elimination

Tags:

google-bigquery

I have a partitioned table and would love to use a MERGE statement, but for some reason doesn't work out.

MERGE `wr_live.p_email_event`  t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)

I get

Cannot query over table 'wr_live.p_email_event' without a filter that can be used for partition elimination.

What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?

821

asked Jul 05 '18 14:07

1 Answers

What's the proper syntax?

As you can see from error message - your partitioned wr_live.p_email_event table was created with require partition filter set to true. This mean that any query over this table must have some filter on respective partitioned field

Assuming that timestamp IS that partitioned field - you can do something like below

MERGE `wr_live.p_email_event`  t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE()  -- this is the filter you should tune 
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)

So you need to make below line such that it in reality does not filter out whatever you need to be involved

AND DATE(t.timestamp) <> CURRENT_DATE()  -- this is the filter you should tune

For example, I found, setting it to timestamp in future - in many cases addresses the issue, like

AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)

Of course, if your wr_live.email_event table also partitioned with require partition filter set to true - you need to add same filter for s.timestamp

Also is there a way I can express shorter the insert stuff? without naming all columns?

BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it using INSERT statement
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns

For example, something like below

CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event` 
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL 
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
  SELECT 1 FROM `wr_live.p_email_event` t
  WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
  AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)

You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(

answered Oct 19 '22 08:10

Mikhail Berlyant

Related questions
                            
                                Export Google BigQuery data to Python Pandas dataframe
                            
                                BigQuery API limit exceeded error
                            
                                BigQuery select multiple key values
                            
                                Apps Script, convert a Sheet range to Blob
                            
                                Need help formatting datetime timezone for Google API
                            
                                How to catch any exceptions thrown by BigQueryIO.Write and rescue the data which is failed to output?
                            
                                BigQuery Standard SQL: Delete Duplicates from Table
                            
                                Python Unit Testing Google Bigquery
                            
                                Resources exceeded BigQuery
                            
                                Unable to use json body of gcp cloud scheduler in cloud function as parameter value?
                            
                                Obtaining BigQuery data from JavaScript code
                            
                                BigQuery Subtract Counts of Two Tables?
                            
                                How to use bigquery correlation based on many columns?
                            
                                How to scale Pivoting in BigQuery?
                            
                                SHA-256 BigQuery function or UDF
                            
                                How to change default Options in BigQuery console (Web UI), especially uncheck "Use Legacy SQL"?
                            
                                Bigquery: Partitioning data past 2000 limit (Update: Now 4000 limit) [duplicate]
                            
                                Convert Bigquery results to Pandas Data Frame
                            
                                Are some bigquery public datasets no longer available?
                            
                                Airflow BigQueryOperator: how to save query result in a partitioned Table?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Cannot query over table without a filter that can be used for partition elimination

Tags:

google-bigquery

Pentium10

People also ask

1 Answers

Mikhail Berlyant

Recent Activity

Donate For Us