I'm scheduling a query to run every day that picks up data from a table that get's written to Google BigQuery automatically. Usually the table is there, but I'd like to be sure before I execute the query depending on this table.
I'm looking into other ways as well, but the simplest way seems to be to use just SQL and integrate the checking and retrying into the scheduled query. I'm able to check if the table exists and retry if it doesn't. I can't seem to find a way to not do this immediately and have the query wait for 30 minutes before the next retry. Is there something available, possibly similar to 'WAITFOR' that will achieve this?
Current SQL;
DECLARE retry_count INT64;
DECLARE success BOOL;
DECLARE size_bytes INT64;
DECLARE row_count INT64;
SET retry_count = 1;
SET success = FALSE;
WHILE retry_count <= 3 AND success = FALSE DO
BEGIN
SET row_count = (SELECT row_count FROM [DATASET].__TABLES__ WHERE table_id='[TABLE]');
IF row_count > 0 THEN
SELECT 'Table Exists!' as message, retry_count as retries;
SET success = TRUE;
ELSE
SELECT 'Table does not exist' as message, retry_count as retries, row_count;
SET retry_count = retry_count + 1;
-- WAITFOR DELAY '00:30:00';
END IF;
END;
END WHILE;
I hope you found ways to get the time delay added. I too recently came across such a situation and this is how I handled it -
DECLARE retry_count INT64;
DECLARE success BOOL;
DECLARE size_bytes INT64;
DECLARE row_count INT64;
DECLARE DELAY_TIME DATETIME;
DECLARE WAIT STRING;
SET retry_count = 1;
SET success = FALSE;
WHILE retry_count <= 3 AND success = FALSE DO
BEGIN
SET row_count = (SELECT row_count FROM [DATASET].__TABLES__ WHERE table_id='[TABLE]');
IF row_count > 0 THEN
SELECT 'Table Exists!' as message, retry_count as retries;
SET success = TRUE;
ELSE
SELECT 'Table does not exist' as message, retry_count as retries, row_count;
SET retry_count = retry_count + 1;
-- WAITFOR DELAY '00:30:00';
SET WAIT = 'TRUE';
SET DELAY_TIME = DATETIME_ADD(CURRENT_DATETIME,INTERVAL 30 MINUTE);
WHILE WAIT = 'TRUE' DO
IF (DELAY_TIME < CURRENT_DATETIME) THEN
SET WAIT = 'FALSE';
END IF;
END WHILE;
END IF;
END;
END WHILE;
Thanks, Anusha
Here is an alternative solution for a specific scenario readers may encounter. We needed a BigQuery sleeping procedure that did not consume slots because we just dropped the reservation in a subsequent statement using DROP RESERVATION.
Other answers either make use of SELECT, or merely DATETIME_ADD, both will consume slots and cause BigQuery to error with "the reservation is not available in the data region" error.
With some experimentation we found that there was a fairly linear relationship between loop iterations and runtime. At the time of writing, for each 75 iterations of a no-op WHILE loop, BigQuery will burn approximately 1 second. The resulting slot-free alternative is:
CREATE OR REPLACE PROCEDURE `project`.`dataset`.`sleep_seconds_approximate`(seconds INT64)
BEGIN
DECLARE i INT64 DEFAULT 0;
DECLARE seconds_to_iterations_ratio INT64 DEFAULT 75;
DECLARE num_iterations INT64 DEFAULT seconds * seconds_to_iterations_ratio;
WHILE i < num_iterations DO
SET i = i + 1;
END WHILE;
END;
You could put this between your DROP RESERVATION AND DROP CAPACITY steps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With