Recently all of my queries have been taking far too long, even though almost none of them process any data.
For example, for a really simple query:
Start Time: Jan 14, 2016, 12:35:13 PM
End Time: Jan 14, 2016, 12:35:15 PM
Bytes Processed: 0 B
Bytes Billed: 0 B
Billing Tier: 1
Destination Table: ****************.******************
Write Preference: Append to table
Allow Large Results: true
Flatten Results: true
This is the information I get from the BQ console. It tells me that the query processes no data (which is true) and takes only two seconds.
But it actually takes 27 seconds when I run the query again from the console by clicking Run Query in the query history. Afterwards, the Query History in the console again shows that the query took 2 seconds.
Basically all the queries in this dataset have this issue.
I have over 40,000 tables in this dataset.
So my guess is that before BQ actually runs the query, it first locates the tables that are going to be used. Only then does it start executing the query, and that moment is the start time shown in the query history.
If that is the case, why does it take so long, and how should I solve it?
Here is the query I mentioned (with some changes made):
select "some_id", '2015-12-01', if (count(user_id) == 0, NULL, sum(users_in_today_again) / count(user_id)) as retention
from
(
  select
    users_in_last_day.user_id as user_id,
    if(users_in_today.user_id is null, 0, 1) as users_in_today_again
  from
  (
    select user_id
    from
      table_date_range(ds.sessions_some_id_, date_add(timestamp('2015-12-01'), -1, "DAY"), date_add(timestamp('2015-12-01'), -1, "DAY"))
    group by user_id
  ) as users_in_last_day
  left join
  (
    select user_id
    from table_date_range(ds.sessions_some_id_, timestamp('2015-12-01'), timestamp('2015-12-01'))
    group by user_id
  ) as users_in_today
  on users_in_last_day.user_id = users_in_today.user_id
)
Thanks in advance!
PART 1
You can check your theory about a delay before the start time by calling the Jobs: get API with the job ID taken from Query History in the BQ console.
As you can see in the Job resource, the statistics property has a creationTime field in addition to startTime and endTime.
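To make that comparison concrete, here is a minimal Python sketch that takes the JSON returned by Jobs: get (already parsed into a dict) and splits the elapsed time into "waiting before start" and "actually running". It assumes the three statistics fields arrive as strings holding milliseconds since the epoch. The function name job_timings and the sample values are made up for illustration and are not taken from the job in the question.

```python
def job_timings(job):
    """Split a job's wall time into pending and running seconds.

    `job` is assumed to be the parsed JSON body of a Jobs: get response.
    The three statistics fields are assumed to be strings containing
    milliseconds since the epoch.
    """
    stats = job["statistics"]
    created = int(stats["creationTime"])
    started = int(stats["startTime"])
    ended = int(stats["endTime"])
    return {
        # Time between submitting the job and BigQuery starting to run it.
        "pending_seconds": (started - created) / 1000.0,
        # Time the console reports as the query duration.
        "running_seconds": (ended - started) / 1000.0,
    }


# Hypothetical response fragment; the timestamps are invented.
sample_job = {
    "statistics": {
        "creationTime": "1452774888000",
        "startTime": "1452774913000",
        "endTime": "1452774915000",
    }
}

timings = job_timings(sample_job)
print(timings)
```

If pending_seconds accounts for most of the 27 seconds while running_seconds stays near 2, the delay happens before the start time, as suspected in the question.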
PART 2
Shooting in the dark here, but try the version below:
SELECT "some_id", '2015-12-01', IF (COUNT(user_id) == 0, NULL, SUM(users_in_today_again) / COUNT(user_id)) AS retention
FROM
(
  SELECT
    users_in_last_day.user_id AS user_id,
    IF(users_in_today.user_id IS NULL, 0, 1) AS users_in_today_again
  FROM
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"), DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"))
    ) WHERE pos = 1
  ) AS users_in_last_day
  LEFT JOIN
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, TIMESTAMP('2015-12-01'), TIMESTAMP('2015-12-01'))
    ) WHERE pos = 1
  ) AS users_in_today
  ON users_in_last_day.user_id = users_in_today.user_id
)
I know it might look silly, but the explanation stats (based on some dummy data) for this version are totally different from those for the version in the question.

My wild guess is that the read/compute-heavy Stage 1/2 in the original version could be responsible for the delay in question.
It's just a guess.