BigQuery - Query time becomes extremely long

Recently all of my queries have been taking a very long time, even though almost none of them process any data.

For example, for a really simple query, the query history shows:

Start Time: Jan 14, 2016, 12:35:13 PM
End Time: Jan 14, 2016, 12:35:15 PM
Bytes Processed: 0 B
Bytes Billed: 0 B
Billing Tier: 1
Destination Table: ****************.******************
Write Preference: Append to table
Allow Large Results: true
Flatten Results: true

This is the information I got from the BigQuery console. It tells me that this query processed no data (which is true) and took only two seconds.

But when I rerun the same query by clicking Run Query in the query history, it actually takes 27 seconds. Afterwards, the Query History in the console again shows that the query took only 2 seconds.

Basically every query against this dataset has this issue.

I have over 40000 tables in this dataset.

So my guess is that before BigQuery actually runs the query, it first locates the tables the query is going to use, and only then starts executing it. The start time shown in the query history would mark the beginning of that execution, not the moment I submitted the query.

If that is the case, how should I solve it and why does it take so long?

Here is the query I mentioned (with some changes):

select "some_id", '2015-12-01', if(count(user_id) == 0, NULL, sum(users_in_today_again) / count(user_id)) as retention
from
(
  select
    users_in_last_day.user_id as user_id,
    if(users_in_today.user_id is null, 0, 1) as users_in_today_again
  from
  (
    select user_id
    from table_date_range(ds.sessions_some_id_, date_add(timestamp('2015-12-01'), -1, "DAY"), date_add(timestamp('2015-12-01'), -1, "DAY"))
    group by user_id
  ) as users_in_last_day
  left join
  (
    select user_id
    from table_date_range(ds.sessions_some_id_, timestamp('2015-12-01'), timestamp('2015-12-01'))
    group by user_id
  ) as users_in_today
  on users_in_last_day.user_id = users_in_today.user_id
)
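To make the retention logic easy to check, here is a minimal Python sketch of what the query computes, run on in-memory lists instead of BigQuery tables. The function name `day_over_day_retention` and the sample user IDs are hypothetical; it assumes each list holds the `user_id` values read from one day's `sessions_some_id_` table.

```python
from typing import Iterable, Optional


def day_over_day_retention(yesterday_ids: Iterable[str], today_ids: Iterable[str]) -> Optional[float]:
    """Share of distinct users seen on the previous day who were also seen today.

    Hypothetical helper mirroring the query above:
      GROUP BY user_id                      -> set()
      LEFT JOIN + IF(... IS NULL, 0, 1)     -> membership test
      IF(COUNT(user_id) == 0, NULL, ...)    -> None when there were no users
    """
    last_day = set(yesterday_ids)  # distinct users on the previous day
    today = set(today_ids)         # distinct users on the current day
    if not last_day:               # COUNT(user_id) == 0 -> NULL
        return None
    returned = sum(1 for uid in last_day if uid in today)  # SUM(users_in_today_again)
    return returned / len(last_day)                        # ... / COUNT(user_id)


if __name__ == "__main__":
    # Hypothetical session rows (user_id values only), not real data
    print(day_over_day_retention(["a", "b", "b", "c", "d"], ["b", "d", "e"]))  # prints 0.5
```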

Thanks in advance!

asked Dec 07 '25 23:12 by Chris Kong


1 Answer

PART 1

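You can check your theory about a delay before the start time by calling the Jobs: get API with the job ID taken from the Query History in the BigQuery console.
As you can see in the Job resource reference, the statistics object has a creationTime field in addition to startTime and endTime, so the gap between creationTime and startTime tells you how long the job waited before it started running.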
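A minimal Python sketch of that check, assuming you already have the Job resource as a dict (for example, the JSON printed by `bq show --format=prettyjson -j <job_id>`). The helper name `job_timings` and the sample timestamps are hypothetical; the REST API reports `creationTime`, `startTime` and `endTime` as strings holding milliseconds since the epoch.

```python
def job_timings(job: dict) -> dict:
    """Split a BigQuery job's wall time into waiting time and running time.

    Assumes `job` is a Job resource as returned by jobs.get, where
    statistics.creationTime/startTime/endTime are epoch milliseconds as strings.
    """
    stats = job["statistics"]
    created = int(stats["creationTime"])
    started = int(stats["startTime"])
    ended = int(stats["endTime"])
    return {
        "pending_s": (started - created) / 1000,  # time before the job started running
        "running_s": (ended - started) / 1000,    # the start-to-end span the console shows
        "total_s": (ended - created) / 1000,      # creation to completion
    }


if __name__ == "__main__":
    # Hypothetical timestamps chosen to mirror the 27 s vs 2 s in the question
    sample = {"statistics": {"creationTime": "1452774888000",
                             "startTime": "1452774913000",
                             "endTime": "1452774915000"}}
    print(job_timings(sample))  # {'pending_s': 25.0, 'running_s': 2.0, 'total_s': 27.0}
```

If `pending_s` accounts for most of the observed time, the delay happens before the job starts running, which would support the theory; if it is close to zero, the time is being spent somewhere else.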

PART 2

Shooting in the dark here, but try the query below:

SELECT "some_id", '2015-12-01', IF (COUNT(user_id) == 0, NULL, SUM(users_in_today_again) / COUNT(user_id)) AS retention
FROM
(
  SELECT
    users_in_last_day.user_id AS user_id,
    IF(users_in_today.user_id IS NULL, 0, 1) AS users_in_today_again
  FROM
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"), DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"))
    ) WHERE pos = 1
  ) AS users_in_last_day
  LEFT JOIN
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, TIMESTAMP('2015-12-01'), TIMESTAMP('2015-12-01'))
    ) WHERE pos = 1
  ) AS users_in_today
  ON users_in_last_day.user_id = users_in_today.user_id 
)
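The rewrite only changes how duplicate `user_id` rows are removed, not what comes out. A minimal Python sketch on hypothetical in-memory data shows that the two deduplication styles return the same set of IDs; any speed difference comes from how BigQuery plans and executes each form, which this sketch does not model. The function names are made up for illustration.

```python
from itertools import groupby
from typing import Iterable, List


def distinct_by_group_by(user_ids: Iterable[str]) -> List[str]:
    # GROUP BY user_id: one output row per distinct key
    return sorted(set(user_ids))


def distinct_by_row_number(user_ids: Iterable[str]) -> List[str]:
    # ROW_NUMBER() OVER(PARTITION BY user_id) ... WHERE pos = 1:
    # number the rows inside each partition and keep only the first one
    kept = []
    for _uid, rows in groupby(sorted(user_ids)):
        for pos, uid in enumerate(rows, start=1):
            if pos == 1:
                kept.append(uid)
    return kept


if __name__ == "__main__":
    data = ["u3", "u1", "u3", "u2", "u1"]  # hypothetical user_id rows
    print(distinct_by_group_by(data) == distinct_by_row_number(data))  # True
```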

I know it might look silly, but the query explanation stats (based on some dummy data) for this version are totally different from those for the version in the question.

My wild guess is that the heavy read/compute work in Stages 1 and 2 of the original version may be responsible for the delay in question.

Just a guess.

answered Dec 12 '25 01:12 by Mikhail Berlyant