BigQuery - Query time becomes extremely long

Question

Recently all of my queries have been taking a long time, even though almost none of them process any data.

For example, here are the job details for a really simple query:

Start Time: Jan 14, 2016, 12:35:13 PM
End Time: Jan 14, 2016, 12:35:15 PM
Bytes Processed: 0 B
Bytes Billed: 0 B
Billing Tier: 1
Destination Table: ****************.******************
Write Preference: Append to table
Allow Large Results: true
Flatten Results: true

This is the information I got from the BQ console. It tells me that the query doesn't process any data (which is true) and takes only two seconds.

But it actually takes 27 seconds when I run the query again from the console by clicking Run Query in the query history. Afterwards, the Query History in the console again shows that the query took 2 seconds.

Basically all the queries in this dataset have this issue.

I have over 40000 tables in this dataset.

So my guess is that before BQ actually runs the query, it first locates the tables that are going to be used. Only then does it start executing the query, and that is the start time shown in the query history.

If that is the case, why does it take so long, and how should I solve it?

Here is the query I mentioned (with some changes made):

select "some_id", '2015-12-01', if (count(user_id) == 0, NULL, sum(users_in_today_again) / count(user_id)) as retention
from
(
select
  users_in_last_day.user_id as user_id,
  if(users_in_today.user_id is null, 0, 1) as users_in_today_again
FROM
(
select user_id
from
  table_date_range(ds.sessions_some_id_, date_add(timestamp('2015-12-01'), -1, "DAY"), date_add(timestamp('2015-12-01'), -1, "DAY"))
group by user_id
) as users_in_last_day
left join
(
select user_id
from table_date_range(ds.sessions_some_id_, timestamp('2015-12-01'), timestamp('2015-12-01'))
group by user_id
) as users_in_today
on users_in_last_day.user_id = users_in_today.user_id
)

Thanks in advance!

Mikhail Berlyant · Accepted Answer

PART 1

You can check your theory about a delay before the start time by calling the Jobs: get API with the job ID taken from Query History in the BQ console.
As you can see in the Job resource, the statistics property contains creationTime in addition to startTime and endTime.
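
As a rough illustration of that check, here is a minimal Python sketch that splits a job's wall-clock time into the wait before start and the execution time. It assumes you already have the Jobs: get response parsed into a dict, and that creationTime, startTime and endTime arrive as strings of milliseconds since the epoch. The function name job_timing and the sample values are made up for the example and are not taken from the job in the question.

```python
from datetime import datetime, timezone


def job_timing(job: dict) -> dict:
    """Split a job's elapsed time into wait-before-start and execution (seconds)."""
    stats = job["statistics"]
    # Assumed format: strings holding milliseconds since the Unix epoch.
    created_ms = int(stats["creationTime"])
    started_ms = int(stats["startTime"])
    ended_ms = int(stats["endTime"])
    return {
        "created_at": datetime.fromtimestamp(
            created_ms / 1000, tz=timezone.utc
        ).isoformat(),
        # Time between submitting the job and BigQuery starting to run it.
        "wait_before_start_s": (started_ms - created_ms) / 1000,
        # Time the console reports as the query duration.
        "execution_s": (ended_ms - started_ms) / 1000,
    }


# Hypothetical response fragment, for illustration only.
sample_job = {
    "statistics": {
        "creationTime": "1452774888000",
        "startTime": "1452774913000",
        "endTime": "1452774915000",
    }
}

timing = job_timing(sample_job)
print(timing["wait_before_start_s"], timing["execution_s"])
```

If the wait before start turns out to be large while the execution time stays at a couple of seconds, that would support the theory that the time is spent before the query starts running.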

PART 2

Shooting in the dark here, but try the query below:

SELECT "some_id", '2015-12-01', IF (COUNT(user_id) == 0, NULL, SUM(users_in_today_again) / COUNT(user_id)) AS retention
FROM
(
  SELECT
    users_in_last_day.user_id AS user_id,
    IF(users_in_today.user_id IS NULL, 0, 1) AS users_in_today_again
  FROM
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"), DATE_ADD(TIMESTAMP('2015-12-01'), -1, "DAY"))
    ) WHERE pos = 1
  ) AS users_in_last_day
  LEFT JOIN
  (
    SELECT user_id FROM (
      SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id) AS pos
      FROM TABLE_DATE_RANGE(ds.sessions_some_id_, TIMESTAMP('2015-12-01'), TIMESTAMP('2015-12-01'))
    ) WHERE pos = 1
  ) AS users_in_today
  ON users_in_last_day.user_id = users_in_today.user_id 
)

I know it might look silly, but the explanation stats (based on some dummy data) for this version are totally different from those for the version in the question.

My wild guess is that the heavy read/compute in Stage1/2 of the original version could be responsible for the delay in question.

Just a guess.

Tags:

google-bigquery

Chris Kong
