
Total Sessions in BigQuery vs Google Analytics Reports

I'm just learning BigQuery, so this might be a dumb question, but we want to compute some statistics there, and one of them is the total number of sessions on a given day.

To do so, I ran this query in BQ:

select sum(sessions) as total_sessions from (
  select
    fullvisitorid,
    count(distinct visitid) as sessions,
    from (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
    group each by fullvisitorid
)

(I'm using TABLE_QUERY because we might widen the date range later on.)

This results in 1,075,137.
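A minimal Python sketch of what this query counts, assuming hypothetical in-memory rows that carry only fullVisitorId, visitId and totals.visits (flattened here to a made-up totals_visits key, with a NULL value modeled as None). The nested query counts each distinct visitId once per visitor, so every session row contributes whether or not it had an interaction hit; SUM(totals.visits) is shown next to it for comparison.

```python
from collections import defaultdict

# Hypothetical session rows for one day; "totals_visits" is an illustrative
# flattened stand-in for totals.visits (None plays the role of NULL).
rows = [
    {"fullVisitorId": "A", "visitId": 1, "totals_visits": 1},
    {"fullVisitorId": "A", "visitId": 2, "totals_visits": None},  # no interaction hit
    {"fullVisitorId": "B", "visitId": 1, "totals_visits": 1},
]

def total_sessions(rows):
    """Mimic the inner COUNT(DISTINCT visitid) per fullvisitorid, then the outer SUM."""
    per_visitor = defaultdict(set)
    for r in rows:
        per_visitor[r["fullVisitorId"]].add(r["visitId"])
    return sum(len(ids) for ids in per_visitor.values())

def sum_totals_visits(rows):
    """SUM(totals.visits): rows with a NULL totals.visits add nothing."""
    return sum(r["totals_visits"] or 0 for r in rows)

print(total_sessions(rows))     # 3
print(sum_totals_visits(rows))  # 2
```

On these three rows the query logic gives 3 while SUM(totals.visits) gives 2, because visitor "A"'s second session has no interaction hit.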

But in our Google Analytics reports, the "Audience Overview" section shows this for the same day:

This report is based on 1,026,641 sessions (100% of sessions).

There's always a difference of roughly 5%, regardless of the day. So I'm wondering: even though the query is quite simple, is there a mistake in it?
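For this particular day the gap can be worked out directly from the two figures quoted above; a tiny Python check, using only those two numbers:

```python
bq_sessions = 1_075_137  # result of the BigQuery query above
ga_sessions = 1_026_641  # "Audience Overview" figure for the same day

diff = bq_sessions - ga_sessions
print(diff)                        # 48496
print(f"{diff / ga_sessions:.2%}")  # 4.72%
```

So the BigQuery count is about 4.7% higher than the GA figure on that day.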

Is this difference expected? I read through BigQuery's documentation but couldn't find anything on this issue.

Thanks in advance,

asked May 24 '15 at 03:05 by Willian Fuks





1 Answer

Using BigQuery standard SQL:

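Simply use SUM(totals.visits). If you use COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) instead, make sure to filter on totals.visits = 1! GA reports only count sessions with at least one interaction hit, and for sessions without one, totals.visits is NULL.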

If you key on visitId and don't group by day, sessions that GA split at midnight will be merged into one!

Here are all the scenarios:

SELECT
  -- every session row; both halves of a midnight split count separately
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS allSessionsUniquePerDay,
  -- every session row; midnight splits are merged across the selected tables
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS allSessionsUniquePerSelectedTimeframe,
  SUM(totals.visits) AS interactiveSessionsUniquePerDay, -- equals GA UI sessions
  COUNT(DISTINCT IF(totals.visits = 1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL)) AS interactiveSessionsUniquePerSelectedTimeframe,
  -- totals.visits is NULL for sessions without interaction hits
  SUM(IF(totals.visits = 1, 0, 1)) AS nonInteractiveSessions
FROM
  `project.dataset.ga_sessions_2017102*`
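To see how the five columns differ, here is a minimal Python sketch that applies the same counting rules to four hypothetical rows spanning two daily tables. It assumes that a session split at midnight keeps its visitId but gets a different visitStartTime, as the answer describes; key names mirror the export schema except totals_visits, an illustrative flattened stand-in for totals.visits. Tuples play the role of CONCAT.

```python
# Hypothetical rows from two daily tables. Visitor "A" has one session that
# GA split at midnight: same visitId, different visitStartTime.
rows = [
    {"fullVisitorId": "A", "visitId": 100, "visitStartTime": 1000, "totals_visits": 1},
    {"fullVisitorId": "A", "visitId": 100, "visitStartTime": 2000, "totals_visits": 1},
    {"fullVisitorId": "B", "visitId": 200, "visitStartTime": 1500, "totals_visits": 1},
    {"fullVisitorId": "C", "visitId": 300, "visitStartTime": 2500, "totals_visits": None},
]

def scenarios(rows):
    """Python counterparts of the five columns; None stands in for NULL."""
    return {
        # distinct fullVisitorId + visitStartTime
        "allSessionsUniquePerDay": len(
            {(r["fullVisitorId"], r["visitStartTime"]) for r in rows}
        ),
        # distinct fullVisitorId + visitId
        "allSessionsUniquePerSelectedTimeframe": len(
            {(r["fullVisitorId"], r["visitId"]) for r in rows}
        ),
        # SUM(totals.visits), NULLs ignored
        "interactiveSessionsUniquePerDay": sum(r["totals_visits"] or 0 for r in rows),
        # distinct fullVisitorId + visitId, interactive rows only
        "interactiveSessionsUniquePerSelectedTimeframe": len(
            {(r["fullVisitorId"], r["visitId"]) for r in rows if r["totals_visits"] == 1}
        ),
        # SUM(IF(totals.visits = 1, 0, 1))
        "nonInteractiveSessions": sum(0 if r["totals_visits"] == 1 else 1 for r in rows),
    }

for name, value in scenarios(rows).items():
    print(name, value)
```

With these rows the sketch gives 4, 3, 3, 2 and 1 for the five columns in order: visitor "A"'s split session counts twice when keyed on visitStartTime or summed via totals.visits, but only once when keyed on visitId.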

Wrap-up:

  • fullVisitorId + visitId: useful for reconnecting sessions split at midnight
  • fullVisitorId + visitStartTime: useful for counting both parts of a midnight split
  • totals.visits = 1: interactive sessions
  • fullVisitorId + visitStartTime where totals.visits = 1: GA UI sessions (in case you need a session ID)
  • SUM(totals.visits): the simplest way to get GA UI sessions
  • fullVisitorId + visitId where totals.visits = 1 and GROUP BY date: also GA UI sessions, but with too many chances for errors and misunderstandings
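The last two bullets can be checked against each other with a small Python sketch over hypothetical rows; the date key mirrors the export's date column, and totals_visits is an illustrative flattened stand-in for totals.visits. Collecting the distinct fullVisitorId + visitId keys per date (interactive rows only) and then summing over dates reproduces SUM(totals.visits).

```python
from collections import defaultdict

# Hypothetical rows; visitor "A" has a session split at midnight.
rows = [
    {"date": "20171020", "fullVisitorId": "A", "visitId": 100, "totals_visits": 1},
    {"date": "20171021", "fullVisitorId": "A", "visitId": 100, "totals_visits": 1},
    {"date": "20171020", "fullVisitorId": "B", "visitId": 200, "totals_visits": 1},
    {"date": "20171021", "fullVisitorId": "C", "visitId": 300, "totals_visits": None},
]

def sessions_grouped_by_date(rows):
    """Distinct fullVisitorId + visitId per date where totals.visits = 1, summed over dates."""
    per_day = defaultdict(set)
    for r in rows:
        if r["totals_visits"] == 1:
            per_day[r["date"]].add((r["fullVisitorId"], r["visitId"]))
    return sum(len(keys) for keys in per_day.values())

def sessions_simple(rows):
    """SUM(totals.visits); None is skipped the way SQL's SUM skips NULL."""
    return sum(r["totals_visits"] or 0 for r in rows)

print(sessions_grouped_by_date(rows), sessions_simple(rows))  # 3 3
```

Dropping the per-date grouping here would count visitor "A"'s two rows as one session and return 2 instead of 3, which is the kind of mistake the last bullet warns about.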
answered Sep 25 '22 at 09:09 by Martin Weitzmann
