
Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export

According to the Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the number of active users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session starts, an event named session_start is sent. I am trying to get that metric from the BigQuery export, but my query gives me different results (15636 on BigQuery, 14908 on Firebase Analytics).

I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try, I never get the same (or similar) results.

Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?

My query is

SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-26'), TIMESTAMP('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
  AND event_dim.name = 'session_start'

Thanks

Update

After @djabi's answer I changed my query to use user_engagement rather than session_start, and it works much better now. There are still some minor differences, though (they range from under 10 to under 50 out of 16K, depending on the date).

I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.

The new numbers are close enough to be considered statistically acceptable, but I am wondering if anyone has a suggestion to improve the query and get exact results.

The current query is:

SELECT
  COUNT(*) AS active_users
FROM (
  SELECT
    COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
  FROM
    TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
  WHERE
    DATE(event_dim.timestamp_micros) = '2016-11-25'
    AND event_dim.name ='user_engagement'
  GROUP BY
    user_id )

Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem.

Javier Ramirez asked Nov 28 '16 20:11


2 Answers

You need to wait a full 3 days for data from offline devices to be uploaded. Your query correctly filters the events by event timestamp, and you pull data from 3 days, but that is only a day and a half from today, which is not enough for all the data to be uploaded. Try including 3 days counting from yesterday.
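To illustrate why the table range matters, here is a minimal Python sketch. It assumes that events uploaded late by offline devices end up in a later daily table than the day they happened, which is my reading of the point above rather than documented behavior. The table contents, the ids and the `active_users` helper are all made up for the example.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros(year, month, day, hour=12):
    """UTC wall-clock time -> microseconds since the Unix epoch."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(microseconds=1)


# Hypothetical daily export tables: table date -> rows of
# (app_instance_id, event timestamp in microseconds).
tables = {
    date(2016, 11, 27): [("a", micros(2016, 11, 27)), ("b", micros(2016, 11, 27))],
    date(2016, 11, 28): [("c", micros(2016, 11, 27))],  # uploaded one day late
    date(2016, 11, 30): [("d", micros(2016, 11, 27))],  # offline device, three days late
}


def active_users(tables, event_day, first_table, last_table):
    """Count distinct ids whose event happened on event_day (UTC),
    scanning only the daily tables between first_table and last_table."""
    users = set()
    for table_day, rows in tables.items():
        if not (first_table <= table_day <= last_table):
            continue
        for instance_id, ts in rows:
            if (EPOCH + timedelta(microseconds=ts)).date() == event_day:
                users.add(instance_id)
    return len(users)


day = date(2016, 11, 27)
narrow = active_users(tables, day, date(2016, 11, 26), date(2016, 11, 29))
wide = active_users(tables, day, date(2016, 11, 26), date(2016, 11, 30))
print(narrow, wide)
```

With this toy data the narrower range misses the device that uploaded three days late, so the same event day yields a different count depending on how many later tables are scanned.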

Also try using the user_engagement event instead of session_start. I believe the active user count is based on user_engagement events, not on session_start events.

Also, Firebase reports take a while to process, so you might want to check the Firebase reports the next day.

Firebase reports use the time zone of the account, while events are timestamped in UTC, so a day in the Firebase reports is different from a UTC calendar day. You need to control for that discrepancy as well to get matching numbers.
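The time zone effect can be seen in a small Python sketch. The UTC-8 offset, the sample events and the helper names are assumptions made for illustration; the real offset would be whatever the account is configured with, and a fixed offset ignores daylight saving changes.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment):
    """Timezone-aware datetime -> microseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def local_day(timestamp_micros, utc_offset_hours):
    """Calendar day of a microsecond timestamp at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return (EPOCH + timedelta(microseconds=timestamp_micros)).astimezone(tz).date()


def active_users(events, day, utc_offset_hours):
    """Distinct ids with at least one event on `day` at the given offset."""
    return len({uid for uid, ts in events if local_day(ts, utc_offset_hours) == day})


# Hypothetical user_engagement events: (app_instance_id, timestamp_micros).
events = [
    ("a", to_micros(datetime(2016, 11, 27, 3, 0, tzinfo=timezone.utc))),   # Nov 26, 19:00 at UTC-8
    ("b", to_micros(datetime(2016, 11, 27, 15, 0, tzinfo=timezone.utc))),  # Nov 27, 07:00 at UTC-8
    ("c", to_micros(datetime(2016, 11, 28, 5, 0, tzinfo=timezone.utc))),   # Nov 27, 21:00 at UTC-8
    ("d", to_micros(datetime(2016, 11, 28, 2, 0, tzinfo=timezone.utc))),   # Nov 27, 18:00 at UTC-8
]

day = date(2016, 11, 27)
utc_count = active_users(events, day, 0)       # day boundaries in UTC
account_count = active_users(events, day, -8)  # day boundaries at the assumed account offset
print(utc_count, account_count)
```

Here the same four events give different daily counts depending on where the day boundary falls, so the date filter in the query has to shift the timestamps to the account's time zone before comparing, and the table range has to be wide enough to cover the shifted day.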

djabi answered Sep 22 '22 05:09


By default, a session is only counted after 10 seconds of user activity in the app, and you can change that threshold. Try lowering the session start threshold to the smallest value possible; you may then arrive at a number closer to what you are expecting.

Anupam Aacharya answered Sep 23 '22 05:09