
BigQuery vs Google Analytics, which data is more accurate?

As a Google Analytics Premium/BigQuery customer, our question is: which data is more accurate?

I lean toward BigQuery being more accurate because we can actually see the raw data, whereas we have no insight into the method Google Analytics uses to calculate its numbers.

I also think that a lot of it has to do with SAMPLING.

When you calculate something simple like Total Pageviews for a single page, the Google Analytics numbers line up to BigQuery within .00001%:

sum(case when regexp_match(hits.page.pagepath,r'(?i:/contact.aspx)') and hits.type = "page" then 1 else 0 end) as total_pageviews

When you calculate something more complex like Unique Pageviews for a single page, the Google Analytics numbers are 5% greater than BigQuery's. Note that the second argument to count(distinct ...) is set to the max of 1 million:

count(distinct (case when regexp_match(hits.page.pagepath,r'(?i:/contact.aspx)') and hits.type = "page" then concat(fullvisitorid, string(visitid)) end), 1000000) as unique_pageviews
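
To make the difference between the two metrics concrete, here is a minimal Python sketch of what the two expressions above compute. It is only an illustration: the Hit tuple and the sample rows are hypothetical, and it assumes regexp_match does a case-insensitive substring match, as the (?i:...) pattern suggests. Total pageviews counts every matching hit, while unique pageviews counts the distinct (fullvisitorid, visitid) pairs that have at least one matching hit.

```python
import re
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical flattened row: one hit within one session."""
    fullvisitorid: str
    visitid: int
    hit_type: str
    pagepath: str


# Same pattern as in the queries; the unescaped dot matches any character.
PAGE_PATTERN = re.compile(r"/contact.aspx", re.IGNORECASE)


def is_contact_pageview(hit: Hit) -> bool:
    """True for 'page' hits whose path contains the pattern."""
    return hit.hit_type == "page" and PAGE_PATTERN.search(hit.pagepath) is not None


def total_pageviews(hits: Iterable[Hit]) -> int:
    """Count every matching hit (the sum(case ...) expression)."""
    return sum(1 for hit in hits if is_contact_pageview(hit))


def unique_pageviews(hits: Iterable[Hit]) -> int:
    """Count distinct sessions with at least one matching hit."""
    return len({(hit.fullvisitorid, hit.visitid) for hit in hits if is_contact_pageview(hit)})


# Made-up sample data, purely for illustration.
HITS = [
    Hit("A", 1, "page", "/contact.aspx"),
    Hit("A", 1, "page", "/Contact.aspx?ref=x"),  # same session, second view
    Hit("A", 2, "page", "/contact.aspx"),        # same visitor, new session
    Hit("B", 5, "event", "/contact.aspx"),       # not a 'page' hit
    Hit("B", 5, "page", "/home.aspx"),           # different page
]

print(total_pageviews(HITS))   # 3
print(unique_pageviews(HITS))  # 2
```

The three matching hits come from two sessions, so the results are 3 and 2. Keying on a tuple also sidesteps the (unlikely) collisions that plain string concatenation of the two ids could produce.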

I would love to know what others think or what the Google Developers themselves can explain.

hoggkm asked Oct 16 '14 17:10


2 Answers

If you are a premium customer, I am assuming that's because you have a large website with a lot of data. The Google Analytics API will sample your data if there's too much of it. You can try to reduce this by raising the sampling level, but even with the sampling level set to higher precision you can still get sampled data back from the API.

Check the JSON coming back from the API; it will tell you if your data is being sampled.
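
As a rough sketch of that check in Python: to my knowledge the Core Reporting API v3 response carries a containsSampledData flag plus sampleSize and sampleSpace (returned as strings), but please verify the field names against the API reference. The payload below is made up, and sampled_fraction is a hypothetical helper, not part of any Google client library.

```python
import json
from typing import Optional


def sampled_fraction(report: dict) -> Optional[float]:
    """Return None for an unsampled report, else sampleSize / sampleSpace.

    Field names are assumed from the Core Reporting API v3 response format.
    """
    if not report.get("containsSampledData", False):
        return None
    sample_size = int(report["sampleSize"])    # sessions actually used
    sample_space = int(report["sampleSpace"])  # sessions available
    return sample_size / sample_space


# Hypothetical response body, trimmed to the fields of interest.
raw = '{"containsSampledData": true, "sampleSize": "250000", "sampleSpace": "1000000"}'
report = json.loads(raw)

fraction = sampled_fraction(report)
if fraction is None:
    print("report is unsampled")
else:
    print(f"report is based on {fraction:.0%} of sessions")
```

If the flag is true, the numbers in that report are extrapolated from a subset of sessions and should not be expected to match BigQuery exactly.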

BigQuery won't sample your data. There is a way for premium customers to use the API without sampling, but I think you have to contact Google about setting that up.
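
To see why a sampled number can drift from a count over raw rows, here is a toy Python sketch. It is not how Google Analytics actually samples; the data, the "every Nth session" scheme, and the function names are all assumptions chosen to keep the example deterministic.

```python
from typing import List, Tuple

# (session index, did this session view the page?) -- made-up data
Session = Tuple[int, bool]

SESSIONS: List[Session] = [(i, i % 7 == 0) for i in range(1000)]


def exact_count(sessions: List[Session]) -> int:
    """Count over every raw row, as a query on unsampled data would."""
    return sum(1 for _, viewed in sessions if viewed)


def sampled_estimate(sessions: List[Session], step: int) -> int:
    """Count within every step-th session, then scale the result back up."""
    sample = sessions[::step]
    return sum(1 for _, viewed in sample if viewed) * step


print(exact_count(SESSIONS))           # 143
print(sampled_estimate(SESSIONS, 10))  # 150
```

The estimate is close but not equal to the exact count, which is the kind of gap you can expect whenever one side of the comparison is sampled.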

The bigger point in BigQuery's favor is that you aren't limited to 7 dimensions and 10 metrics per query like you are with the Google Analytics API.

Note: I am not a Google Developer but I am a Google Developer Expert for Google Analytics.

DaImTo answered Sep 28 '22 16:09


I am a big fan of BigQuery. I have also used Google Analytics quite a lot. So the question is about where the data is more accurate.

Well, the answer to such a question is always: "the closer data is to where it originates, the more accurate it is". BigQuery is an underlying storage of all of Google's data. This is where data is collected, indexed, and then made accessible through a SQL interface.

Google Analytics is a tool that was developed with a lot of free accounts in mind. To support free accounts, GA needed to scale well. To scale, companies optimize on storage by pre-aggregating data.

So you are really comparing two things: pre-summarized/pre-aggregated data (GA) and raw accumulated data (BigQuery). Which would you trust?

Now, it sounds like there is also a second question: "how to get accurate aggregates from BigQuery?" BigQuery's SQL dialect is not ANSI-compatible, which makes it hard to remember for ad-hoc queries. You are better off connecting a BI tool on top of BigQuery, so that you can explore data in a consistent manner (i.e. same threshold/rounding).

Segah Meer answered Sep 28 '22 17:09