
Big difference in "visitor" count

I'm trying to pull out the (unique) visitor count for a certain directory using three different methods:

* with a profile
* using a dynamic advanced segment
* using a custom report filter

On a smaller site the three methods give the same result. But on the large site (> 5M visits/month) I get a big discrepancy between the profile on the one hand and the advanced segment and filter on the other. This might be because of sampling, but the difference is smaller when it comes to pageviews. Is the estimation of visitors worse, and the discrepancy bigger, when using sampled data? Also, when extracting data from the API (using filters or profiles) I still get different data even if GA doesn't indicate that the data is sampled, i.e. I'm supposedly looking at unsampled data.

Another strange thing is that the pageviews are higher in the profile than in the filter, while the visitor count is higher for the filter than for the profile. I also applied a filter to the profile to force it to use sampled data, and I again get results quite similar to the filter and segment data.

           profile  filter  segment  filter@profile
unique     25550    37778   36433    37971 
pageviews  202761   184130  n/a      202761

What I am trying to achieve is a way to get somewhat accurate data on unique visitors when I've run out of profiles to use.

More data with discrepancies can be found in this Google Docs spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0Aqzq0UJQNY0XdG1DRFpaeWJveWhhdXZRemRlZ3pFb0E

Hampus Brynolf asked Feb 18 '26 07:02



1 Answer

Google Analytics (free version) processes only 10 million page interactions [0] (pageviews and events; any tracker method that starts with "track" is an interaction) per month [1], so presumably the data for your larger site is already heavily sampled (I'd guess each of your 5 million visitors has more than two interactions) [2]. Ad hoc reports use at most 1 million data points, so you have a sample of a sample. Naturally, aggregated values suffer more from smaller sample sizes.
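To see why unique-visitor counts degrade faster under sampling than pageview counts, here is a toy simulation (all traffic numbers are made up, and GA's actual sampling is more sophisticated than this naive scale-by-1/rate sketch). Pageview totals scale roughly linearly when you multiply a session sample back up, but a visitor with several sessions needs only one sampled session to be counted at all, and that count is still multiplied by 1/rate, so the scaled unique count overshoots:

```javascript
// Toy model: visitors have 1-5 sessions of 1-10 pageviews each.
// A small seeded LCG keeps the run reproducible (Math.random can't be seeded).
function lcg(seed) {
  let s = seed;
  return () => (s = (s * 1664525 + 1013904223) % 4294967296) / 4294967296;
}
const rand = lcg(42);

const visitors = 10000;
const sessions = []; // [visitorId, pageviews] pairs
for (let v = 0; v < visitors; v++) {
  const nSessions = 1 + Math.floor(rand() * 5); // 1-5 sessions per visitor
  for (let i = 0; i < nSessions; i++) {
    sessions.push([v, 1 + Math.floor(rand() * 10)]); // 1-10 pageviews each
  }
}

const truePageviews = sessions.reduce((sum, [, pv]) => sum + pv, 0);

// GA-style sampling: keep ~10% of *sessions*, then scale totals by 1/rate.
const rate = 0.1;
const sample = sessions.filter(() => rand() < rate);

const estPageviews = sample.reduce((sum, [, pv]) => sum + pv, 0) / rate;
const estUniques = new Set(sample.map(([v]) => v)).size / rate;

// Pageviews land close to the truth; scaled uniques overshoot badly,
// because any one sampled session makes the whole visitor count.
console.log({ trueUniques: visitors, estUniques, truePageviews, estPageviews });
```

This overshoot pattern is at least consistent with the question's table, where the sampled reports show far more "unique" visitors than the profile does.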

And I'm pretty sure the data limits apply to API access too (Google says that there is "no assurance that the excess hits will be processed"), so for the large site the API returns sampled (or incomplete) data as well; you cannot really be looking at unsampled data.
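If you are pulling reports programmatically, the Core Reporting API (v3) response includes a `containsSampledData` boolean you can check instead of assuming the data is unsampled. A minimal sketch, using a hard-coded and heavily trimmed response body in place of a real API call:

```javascript
// Trimmed example of a Core Reporting API (v3) JSON response.
const responseBody = `{
  "kind": "analytics#gaData",
  "containsSampledData": true,
  "totalsForAllResults": { "ga:visitors": "37778" }
}`;

const data = JSON.parse(responseBody);
if (data.containsSampledData) {
  // Totals in this report are estimates extrapolated from a sample.
  console.log("warning: this report is based on sampled data");
}
```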

As for the differences, I'd say that different ad hoc reports use different samples, so you end up with different results. With GA you shouldn't rely too much on absolute numbers anyway; look more for general trends.

[1] Analytics Premium tracks 50 million interactions per month (and comes with support from Google), but costs 150,000 USD per year.

[2] Google suggests using "_setSampleRate()" on large sites to make sure you have evenly sampled data for each day of the month, instead of hit-or-miss data after you exceed the data limits.
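For reference, a minimal ga.js snippet (async syntax) with `_setSampleRate`; "UA-XXXXX-Y" is a placeholder property ID and 50 is just an example percentage:

```javascript
// Track only ~50% of visitors at collection time, so the data stays
// under the processing limit evenly across the month instead of
// hitting the cap partway through.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-Y']); // placeholder property ID
_gaq.push(['_setSampleRate', '50']);      // percentage of visitors to track
_gaq.push(['_trackPageview']);
```

`_setSampleRate` must be pushed before `_trackPageview`, since the sampling decision is made when tracking starts.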


Data limits:

http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983

setSampleRate:

https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate

Eike Pierstorff answered Feb 21 '26 13:02



