Google Analytics database [closed]

Does anybody know how data in Google Analytics is organized? They run complex selections over huge amounts of data very, very fast. What kind of database structure makes that possible?

asked Jan 21 '10 10:01 by hippout

1 Answer

AFAIK Google Analytics is derived from Urchin. As has been said, now that Analytics is part of the Google family it may well be using MapReduce/Bigtable. My assumption is that Google integrated the old Urchin DB format with the newer Bigtable/MapReduce infrastructure.
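
To illustrate the MapReduce/Bigtable idea in general terms (Google has not published how Analytics actually stores its data, so this is not its real schema), here is a minimal single-process Python sketch. It assumes hypothetical hit records of (profile, date, page) and a made-up composite row key `profile#date#page`. The map step emits one (row key, 1) pair per hit, the reduce step sums the pairs per key, and the rows are kept sorted by key, as Bigtable keeps its rows sorted, so a report for one profile and day becomes a contiguous range scan instead of a pass over the raw hits.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical raw hits: (profile_id, date, page_path)
HITS = [
    ("UA-1", "2010-01-20", "/index"),
    ("UA-1", "2010-01-20", "/about"),
    ("UA-1", "2010-01-21", "/index"),
    ("UA-1", "2010-01-21", "/index"),
    ("UA-2", "2010-01-21", "/home"),
]

def map_hit(hit):
    """Map step: emit (row_key, 1) for one hit.

    The composite key profile#date#page is an illustrative assumption,
    chosen so rows for the same profile and day sort next to each other."""
    profile, date, page = hit
    yield f"{profile}#{date}#{page}", 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each row key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals

def build_table(hits):
    """Run map + reduce and return (key, count) rows sorted by key."""
    pairs = (pair for hit in hits for pair in map_hit(hit))
    return sorted(reduce_counts(pairs).items())

def range_scan(table, start_key, end_key):
    """Return rows with start_key <= key < end_key using binary search."""
    keys = [key for key, _ in table]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, end_key)
    return table[lo:hi]
```

For example, `range_scan(build_table(HITS), "UA-1#2010-01-21", "UA-1#2010-01-22")` returns only the pre-summed row `("UA-1#2010-01-21#/index", 2)`. Doing the expensive aggregation ahead of time is one plausible reason reports over large data sets can come back quickly.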

I found these links, which talk about the Urchin DB. Some of what they describe is probably still in use.

http://www.advanced-web-metrics.com/blog/2007/10/16/what-is-urchin/

This says:

[snip] ...still use a proprietary database to store reporting data, which makes ad-hoc queries a bit more limited, since you have to use Urchin-developed tools rather than the more flexible SQL tools.

http://www.urchinexperts.com/software/faq/#ques45

What type of database does Urchin use?

Urchin uses a proprietary flat file database for report data storage. The high-performance database architecture handles very high traffic sites efficiently. Some of the benefits of the data base architecture include:

* Small database footprint approximately 5-10% of raw logfile size
* Small number of database files required per profile (9 per month of historical reporting)
* Support for parallel processing of load-balanced webserver logs for increased performance
* Databases are standard files that are easy to back up and restore using native operating system utilities
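
The FAQ does not describe Urchin's file layout, so the following is only a hypothetical Python sketch of how a flat-file report store can stay at a fraction of raw log size: it keeps pre-aggregated (date, path, hits) rows instead of individual requests. Each load-balanced webserver's log is aggregated independently and the partial counts are then merged, mirroring the "parallel processing" bullet. The log format, the tab-separated output layout, and all function names are assumptions.

```python
import csv
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical simplified log lines from two load-balanced webservers:
# each line is "<date> <path>"
SERVER_LOGS = {
    "web1": ["2010-01-20 /index", "2010-01-20 /index", "2010-01-21 /about"],
    "web2": ["2010-01-20 /index", "2010-01-21 /about", "2010-01-21 /index"],
}

def aggregate_log(lines):
    """Count hits per (date, path) for one server's log."""
    counts = Counter()
    for line in lines:
        date, path = line.split(" ", 1)
        counts[(date, path)] += 1
    return counts

def aggregate_parallel(server_logs):
    """Aggregate each server's log in its own worker, then merge the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(aggregate_log, server_logs.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

def write_report(counts, out):
    """Write the aggregated rows as a flat tab-separated file: date, path, hits."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for (date, path), hits in sorted(counts.items()):
        writer.writerow([date, path, hits])
```

With the sample logs, six raw requests collapse to three rows (`2010-01-20 /index 3`, `2010-01-21 /about 2`, `2010-01-21 /index 1`), written with `write_report(aggregate_parallel(SERVER_LOGS), io.StringIO())` or any open text file. A thread pool keeps the sketch self-contained; for CPU-bound log parsing, a process pool or separate machines would be the more realistic choice.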

More info about Urchin:

http://www.google.com/support/urchin45/bin/answer.py?answer=28737

A long time ago I used a tracker whose site discussed data normalization: http://www.2enetworx.com/dev/articles/statisticus5.asp

There you can find some information on how to reduce the amount of data stored in the DB; it may be a good starting point for your research.
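
One common normalization technique, shown here only as an illustration and not necessarily what the linked article proposes, is to store each distinct long string (such as a page URL) once in a lookup table and have the hit records refer to it by a small integer ID. The `StringTable` class and the `(timestamp, url)` hit shape below are hypothetical.

```python
class StringTable:
    """Lookup table mapping each distinct string to a small integer ID."""

    def __init__(self):
        self.ids = {}     # string -> ID
        self.values = []  # ID -> string

    def intern(self, value):
        """Return the ID for value, assigning the next free ID if it is new."""
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def lookup(self, ident):
        """Return the original string stored under an ID."""
        return self.values[ident]

def normalize_hits(hits, pages):
    """Replace each hit's URL with its ID from the lookup table.

    hits: list of (timestamp, url) pairs (a hypothetical record shape)."""
    return [(ts, pages.intern(url)) for ts, url in hits]
```

For example, normalizing `[(1, "/products/widget?color=blue"), (2, "/index"), (3, "/products/widget?color=blue")]` with a fresh `StringTable` yields `[(1, 0), (2, 1), (3, 0)]`, and `pages.lookup(0)` recovers the URL, so the repeated URL is stored only once.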

answered Oct 05 '22 02:10 by dawez