Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Architecture for database analytics

We have an architecture where we provide each customer Business Intelligence-like services for their website (internet merchant). Now, I need to analyze those data internally (for algorithmic improvement, performance tracking, etc...) and those are potentially quite heavy: we have up to millions of rows / customer / day, and I may want to know how many queries we had in the last month, weekly compared, etc... that is the order of billions entries if not more.

The way it is currently done is quite standard: daily scripts which scan the databases, and generate big CSV files. I don't like this solutions for several reasons:

  • as typical with those kinds of scripts, they fall into the write-once and never-touched-again category
  • tracking things in "real-time" is necessary (we have separate toolset to query the last few hours ATM).
  • this is slow and non-"agile"

Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBM go. It seems that using column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issues.

like image 616
David Cournapeau Avatar asked Apr 21 '10 06:04

David Cournapeau


People also ask

What is big data analytics architecture?

A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. Big data solutions typically involve one or more of the following types of workload: Batch processing of big data sources at rest.

How many types of data architecture are there?

Today, I want to describe 3 parts of data architecture in simple terms; applications, data warehouses, and data lakes.

What is data structure architecture?

Data architecture is a framework for how IT infrastructure supports your data strategy. The goal of any data architecture is to show the company's infrastructure how data is acquired, transported, stored, queried, and secured. A data architecture is the foundation of any data strategy.

What are the typical components of analytics architecture?

Though the toolset is too expansive to describe all of it, it's worth going into a bit more depth on the four major components: the data warehouse, ETL pipelines, transformation tools, and analytics tools.


1 Answers

You will want to google Star Schema. The basic idea is to model a special data warehouse / OLAP instance of your existing OLTP system in a way that is optimized to provided the type of aggregations you describe. This instance will be comprised of facts and dimensions.

In the example below, sales 'facts' are modeled to provide analytics based on customer, store, product, time and other 'dimensions'.

alt text

You will find Microsoft's Adventure Works sample databases instructive, in that they provide both the OLTP and OLAP schemas along with representative data.

like image 69
Ryan Cox Avatar answered Nov 01 '22 00:11

Ryan Cox