We have an architecture where we provide each customer with Business Intelligence-like services for their website (an internet merchant). Now I need to analyze that data internally (for algorithmic improvement, performance tracking, etc.), and it is potentially quite heavy: we have up to millions of rows per customer per day, and I may want to know how many queries we had over the last month, compare week over week, and so on; that is on the order of billions of entries, if not more.
The way it is currently done is quite standard: daily scripts scan the databases and generate big CSV files. I don't like this solution for several reasons.
Although I have some experience dealing with huge datasets for scientific use, I am a complete beginner as far as traditional RDBMSs go. It seems that using a column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of problem.
A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. Big data solutions typically involve one or more types of workload, such as batch processing of big data sources at rest.
In simple terms, a data architecture can be described in three parts: applications, data warehouses, and data lakes.
Data architecture is a framework for how IT infrastructure supports your data strategy. Its goal is to describe how data is acquired, transported, stored, queried, and secured across the company's infrastructure; it is the foundation of any data strategy.
Though the toolset is too expansive to describe in full, it is worth going into a bit more depth on the four major components: the data warehouse, ETL pipelines, transformation tools, and analytics tools.
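As a rough sketch of how the ETL side of that might look (instead of dumping everything to CSV), the Python snippet below extracts yesterday's raw request rows from an operational database, aggregates them per customer, and loads the result into a warehouse table. It uses SQLite only to keep the example self-contained, and the file names, the `requests` table, and the `daily_request_counts` table are all assumptions made for illustration.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical connections: in practice the source would be the OLTP
# database and the target a dedicated analytics store.
source = sqlite3.connect("app.db")           # operational data
warehouse = sqlite3.connect("warehouse.db")  # analytics data

warehouse.execute("""
    CREATE TABLE IF NOT EXISTS daily_request_counts (
        customer_id INTEGER,
        day         TEXT,
        requests    INTEGER,
        PRIMARY KEY (customer_id, day)
    )
""")

yesterday = (date.today() - timedelta(days=1)).isoformat()

# Extract + transform: aggregate the raw per-request rows into one row
# per customer per day.
rows = source.execute(
    """
    SELECT customer_id, DATE(created_at) AS day, COUNT(*) AS requests
    FROM requests
    WHERE DATE(created_at) = ?
    GROUP BY customer_id, DATE(created_at)
    """,
    (yesterday,),
).fetchall()

# Load: upsert the aggregates into the warehouse table.
warehouse.executemany(
    "INSERT OR REPLACE INTO daily_request_counts VALUES (?, ?, ?)",
    rows,
)
warehouse.commit()
```

In a real setup the warehouse side would typically be a column-oriented or otherwise analytics-oriented store rather than SQLite, but the extract/transform/load shape stays the same.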
You will want to google Star Schema. The basic idea is to model a special data warehouse / OLAP instance of your existing OLTP system in a way that is optimized to provide the type of aggregations you describe. This instance will be composed of facts and dimensions.
In the example below, sales 'facts' are modeled to provide analytics based on customer, store, product, time and other 'dimensions'.
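The original answer's example diagram is not reproduced here, so as a stand-in, here is a minimal, hypothetical sketch in Python/SQLite of what such a star schema could look like: a narrow `fact_sales` table that references customer, store, product, and date dimension tables, plus one aggregation query of the kind this setup is meant to serve. All table names, columns, and sample rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: the attributes you slice and dice by.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment  TEXT);
    CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, city TEXT, region   TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day  TEXT, month INTEGER, year INTEGER);

    -- Fact table: one narrow row per sale, holding only foreign keys and measures.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        quantity     INTEGER,
        amount       REAL
    );

    -- A few made-up rows so the query below returns something.
    INSERT INTO dim_customer VALUES (1, 'Acme',   'web');
    INSERT INTO dim_store    VALUES (1, 'Paris',  'EU');
    INSERT INTO dim_product  VALUES (1, 'Widget', 'gadgets');
    INSERT INTO dim_date     VALUES (20240102, '2024-01-02', 1, 2024);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 20240102, 3, 29.97);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 20240102, 1,  9.99);
""")

# A typical OLAP-style question: revenue per region and month.
query = """
    SELECT s.region, d.year, d.month, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    JOIN dim_date  d ON d.date_key  = f.date_key
    GROUP BY s.region, d.year, d.month
"""
for region, year, month, revenue in con.execute(query):
    print(region, year, month, revenue)  # EU 2024 1 39.96
```

The point of the shape is that the fact table stays narrow and append-only, while anything you might want to group or filter by lives in the small dimension tables, which keeps the big aggregations cheap.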
You will find Microsoft's Adventure Works sample databases instructive, in that they provide both the OLTP and OLAP schemas along with representative data.