Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are OLAP, OLTP, data warehouses, analytics, analysis and data mining related?

I'm trying to understand what OLAP, OLTP, data mining, analytics etc. are about, and I feel like my understanding about some of these concepts is still a bit vague. Information about these subjects tend to be explained in a very complex manner on the internet.


I feel like a question like this is likely to be closed since it's a very broad one, so I'll try to narrow it down into two questions:

Question 1:

After doing research I understand the following about these concepts, is it correct?

  • Analysis is decomposing something complex, to understand the inner workings better.
  • Analytics is predictive analysis on information that requires alot of math and statistics.
  • There's many type of databases, but they are either OLTP (transactional) or OLAP (analytical).
  • OLTP databases use ER diagrams, and are therefore easier to update because they are in normalized form.
  • In contrast, OLAP uses the denormalized star schema's and is therefore easier to query
  • OLAP is used for predictive analysis and OLTP is usually used in more practical situations since theres no redundancy.
  • Data warehouses is a type of OLAP database, and usually consists out of multiple other databases.
  • Data mining is a tool used in analytics, where u use computer software to find out relationships between data so you can predict things (e.g. customer behavior).

Question 2:

I'm especially confused about the difference between analytics and analysis. They say analytics is multidimensional analysis, but what is that supposed to mean?

like image 833
user1534664 Avatar asked Dec 26 '22 04:12

user1534664


1 Answers

I will try to explain you from the top of the pyramid:

Business Intelligence (what you didn't mentioned) is term in IT which stands for a complex system and gives useful informations about company from data.

So, BI systems has target: Clean, accurate and meaningful informations. Clean means there is no tech problems (missing keys, incomplete data ect). Accurate means accurate - BI systems are also used as fault checker of production database (logical faults - i.e invoice bill is too high, or inactive partner is used ect). It has been accomplished with rules. Meaningful is hard to explain, but in simple english, it's all your data (even excel table from the last meeting), in way you want.

So, BI system has back-end: It's data warehouse. DWH is nothing else than a database (instance, not software). It can be stored in RDBMS, analytical db (columnar or document store types), or NoSQL databases.

Data warehouse is term used usually for whole database that I explained above. There could be number of data-marts (if Kimball model is used) - more often, or relational system in 3rd normalized form (Inmon model) called enterprise data warehouse.

Data marts are tables inside DWH that are related (star schema, snowflake schema). Fact table (business process in denormalized form ) and dimension tables.

Each data mart represents one business process. Example: DWH has 3 data marts. One is retail sales, second is export, and third is import. In retail you can see total sales, qty sold, import price, profit (measures) by SKU, date, store, city ect (dimensions).

Loading data in DWH is called ETL(extract, transform, load).

  1. Extract data from multiple sources (ERP db, CRM db, excel files, web service...)

  2. Transform data (clean data, connect data from diff sources, match keys, mine data)

  3. Load data (Load transformed data in specific data marts)

edit because of comment: ETL process is usually created with ETL tool, or manually with some programming language (python, c# ect) and APIs.

ETL process is group of SQLs, procedures, scripts and rules related and separated in 3 parts (look above), controlled by meta data. It's either scheduled (every night, every few hours) or live (change data capture, triggers, transactions).

OLTP and OLAP are types of data processing. OLTP is used in transaction purpose, between database and software (usually only one way of input/output data). OLAP is for analitical purpose, and this means there is multiple sources, historical data, high select query performance, mined data.

edit because of comment: Data Processing is way how data is stored and accessed from database. So, based on of your needs, database is set in different way.

Image from http://datawarehouse4u.info/:

enter image description here

Data mining is the computational process of discovering patterns in large data sets. Mined data can give you more insight view of business process or even forecast.

Analysis is a verb, which in BI world means simplicity of getting asked information from data. Multidimensional analysis actually says how system is slicing your data (with dimensions inside cube). Wikipedia said that analysis of data is a process of inspecting data with the goal of discovering useful information.

Analytics is a noun and it represent a result of analysis process.

Don't get so much fuss about those two words.

like image 174
fenix Avatar answered Dec 28 '22 20:12

fenix