Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is better, ETL or ELT? [closed]

Having spent some time working on data warehousing, I have created both ETL (extract transform load) and ELT (extract load transform) processes. It seems that ELT is a newer approach to populating data warehouses that can more easily take advantage of cluster computing resources. I would like to hear what other people think the advantages are of ETL and ELT over each other and when you should use one or the other.

like image 211
Chris J Avatar asked Jun 19 '10 18:06

Chris J


People also ask

What is better ELT or ETL?

ETL is a time-intensive process; data is transformed before loading into a destination system. ELT is faster by comparison; data is loaded directly into a destination system, and transformed in-parallel. Performed on secondary server. Best for compute-intensive transformations & pre-cleansing.

Is ETL outdated?

ETL is outdated. It works with traditional data center infrastructures, which cloud technologies are already replacing. The loading time takes hours, even for businesses with data sets that are just a few terabytes in size. ELT is the future of data warehousing and efficiently utilizes current cloud technologies.

What is the advantage of ELT?

The main advantage of using an ELT approach is that you can move all raw data from a multitude of sources into a single, unified repository (a single source of truth) and have unlimited access to all of your data at any time. You can work more flexibly and it makes it easy to store new, unstructured data.

What is the future of ETL?

Future ETL will be providing a data management framework – comprehensive and hybrid approach for managing big data. ETL solutions will encompass not only data integration but also data governance, data quality, and data security.


2 Answers

Which is better is hard to answer -- depends on the problem.

I prefer multi-step ETL -- ECCD (Extract, Clean, Conform, Deliver) whenever possible. I also keep intermediate csv files after each extract, clean, and conform step; takes some disk space, but is quite useful. Whenever DW has to be re-loaded due to bugs in etl, or DW schema changes, there is no need to query source systems again -- it is already in flat files. It is also quite convenient to be able to grep, sed and awk through flat files in the staging area when needed. In the case when there are several source systems which feed into the same DW, only extract steps have to be developed (and maintained) for each of the source systems -- clean, conform, and deliver steps are all common.

like image 106
Damir Sudarevic Avatar answered Dec 13 '22 23:12

Damir Sudarevic


So after having played thoroughly with both ETL and ELT, I have come to the conclusion that you should avoid ELT at all costs. ETL prepares the data for your warehouse before you actually load it in. ELT however loads the raw data into the warehouse and you transform it in place. That is problematic if you have a busy data warehouse. If there is a reporting query running on a table that you are attempt to update, your query will get blocked. Consequently, it is possible for reporting queries to hold up or block updates.

Now some of you might say reporting queries do not need to block an update and you can set your isolation level to allow for dirty reads. Reporting queries however are not generally executed by software engineers. They are executed by business users so you can't rely on them to set their isolation levels properly. As well, not all reports can tolerate dirty reads.

There are cases where ELT can work however by introducing it to your data warehouse is dangerous and consequently, I recommend for your sanity and for maintainability, avoid it.

like image 29
Chris J Avatar answered Dec 13 '22 21:12

Chris J