Understanding ETL processes

ETL seems to be a pretty common task. I have been reading about mistakes that ETL designers make with very large data at http://it.toolbox.com/blogs/infosphere/17-mistakes-that-etl-designers-make-with-very-large-data-19264

I need some practical insights on the following points:

a) Incorporating inserts, updates, and deletes into the same data flow / same process. How is that a problem?

b) Sourcing multiple systems at the same time, depending on heterogeneous systems of data.

c) Not producing the correct indexes on the sources/lookups that need to be accessed.

d) Believing that "I need to process all the data in one pass because it's the fastest way to do it".

Any help?

Arun Khetarpal asked May 12 '26 08:05



1 Answer

a) It leads to data integrity issues.

b) Data quality improves, and there are fewer failures, when you work with smaller chunks.

c) It will take more time to complete.

d) The wrong indexes can make the process take longer. It is better to create indexes based on the queries you are executing, i.e. on the columns that appear in the WHERE clause of the statement.

e) Splitting the data into smaller data sets and processing each one separately would be an efficient solution.

You're a BITS-PILANI (WILP) student, right?
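
To make points d) and e) a bit more concrete, here is a minimal sketch in Python using the standard `sqlite3` module. Everything in it is an assumption made for illustration: the table names `source_rows` and `target_rows`, the column `customer_id`, the index name `idx_source_customer`, and the helper `copy_in_chunks` are all hypothetical, and a real ETL tool or database will look different. It only shows the general idea of indexing the column used in the WHERE clause and loading in small committed batches instead of one large pass.

```python
import sqlite3


def copy_in_chunks(conn, chunk_size=1000):
    """Copy rows from source_rows to target_rows in batches, keyed on id.

    Each batch is committed separately, so a failure only affects the
    batch that was in flight, not the whole load.
    """
    last_id = 0
    total = 0
    while True:
        # Keyset pagination: fetch the next batch after the last id seen.
        rows = conn.execute(
            "SELECT id, customer_id, amount FROM source_rows "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO target_rows (id, customer_id, amount) VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()
        last_id = rows[-1][0]
        total += len(rows)
    return total


# Hypothetical in-memory schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_rows "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute(
    "CREATE TABLE target_rows "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Index the column that the lookup query filters on in its WHERE clause.
conn.execute("CREATE INDEX idx_source_customer ON source_rows (customer_id)")
conn.executemany(
    "INSERT INTO source_rows (customer_id, amount) VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(2500)],
)
conn.commit()

# Ask SQLite how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM source_rows WHERE customer_id = ?", (3,)
).fetchall()

copied = copy_in_chunks(conn, chunk_size=1000)
```

Inspecting `plan` shows whether the lookup uses the index rather than scanning the whole table, and `copied` holds the number of rows moved. The batch size of 1000 is arbitrary here; a sensible value depends on the database and the row size.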

user2943601 answered May 19 '26 07:05



