Understanding ETL processes

Question

ETL seems to be a pretty common task. I have been reading about the mistakes that ETL designers make with very large data at http://it.toolbox.com/blogs/infosphere/17-mistakes-that-etl-designers-make-with-very-large-data-19264

I need some practical insights on the following points:

a) Incorporating inserts, updates, and deletes into the same data flow / same process. How is that a problem?

b) Sourcing multiple systems at the same time, depending on heterogeneous systems of data.

c) Not producing the correct indexes on the sources/lookups that need to be accessed.

d) Believing that "I need to process all the data in one pass because it's the fastest way to do it."

Any help?

user2943601 · Accepted Answer

a) It is a data integrity issue.

b) Data quality will improve, and smaller chunks fail less often.

c) It will take more time to complete.

d) Wrong indexes can make processing take longer. It is better to build indexes based on the query you are executing, i.e. on what appears in the WHERE clause of the statement.

e) Splitting the data into smaller data sets and processing each one separately would be an efficient solution.

You're a BITS-PILANI (WILP) student, right?
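
To make points d) and e) a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (orders, orders_target), the index name (idx_orders_customer), and the chunk size are all made up for illustration; a real ETL tool or database will look different. It first shows how the query plan changes once there is an index on the column used in the WHERE clause, and then copies the rows in small keyed chunks, committing after each one, so a failure would only affect the chunk in flight.

```python
import sqlite3


def build_source(conn, n_rows=1000):
    """Create a hypothetical source table and fill it with n_rows rows."""
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(i, i % 50, float(i)) for i in range(1, n_rows + 1)],
    )
    conn.commit()


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


def load_in_chunks(conn, chunk_size=200):
    """Copy orders into orders_target one chunk at a time.

    Each chunk is selected by primary-key range and committed on its own.
    Returns the number of chunks that were loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_target ("
        "id INTEGER PRIMARY KEY, amount REAL)"
    )
    last_id = 0
    chunks = 0
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO orders_target (id, amount) VALUES (?, ?)", rows
        )
        conn.commit()  # one commit per chunk keeps each unit of work small
        last_id = rows[-1][0]
        chunks += 1
    return chunks


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_source(conn)

    lookup = "SELECT amount FROM orders WHERE customer_id = ?"
    print("before index:", query_plan(conn, lookup, (7,)))

    # Index the column that appears in the WHERE clause of the lookup.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print("after index: ", query_plan(conn, lookup, (7,)))

    print(load_in_chunks(conn), "chunks loaded")
```

Walking the primary key with "id > ?" rather than using OFFSET is a design choice in this sketch: each chunk query can start from the last key seen, so later chunks do not have to skip over rows that were already processed.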

Tags:

etl

data-warehouse

Arun Khetarpal
