Disclaimer: I'm in the process of learning SSIS - so forgive me if this is an obvious answer, but I couldn't find anything on searching.
It seems that the Merge transformation is very similar to the Union All transformation, with the following differences:
Apart from these differences, there doesn't seem to be much else that is different. They are both partially blocking, and from my limited tests, I couldn't see any noticeable difference in performance either.
Everywhere I've searched, I've found that the recommendation is to use "Merge" whenever possible, and to use "Union All" if there are more than 2 data sources, or if the source data is unsorted.
My question is, why? Why would I not just use Union All everywhere? What is the point of Merge?
Once again, I'm sorry if this is a very basic/newbie question, but my instructor doesn't know the answer, and searching hasn't been very fruitful either.
Thanks!
Note: It seems (from some other SO answers that people are mixing up "Merge" with "Merge Join". These are two different things.
Ok, so the answer is kind of obvious now in hindsight. It seems that the key difference is in the output.
The idea is that if we have sorted input data then why not keep it sorted after our "Merge"/"Union All"? Who knows, we may need sorted data as an input in another transformation down the line, so just use "Merge" if possible so that the data stays sorted.
What's funny is that a whole lot of articles on the internet don't mention this simple difference. Hope this helps another newbie on their way to SSIS mastery!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With