Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SSIS Control Flow vs Data Flow

I don't entirely understand the purpose of control flow in an SSIS package. In all of the packages I've created, I simply add a data flow component to control flow and then the rest of the logic is located within the data flow.

I've seen examples of more complicated control flows (EX: foreach loop container that iterates over lines in an Excel file.), but am looking for an example where it could not also be implemented in the data flow. I could just as easily create a connection to the excel file within the data flow.

I'm trying to get a better understanding of when I would need to (or should) implement logic in control flow vs using the data flow to do it all.

What prompted me to start looking into control flow and it's purpose is that I'd like to refactor SSIS data flows as well as break packages down into smaller packages in order to make it easier to support concurrent development.

I'm trying to wrap my mind around how I might use control flow for these purposes.

like image 973
Paul Avatar asked Dec 15 '10 17:12

Paul


People also ask

What is difference between control flow and data?

Data flow is concerned about where data are routed through a program/system and what transformations are applied during that journey. Control flow is concerned about the possible order of operations.

What does control flow used for SSIS?

It specifies the order in which tasks and containers are executed at run time and the conditions under which tasks and containers run. For example, a precedence constraint can specify that a task must succeed for the next task in the control flow to run.

What are the 3 data flow components of SSIS?

SQL Server Integration Services provides three different types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as tables and views in relational databases, files, and Analysis Services databases. Transformations modify, summarize, and clean data.

What is difference between control flow and transaction flow?

TRANSACTION FLOW GRAPHS:Transaction flows and transaction flow testing are to the independent system tester what control flows are path testing are to the programmer. The transaction flow graph is to create a behavioral model of the program that leads to functional testing.


3 Answers

A data flow defines a flow of data from a source to a destination. You do not start on one data flow task and move to the next. Data flows between your selected entities (sources, transformations, destinations).

Moreover within a data flow task, you cannot perform tasks such as iteration, component execution, etc.

A control flow defines a workflow of tasks to be executed, often a particular order (assuming your included precedence constraints). The looping example is a good example of a control-flow requirement, but you can also execute standalone SQL Scripts, call into COM interfaces, execute .NET components, or send an email. The control flow task itself may not actually have anything whatsoever to do with a database or a file.

A control flow task is doing nothing in itself TO the data. It is executing some that itself may (or may not) act upon data somewhere. The data flow task IS doing something with data. It defines its movement and transformation.

It should be obvious when to execute control flow logic and data flow logic, as it will be the only way to do it. In your example, you cite the foreach container, and state that you could connect to the spreadsheet in the data flow. Sure, for one spreadsheet, but how would you do it for multiple ones in a folder? In the data flow logic, you simply can't!

Hope this helps.

like image 65
James Wiseman Avatar answered Oct 27 '22 12:10

James Wiseman


Data flow - are for just moving data from one source to another.

Control flow - provide the logic for when data flow components are run and how they are run. Also control flow can: perform looping, call stored procedures, move files, manage error handling, check a condition and call different tasks (incl data flows) depending on the result, process a cube, trigger another process, etc.

If you're moving data from one location to another and it's the same each time, not based on any other condition, then you can get away with a package with just a dataflow task, but in most cases packages are more complex than that.

like image 20
Stagg Avatar answered Oct 27 '22 14:10

Stagg


We use the control flow for many things. First all our data concerning the data import is stored in tables. So we run procs to start the dataflow and end it, so that our logging works correctly, we do looping through a set of files, we move files to archive locations and rename with the date and delete them from processing locations. We have a separate program that does file movement and validates the files for the correct comlumns and size. We run a proc to make sure the file has been validated before going into the dataflow. Sometimes we have a requirement to send an email when a file is processed or send a report of records which could not process. These emails are put into the control flow. Sometimes we have some clean up steps that are more easily accomplished using a stored proc and thus put the step in the control flow.

like image 34
HLGEM Avatar answered Oct 27 '22 14:10

HLGEM