Monday, August 1, 2011

SSIS Questions - Part 1

Let's start a series of Q&As on SSIS as well, since these days it is essential to learn SSIS or another BI tool.
I feel SSIS is one of the best tools to start with. Let's look at some of the basic questions asked about SSIS.

Q. What is the control flow in SSIS? 
Just as a .NET application has a Main entry point, an SSIS package has a Control Flow.
Containers, tasks (including Data Flow tasks), precedence constraints, and variables are the elements of the control flow.
The control flow is the highest level of control in a package. It lets you orchestrate and manage the run-time activities of data flows and other processes within a package. You can design a control flow that uses the Execute Package task to manage the processing sequence of a set of existing packages (the "master package" concept); this lets you combine individual packages into a single, manageable workflow. Use precedence constraints to set process rules and to specify the sequence of tasks within the control flow. An SSIS package always contains a control flow; data flows and event handlers are optional.

Following are some key points of Control Flow:

  • Process is the key: precedence constraints control the flow based on a task's completion, success, or failure.
  • Task 1 must complete before task 2 can begin.
  • The smallest unit of the control flow is a task.
  • The control flow does not move data from task to task.
  • Tasks run in series when connected with precedence constraints, or otherwise in parallel.
  • A package's control flow is made up of containers and tasks connected with precedence constraints to control package flow.
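The precedence-constraint behavior above can be sketched in plain Python. This is a hypothetical illustration, not SSIS code: each task returns a success flag, and the "constraint" on each edge decides which task runs next, like Success and Failure precedence constraints.

```python
# Hypothetical sketch (not SSIS code): precedence constraints as branching
# on the previous task's outcome. Task names are made up for illustration.

def extract():
    print("extracting")
    return True          # True = task succeeded

def load():
    print("loading")
    return "loaded"

def notify_failure():
    print("sending failure alert")
    return "alerted"

def run_control_flow(start, on_success, on_failure):
    # Run the first task, then follow the Success or Failure constraint.
    ok = start()
    return on_success() if ok else on_failure()

run_control_flow(extract, on_success=load, on_failure=notify_failure)
```

Note that, as in SSIS, the control flow here only sequences tasks; no data moves between them.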

Q. What is a data flow in SSIS?

The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and lets the user transform, clean, and modify data as it is moved. Adding a Data Flow task to a package's control flow makes it possible for the package to extract, transform, and load data.

A data flow consists of at least one data flow component, but it is typically a set of connected data flow components: sources that extract data; transformations that modify, route, or summarize data; and destinations that load data. Components are connected in the data flow by paths. Each path specifies the two components that are the start and the end of the path.
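The source → transformation → destination chain can be sketched conceptually in Python. This is a hypothetical illustration with made-up row data; the "paths" between components are simply function calls.

```python
# Hypothetical sketch of a data flow: a source, two transformations,
# and a destination connected by "paths" (here, plain function chaining).

def source():
    # Extract: pretend these rows came from a table or a flat file.
    return [{"name": "bolt", "price": 2}, {"name": "gear", "price": 15}]

def uppercase_name(rows):
    # Transformation 1: modify a column in each row.
    return [{**r, "name": r["name"].upper()} for r in rows]

def add_tax(rows, rate=0.1):
    # Transformation 2: derive a new column from an existing one.
    return [{**r, "price_with_tax": round(r["price"] * (1 + rate), 2)}
            for r in rows]

def destination(rows):
    # Load: here we just collect the rows instead of writing them out.
    return list(rows)

result = destination(add_tax(uppercase_name(source())))
```

Each path connects exactly two components, one as its start and one as its end, just as the text describes.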

At run time, the Data Flow task builds an execution plan from the data flow, and the data flow engine executes the plan. You can create a Data Flow task that has no data flow, but the task executes only if it includes at least one data flow.

A package can include multiple Data Flow tasks, and complex packages frequently do. For example, if a package requires that data flows be run in a specified sequence, or that other tasks be performed between the data flows, you must use a separate Data Flow task for each data flow.

The Data Flow task also manages error flows. At run time, row-level errors may occur when data flow components convert data, perform a lookup, or evaluate expressions. For example, a data column with a string value cannot be converted to an integer, or an expression tries to divide by zero. Both operations cause errors, and the rows that contain the errors can be processed separately using an error flow.
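The string-to-integer conversion example above can be sketched as follows. This is a hypothetical illustration of the error-flow idea: rows that fail the conversion are redirected to an error output instead of failing the whole data flow.

```python
# Hypothetical sketch of an error flow: rows whose string value cannot be
# converted to an integer are sent to a separate error output.

def convert_with_error_flow(rows, column):
    good, errors = [], []
    for row in rows:
        try:
            good.append({**row, column: int(row[column])})
        except (ValueError, TypeError):
            errors.append(row)   # redirected to the error output
    return good, errors

good, errors = convert_with_error_flow(
    [{"qty": "10"}, {"qty": "abc"}, {"qty": "7"}], "qty")
```

The error rows can then be logged, repaired, or loaded into a separate table for review.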

There are many types of transformations that can be applied as the data moves from the source to the destination. Here are some of them:

Copy Column – Copies the data in a column into a new column.
Script Component – Uses a VB.NET script to transform the data. You can access the data programmatically and set the script component up as a source, destination, or transformation. When you open the script component, a Design Script button invokes a Visual Studio editor where you can write the script with IntelliSense support.
Sort – Sorts the data on one or more columns, each in a specified order.
Pivot and Unpivot – New in SQL Server 2005, pivoting and unpivoting data are supported directly as transformations. Pivoting means the columns of the result set are built from distinct values in the data.
Merge / Union All – Combines two inputs into one output. You can specify how the input columns map to the output columns. The Merge transformation is more restrictive than Union All: it accepts exactly two inputs and requires them to be sorted.
Conditional Split – Works the opposite way: based on values in the data, you set up expressions that split the rows when a condition matches. For instance, you can split on the expression "ListPrice > 100", and any row matching it is sent down a specific data flow path.
OLE DB Command – Executes a SQL statement for every row in the input.
Lookup – Looks up the value of a field in a reference data source or table.
Derived Column – Adds a new column to the dataset, computed from an expression.
Aggregate – Groups data and applies aggregate functions such as SUM, AVG, and COUNT.
Character Map – Performs string manipulations, such as upper-case and lower-case conversion, on string data.
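Two of the transformations above, Conditional Split and Aggregate, can be sketched conceptually in Python. This is a hypothetical illustration with made-up product rows, reusing the "ListPrice > 100" example from the list.

```python
# Hypothetical sketch of Conditional Split and Aggregate behavior.
from collections import defaultdict

def conditional_split(rows, predicate):
    # Rows matching the condition go down one path, the rest down another.
    matched = [r for r in rows if predicate(r)]
    unmatched = [r for r in rows if not predicate(r)]
    return matched, unmatched

def aggregate_sum(rows, group_col, sum_col):
    # Group rows by one column and SUM another, like the Aggregate transform.
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_col]] += r[sum_col]
    return dict(totals)

products = [
    {"Category": "Bikes", "ListPrice": 250},
    {"Category": "Bikes", "ListPrice": 80},
    {"Category": "Tools", "ListPrice": 120},
]

expensive, cheap = conditional_split(products, lambda r: r["ListPrice"] > 100)
totals = aggregate_sum(products, "Category", "ListPrice")
```

In a real package you would configure these as dialog-driven transformations in the data flow designer rather than writing code.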
