Creating SSIS Packages
Data movement represents an important part of data management. Data is transported
from client applications to the data server to be stored, and transported back from
the database to the client to be managed and used. In data warehousing, data movement
represents a particularly important element, considering the typical requirements of a data warehouse: the need to import data from one or more operational data stores, the need to cleanse and consolidate the data, and the need to transform the data so that it can be stored and maintained appropriately in the data warehouse.
Based on the level of complexity, data movement scenarios can be divided into two
groups:
■■ Simple data movements, where data is moved from the source to the destination “as-is” (unmodified)
■■ Complex data movements, where the data needs to be transformed before it can be
stored, and where additional programmatic logic is required to accommodate the
merging of new and/or modified data arriving from the source with existing data already present at the destination.
What constitutes a complex data movement? Three distinct elements can be observed in
any complex data movement process:
1. The data is extracted from the source (retrieved from the operational data store).
2. The data is transformed (cleansed, converted, reorganized, and restructured) to comply with the destination data model.
3. The data is loaded into the destination data store (such as a data warehouse).
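The three phases above can be sketched, outside SSIS, as a minimal extract-transform-load pipeline. This is a conceptual illustration only; the source rows, column names, and cleanup rules are hypothetical, not part of any SSIS component.

```python
# Minimal conceptual ETL sketch (not SSIS itself): extract rows from a
# hypothetical operational store, transform them to match the destination
# model, and load them into the destination.

def extract(source_rows):
    # Phase 1: retrieve raw records from the operational data store.
    return list(source_rows)

def transform(rows):
    # Phase 2: cleanse and restructure to comply with the destination model.
    out = []
    for row in rows:
        out.append({
            "customer": row["name"].strip().title(),  # basic cleansing
            "amount": float(row["amount"]),           # type conversion
        })
    return out

def load(rows, destination):
    # Phase 3: append the transformed rows to the destination store.
    destination.extend(rows)
    return destination

warehouse = []
source = [{"name": "  alice SMITH ", "amount": "20.5"}]
load(transform(extract(source)), warehouse)
print(warehouse)  # [{'customer': 'Alice Smith', 'amount': 20.5}]
```

In a real SSIS package, each phase maps to a data flow component rather than a Python function, but the division of responsibilities is the same.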
Planning a Simple Data Movement
To determine whether the Import and Export Wizard is the right tool for a particular data movement, ask yourself a few simple questions:
■■ Will the data need to be transformed before it can be stored at the destination?
If no transformations are required, then the Import and Export Wizard might be the
right tool for the job.
■■ Is it necessary to merge source data with existing data at the destination?
If no data exists at the destination (for example, because the destination itself does not
yet exist), then using the Import and Export Wizard should be the right choice.
Quick Questions:
1. What is the SQL Server Import and Export Wizard?
Answer:
The Import and Export Wizard is a utility that provides a simplified interface
for developing data movement operations where data is extracted from a
source and loaded into a destination, without the need for any transformations.
2. What is the principal difference between simple and complex data movements?
Answer:
In simple data movements, data is copied from one data store into another
unmodified, whereas in complex data movements, data is modified (transformed)
before being loaded into the destination data store.
Learning SSIS: Introducing Control Flow, Data Flow, and Connection Managers
■■ Connection managers Provide connections to data stores, either as data sources or
data destinations. Because the same data store can play the role of the data source as
well as the data destination, connection managers allow the connection to be defined
once and used many times in the same package (or project).
■■ Control flow Defines both the order of operations and the conditions under which
they will be executed. A package can consist of one or more operations, represented
by control flow tasks. Execution order is defined by how individual tasks are connected
to one another. Tasks that do not follow any preceding task as well as tasks that follow
the same preceding task are executed in parallel.
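The execution-order rule just described can be illustrated with a small scheduling sketch: tasks whose predecessors have all completed become eligible to run together. The task names and dependency table below are hypothetical, and the real SSIS engine evaluates precedence constraints rather than a plain dependency dictionary.

```python
# Conceptual sketch of control flow ordering: tasks whose predecessors have
# all completed are eligible to run together (in parallel "waves").

def execution_waves(predecessors):
    # predecessors maps each task to the tasks it must follow.
    remaining = {t: set(p) for t, p in predecessors.items()}
    waves = []
    while remaining:
        # Every task with no unmet predecessor runs in the current wave.
        ready = sorted(t for t, p in remaining.items() if not p)
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for p in remaining.values():
            p.difference_update(ready)
    return waves

# Hypothetical package: A and B follow nothing; C and D both follow A.
deps = {"A": [], "B": [], "C": ["A"], "D": ["A"]}
print(execution_waves(deps))  # [['A', 'B'], ['C', 'D']]
```

Note how A and B (no preceding task) run in parallel, and C and D (same preceding task) run in parallel once A completes.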
■■ Data flow Encapsulates the data movement components—the ETL:
■ One or more source components, designating the data stores from which the data will be extracted.
■ One or more destination components, designating the data stores into which the
data will be loaded.
■ One or more (optional) transformation components, designating the transformations
through which the data will be passed.
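Conceptually, a data flow chains these components into a pipeline through which rows stream. The following generator-based sketch (with made-up column names; this is not the SSIS engine) shows one source, one transformation, and one destination component:

```python
# Conceptual data flow pipeline: rows stream from a source component,
# through a transformation component, into a destination component.

def source():
    # Source component: yields rows extracted from a data store.
    yield {"sku": "a-1", "qty": "3"}
    yield {"sku": "b-2", "qty": "5"}

def transformation(rows):
    # Transformation component: rows are modified in flight.
    for row in rows:
        yield {"sku": row["sku"].upper(), "qty": int(row["qty"])}

def destination(rows):
    # Destination component: collects rows into the target store.
    return list(rows)

result = destination(transformation(source()))
print(result)  # [{'sku': 'A-1', 'qty': 3}, {'sku': 'B-2', 'qty': 5}]
```

Because generators process one row at a time, the sketch also hints at why data flow transformations can run in flight, without materializing intermediate results.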
Quick Questions:
1. What is a control flow?
Answer:
In SSIS packages, the control flow defines the tasks used in performing data
management operations; it determines the order in which these tasks are executed
and the conditions of their execution.
2. What is a data flow?
Answer:
In SSIS packages, the data flow is a special control flow task used specifically in
data movement operations and data transformations.
Implementing Control Flow
SSIS supports a variety of data stores, such as files, relational database management systems, SQL Server Analysis Services databases, web servers, FTP servers, mail servers, and web services.
ADO connection manager: The ADO connection manager enables connections to ActiveX Data Objects (ADO) and is provided mainly for backward compatibility.
ADO.NET connection manager: The ADO.NET connection manager enables connections to data stores using a Microsoft .NET provider. It is compatible with SQL Server.
Analysis Services connection manager: The Analysis Services connection manager provides access to SSAS databases.
Excel connection manager: As the name suggests, the Excel connection manager provides access to data in Microsoft Excel workbooks.
Flat File connection manager: This connection manager provides access to flat files—delimited or fixed-width text files (such as comma-separated values files).
Connection Manager Scope
SQL Server Data Tools (SSDT) support two connection manager definition techniques,
providing two levels of availability:
■■ Package-scoped connection managers are only available in the context of the SSIS
package in which they were created and cannot be reused by other SSIS packages in
the same SSIS project.
■■ Project-scoped connection managers are available to all packages of the project in
which they were created.
Planning a Complex Data Movement
Typically, the transformation could be any or all of the following:
■■ Data cleansing : Unwanted or invalid pieces of data are discarded or replaced with
valid ones. Many diverse operations fit this description—anything from basic cleanup
(such as string trimming or replacing decimal commas with decimal points) to quite
elaborate parsing (such as extracting meaningful pieces of data by using Regular
Expressions).
■■ Data normalization : In this chapter, we would like to avoid what could grow into a
lengthy debate about what exactly constitutes a scalar value, so the simplest definition
of normalization would be the conversion of complex data types into primitive
data types (for example, extracting individual atomic values from an XML document or
atomic items from a delimited string).
■■ Data type conversion : The source might use a different type system than the
destination. Data type conversion provides type-level translation of individual values
from the source data type to the destination data type (for example, translating a .NET
Byte[] array into a SQL Server VARBINARY(MAX) value).
■■ Data translation : The source might use different domains than the destination.
Translation provides a domain-level replacement of individual values of the source domain with an equivalent value from the destination domain (for example, the character “F” designating a person’s gender at the source is replaced with the string “female” representing the same at the destination).
■■ Data validation : This is the verification and/or application of business rules against
individual values (for example, “a person cannot weigh more than a ton”), tuples (for
example, “exactly two different persons constitute a married couple”), and/or sets (for
example, “exactly one person can be President of the United States at any given time”).
■■ Data calculation and data aggregation : In data warehousing, specifically, a common requirement is to not only load individual values representing different facts or
measures, but also to load values that have been calculated (or pre-aggregated) from
the original values (for example, “net price” and “tax” exist at the source, but “price
including tax” is expected at the destination).
■■ Data pivoting and data unpivoting: Source data might need to be restructured or
reorganized in order to comply with the destination data model (for example, data in
the entity-attribute-value (EAV) format might need to be restructured into columns, or vice versa).
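Several of the transformation categories above can be sketched on a single row. The column names, the gender domain mapping, and the tax rule below are hypothetical illustrations of the concepts, not SSIS components:

```python
# Conceptual illustrations of a few transformation categories.

GENDER_DOMAIN = {"F": "female", "M": "male"}  # data translation mapping

def transform_row(row):
    return {
        "name": row["name"].strip(),             # data cleansing
        "gender": GENDER_DOMAIN[row["gender"]],  # data translation
        "weight_kg": float(row["weight_kg"]),    # data type conversion
        # data calculation: a derived value expected at the destination
        "price_with_tax": row["net_price"] + row["tax"],
    }

row = {"name": " Ann ", "gender": "F", "weight_kg": "62",
       "net_price": 100.0, "tax": 20.0}
out = transform_row(row)
print(out["gender"], out["price_with_tax"])  # female 120.0

# Data validation: apply a business rule against the transformed value.
assert out["weight_kg"] <= 1000, "a person cannot weigh more than a ton"
```

In a real data flow, each of these operations would typically be performed by a dedicated transformation component (such as Derived Column or Lookup) rather than hand-written code.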
Tasks
Data Preparation Tasks: These tasks, shown in Table 4-3, are used to prepare data sources for further processing; the preparation can be as simple as copying the source to the server, or as complex as profiling the data, determining its informational value, or even discovering what it actually is.
File System task: This task provides operations on file system objects (files and folders), such as copying, moving, renaming, deleting objects, creating folders, and setting object attributes.
FTP task: This task provides operations on file system objects on a remote file store via the File Transfer Protocol (FTP), such as receiving, sending, and deleting files, as well
as creating and removing directories.
Web Service task: This task provides access to web services; it invokes web service methods, receives the results, and stores them in an SSIS variable or writes them to a file
connection.
XML task: This task provides XML manipulation of XML files and XML data, such as
validation (against a Document Type Definition or an XML Schema), transformations
(using XSLT), and data retrieval (using XPath expressions). It also supports
more advanced methods, such as merging two XML documents.
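To make the data-retrieval operation concrete, here is a comparable (non-SSIS) sketch using Python's standard library. The document and element names are made up, and ElementTree supports only a limited subset of XPath:

```python
# Retrieving values from an XML document with a simple XPath-style query,
# analogous to the XML task's data retrieval operation.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<orders>"
    "<order id='1'><total>10</total></order>"
    "<order id='2'><total>25</total></order>"
    "</orders>"
)

# findall accepts a limited subset of XPath expressions.
totals = [int(e.text) for e in doc.findall("./order/total")]
print(totals)  # [10, 25]
```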
Data Profiling task: This task can be used in determining data quality and in data cleansing. It can be useful in the discovery of properties of an unfamiliar data set.
Workflow Tasks
These tasks, shown in Table , facilitate workflow, which is the structure of the process in
terms of its relationships with the environment and related processes; these tasks automate the interaction between individual SSIS processes and/or the interaction between SSIS processes and external processes (processes that exist outside SQL Server).
Execute Package task: This task executes other SSIS packages, thus allowing the distribution of programmatic logic across multiple SSIS packages, which in turn increases the reusability of individual SSIS packages and enables a more efficient division of labor within the SSIS development team.
Execute Process task: This task executes external processes (that is, processes external to SQL Server). The Execute Process task can be used to start any kind of Windows application.
Message Queue task: This task is used to send and receive messages to and from Microsoft Message Queuing (MSMQ) queues on the local server.
Send Mail task: This task allows the sending of email messages from SSIS packages by using the Simple Mail Transfer Protocol (SMTP).
Expression task: This task is used in the workflow to process variables and/or parameters and to assign the results to other variables used by the SSIS process.
CDC Control task: This task controls the life cycle of SSIS packages that rely on the SQL Server 2012 Change Data Capture (CDC) functionality. It provides CDC information
from the data source to be used in CDC-dependent data flows.
Data Movement Tasks
Bulk Insert task: This task allows the loading of data from formatted text files into a SQL Server database table (or view); the data is loaded unmodified (because transformations are not supported), which means that the loading process is fast and efficient.
Execute SQL task: This task executes SQL statements or stored procedures against a supported data store. The task supports the following data providers: EXCEL, OLE DB, ODBC, ADO, ADO.NET, and SQLMOBILE, so keep this in mind when planning connection managers.
Data flow task: This task is essential to data movements, especially complex data movements, because it provides all the elements of ETL (extract-transform-load); the architecture of the data flow task allows all of the transformations to be performed in flight and in memory, without the need for temporary storage.
SQL Server Administration Tasks
Transfer Database task: Use this task to copy or move a database from one SQL Server instance to another or create a copy of it on the same server. It supports two modes of operation:
■■ In online mode, the database is transferred by using SQL Server Management Objects (SMO), allowing it to remain online for the duration of the transfer.
■■ In offline mode, the database is detached from the source instance, copied to the destination file store, and attached at the destination instance, which takes less time compared to the online mode, but for the entire duration the database is inaccessible.
Transfer Error Messages task: Use this task to transfer user-defined error messages from one SQL Server instance to another; you can transfer all user-defined messages or specify individual ones.
Transfer Jobs task: Use this task to transfer SQL Server Agent jobs from one SQL Server instance to another; you can transfer all jobs or specify individual ones.
Transfer Logins task: Use this task to transfer SQL Server logins from one SQL Server instance to another; you can transfer all logins, logins mapped to users of one or more
specified databases, or individual logins.
SQL Server Maintenance Tasks
Back Up Database task: Use this task in your maintenance plan to automate full, differential, or transaction log backups of one or more system and/or user databases.
Filegroup- and file-level backups are also supported.
Check Database Integrity task: Use this task in your maintenance plan to automate data and index page integrity checks in one or more system and/or user databases.
Execute SQL Server Agent Job task: Use this task in your maintenance plan to automate the invocation of SQL Server Agent jobs to be executed as part of the maintenance plan.
Execute T-SQL Statement task: Use this task in your maintenance plan to execute Transact-SQL scripts as part of the maintenance plan.
You should not confuse the very basic Execute T-SQL Statement task with the more advanced Execute SQL task described earlier in this lesson. The Execute T-SQL Statement task provides only a very basic interface, which allows you to select the connection manager and specify the statement to execute; parameters, for instance, are not supported in this task.
History Cleanup task: Use this task in your maintenance plan to automate the purging of historical data about backups and restore operations, as well as SQL Server Agent and maintenance plan operations on your SQL Server instance.
Maintenance Cleanup task: Use this task in your maintenance plan to automate the removal of files left over by maintenance plan executions; you can configure the task to remove old backup files or maintenance plan text reports.
Notify Operator task: Use this task in your maintenance plan to send email messages to SQL Server Agent operators.
Rebuild Index task: Use this task in your maintenance plan to automate index rebuilds for one or more databases and one or more objects (tables or indexed views).
Reorganize Index task: Use this task in your maintenance plan to automate index reorganizations for one or more databases and one or more objects (tables or indexed views).
Shrink Database task: Use this task in your maintenance plan to automate database shrink operations.
Update Statistics task: Use this task in your maintenance plan to automate updates of statistics for one or more databases and one or more objects (tables or indexed views).
The Script Task
This special task exposes the SSIS programming model via its .NET Framework implementation to provide extensibility to SSIS solutions. The Script task allows you to integrate custom data management operations with SSIS packages. Customizations can be provided by using any of the programming languages supported by the Microsoft Visual Studio Tools for Applications (VSTA) environment (such as Microsoft Visual C# 2010 or Microsoft Visual Basic 2010).
Typically, the Script task would be used to provide functionality that is not provided by
any of the standard built-in tasks, to integrate external solutions with the SSIS solution, or to provide access to external solutions and services through their application programming interfaces (APIs).
Containers
When real-world concepts are implemented in SSIS, the resulting operations can be composed of one or more tasks. To allow tasks that logically form a single unit to also behave as a single unit, SSIS introduces containers. Containers provide structure (for example, tasks that represent the same logical unit can be grouped in a single container, both for improved readability and for manageability), encapsulation (for example, tasks enclosed in a loop container will be executed repeatedly as a single unit), and scope (for example, container-scoped resources can be accessed by the tasks placed in the same container, but not by tasks placed outside it).
For Loop container: This container executes the encapsulated tasks repeatedly, based on an expression—the looping continues while the result of the expression is true; it is based on the same concept as the For loop in most programming
languages.
Foreach Loop container: This container executes the encapsulated tasks repeatedly, once for each item of the selected enumerator; it is based on the same iterative concept as the
For-Each loop in most contemporary programming languages. It is suited for executing a set of operations repeatedly based on an enumerable collection of items (such as files in a folder, a set of rows in a table, or an
array of items).
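The file-enumerator case can be sketched outside SSIS as follows; the folder contents and the "loop body" function are hypothetical stand-ins for the enumerator and the encapsulated tasks:

```python
# Conceptual Foreach Loop sketch: execute the same set of operations once
# per item of an enumerator (here, the files in a folder).
import os
import tempfile

def process_file(path):
    # Stand-in for the tasks encapsulated in the loop container.
    return os.path.basename(path)

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv"):
        open(os.path.join(folder, name), "w").close()

    # The enumerator yields one item per file; the loop body runs once per
    # item, just as the encapsulated tasks would.
    processed = sorted(process_file(os.path.join(folder, f))
                       for f in os.listdir(folder))

print(processed)  # ['a.csv', 'b.csv']
```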
Sequence container: This container has no programmatic logic of its own; it provides structure to encapsulate tasks that form a logical unit, a scope for SSIS
variables that should be accessible exclusively to a specific set of tasks, or
a transaction scope for a set of tasks.
There are three precedence constraint types, all of them equivalent in defining sequences but different in defining the conditions of execution:
■■ A success constraint allows the following operation to begin executing when the preceding operation has completed successfully (without errors).
■■ A failure constraint allows the following operation to begin executing only if the preceding operation has completed unsuccessfully (with errors).
■■ A completion constraint allows the following operation to begin executing when the
preceding operation has completed, regardless of whether the execution was successful
or not.
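The three constraint types reduce to a simple predicate over the preceding task's outcome. This is a conceptual sketch, not the SSIS evaluation engine (which can additionally combine constraints with expressions):

```python
# Conceptual sketch of the three precedence constraint types: whether the
# following task runs depends on the outcome of the preceding task.

def should_run(constraint, preceding_succeeded):
    if constraint == "success":
        return preceding_succeeded       # run only after success
    if constraint == "failure":
        return not preceding_succeeded   # run only after failure
    if constraint == "completion":
        return True                      # run either way
    raise ValueError(f"unknown constraint type: {constraint}")

print(should_run("success", True))      # True
print(should_run("failure", True))      # False
print(should_run("completion", False))  # True
```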