Author: Ahmed Mujtaba

What is an object?

An object is an entity, for example: a car, a house, a person, time, etc. An object can be tangible or intangible.

Let's consider a car (tangible) as our object. An object has attributes, behaviour and a unique identity.

A car has attributes (color, model), behaviour (accelerate, brake, change gear) and a unique registration number.

Now, let's take an intangible object as an example: time. A time can have attributes like year, month and day. It can have behaviours like set the year, set the month and set the day, and it can have a unique identity: date of birth, date of joining college, date of creation.

1. Information hiding:

Some characteristics of information hiding are:

Information is stored within an object.

It is hidden from the outside world.

It can only be manipulated by the object itself.

Example: Consider an object called Alex, a person. The outside world cannot know his name just by seeing him or interacting with him. We also cannot know his exact age until he tells us. We can get his name and age only when he chooses to share them. So we cannot access his name and age directly, but we can ask him to tell us his name.

Another example is the contact numbers stored on a mobile phone. A mobile phone is an object, and it hides the phone numbers inside it. If I want the numbers starting with the letter ‘S’, I have to ask the phone to give me the list of numbers starting with ‘S’.

So information hiding simplifies interaction between objects. The object Alex asks the object phone for the numbers starting with ‘S’. In short, Alex’s information, such as his age or name, is hidden until he gives you access to his details, and you cannot access the phone numbers until you go through the phone itself and perform its operations.
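The phone example can be sketched in Python; the class and contact names below are invented for illustration:

```python
class Phone:
    """Stores contact names internally; callers must go through a method."""
    def __init__(self):
        self._contacts = ["Sara", "Sam", "Alex", "Steve"]  # hidden inside the object

    def contacts_starting_with(self, letter):
        # The only way the outside world can get at the hidden list.
        return [name for name in self._contacts if name.startswith(letter)]

phone = Phone()
print(phone.contacts_starting_with("S"))  # ['Sara', 'Sam', 'Steve']
```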

2. Encapsulation:

The state and behaviour of an object are tightly coupled. You can perceive an object named Alex whose age is 36, but you do not know how he has stored that information in his brain. When you ask him, he performs that operation internally and then tells you his name and age. You do not know in which part of the brain the information lives or how it is retrieved.

A phone stores a phone number in a digital format and knows how to convert it into a human-readable format. The user does not know how the data is stored or how it is converted to a human-readable format.
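A minimal Python sketch of this idea, assuming a made-up Phone class that stores the number internally as a plain integer and formats it only on request:

```python
class Phone:
    def __init__(self, number):
        # Internal "digital" representation: a plain integer.
        # The caller never sees or depends on this.
        self._digits = int(number.replace("-", ""))

    def display(self):
        # Convert to a human-readable form only when asked.
        s = str(self._digits).zfill(10)
        return f"{s[:3]}-{s[3:6]}-{s[6:]}"

p = Phone("555-867-5309")
print(p.display())  # 555-867-5309
```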

3. Object has an interface:

An object encapsulates data and behaviour. So how do objects interact with each other? Each object provides an interface (a set of operations).

So let's say there is an object named Alex. You ask Alex in English, "What is your name?", and he tells you his name. If you ask him anything else, like where he is going or what his plan is, you will never get his name. In simple terms, the only interface Alex exposes to the outer world for getting his name is the question "What is your name?". Any other way you try to get his name, you will not get it.

Another example: an object car, whose interface can be: steer wheels, change gear, accelerate, apply brakes.

If you want to speed up the car, you have to accelerate. If you want to change the direction of the car, you use its steering interface. If you have to stop the car, you apply the brakes interface; you cannot stop the car through any other interface!
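A hypothetical Car class in Python makes the interface idea concrete; the speed increments are arbitrary:

```python
class Car:
    def __init__(self):
        self.speed = 0
        self.gear = 1

    # The public interface: the only operations the outside world can use.
    def accelerate(self):
        self.speed += 10

    def brake(self):
        self.speed = 0

    def change_gear(self, gear):
        self.gear = gear

car = Car()
car.accelerate()
car.accelerate()
car.change_gear(3)
print(car.speed, car.gear)  # 20 3
```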

Abstraction:

Abstraction is a way to cope with and reduce complexity.

Principle of abstraction: capture only those details about an object that are relevant to a given perspective.

Let's take another example: a car from a driver's perspective and from an engineer's perspective.

Driver’s perspective: A car has 4 doors, white color, a steering wheel, good seats.

Engineer’s perspective: the structure of the car, the engine, the number of rods, the wheel type, etc.

Each perspective is separate, and keeping them separate reduces complexity.

Classes:

In the OO model, some objects exhibit identical characteristics (information structure and behaviour).

We say they belong to the same class.

For example, there are 15 students in a classroom. They exhibit almost the same behaviour: they take notes, sit on chairs and give exams. So we can instantiate each student as an object with identical characteristics.

Take another example of a wooden stamp with a government seal. Each time you stamp it, it creates a different impression, but the characteristic is the same: the government seal.

So, take an example: a Student class with three students: Alex, who studies Maths; Katherine, who studies Chemistry; and Jack, who studies Physics. We have created three objects here as instances of the Student class.
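In Python, this could be sketched as follows (the Student class and its attributes are illustrative):

```python
class Student:
    def __init__(self, name, subject):
        self.name = name
        self.subject = subject

    def study(self):
        return f"{self.name} studies {self.subject}"

# Three objects, each an instance of the same class.
alex = Student("Alex", "Maths")
katherine = Student("Katherine", "Chemistry")
jack = Student("Jack", "Physics")

print(alex.study())  # Alex studies Maths
```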

Inheritance:

A child inherits characteristics from his parents.

Besides that, a child may also have its own unique characteristics.

If class B inherits from class A, then class B will have all the characteristics (attributes and behaviour) of class A. The parent class is called the base class and the child class is called the derived class.

Consider a parent class Person with child classes Student, Teacher and Doctor.

A parent class Shape with child classes Circle, Triangle and Line.

A parent class Person (name, age, eat( ), walk( )) has a derived class Student (class, year, study( ), give exam( )) and a derived class Teacher (employeeId, designation, teach( ), take exam( )).

1. Generalization in Inheritance:

The derived classes inherit general characteristics from the base class.

A teacher and a student both eat and walk, so we can put these general characteristics in a base class called Person.

Person (name, age, gender, walk( ), eat( )).

Student (program, year, give exam( ), study( )) and Teacher (designation, employee Id, take exam( ), teach( )).

A student is a person. A teacher is a person.
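A Python sketch of this generalization, with illustrative attribute names:

```python
class Person:
    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender

    def walk(self):
        return f"{self.name} walks"

    def eat(self):
        return f"{self.name} eats"

class Student(Person):
    def __init__(self, name, age, gender, program, year):
        super().__init__(name, age, gender)  # reuse the general part
        self.program = program
        self.year = year

    def study(self):
        return f"{self.name} studies {self.program}"

class Teacher(Person):
    def __init__(self, name, age, gender, designation, employee_id):
        super().__init__(name, age, gender)
        self.designation = designation
        self.employee_id = employee_id

    def teach(self):
        return f"{self.name} teaches"

s = Student("Alex", 20, "M", "Maths", 2)
print(s.walk())   # inherited from Person
print(s.study())  # defined on Student
```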

2. Subtyping (Extension):

Subtyping means that the derived class is behaviourally compatible with the base class; that is, the base class can be replaced by the derived class.

3. Specialization (Restriction):

Specialization means that the derived class is behaviourally incompatible with the base class, which means the base class cannot always be replaced by the derived class.

Suppose a base class Person has a derived class Adult (Person <– Adult). If we add a restriction to the Adult class that age must be at least 18, then a person younger than 18 cannot be an Adult, so the Person class cannot always be replaced by the Adult class.

Another example is a base class Integer Set (add(element)) with a derived class Natural Set (add(element)).

Integer Set (add(element)) <– Natural Set (add(element)).

Consider the Integer Set class, which includes negative, zero and positive numbers. A Natural Set can only include 0, 1, 2, 3 and so on; it cannot include negative numbers. So the Natural Set class is a specialized class of the Integer Set class.

4. Overriding:

A derived class may override a default behaviour of its parent class and exhibit totally new behaviour or a new implementation. This is called overriding.

The reasons for overriding can be:

Providing new behaviour specific to the derived class.

Restricting the default behaviour.

Extending the default behaviour.

Improving performance.

For example, in the Integer Set class add(element) accepts any integer, but in the Natural Set class we can override add(element) to behave differently, for instance to reject negative numbers.

The concept of overriding belongs under inheritance in OO modelling, not outside it.
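A Python sketch of overriding, reusing the Integer Set / Natural Set example; restricting add( ) to non-negative numbers is one possible overridden behaviour:

```python
class IntegerSet:
    def __init__(self):
        self.elements = set()

    def add(self, element):
        self.elements.add(element)

class NaturalSet(IntegerSet):
    def add(self, element):
        # Overridden behaviour: restrict the base class default
        # by rejecting negative numbers.
        if element < 0:
            raise ValueError("natural numbers cannot be negative")
        super().add(element)

n = NaturalSet()
n.add(3)      # fine
# n.add(-1)   # would raise ValueError
```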

Abstract Classes

An abstract class implements an abstract concept.

Its main purpose is to be inherited by other classes.

It cannot be instantiated.

It promotes reuse.

In the Person/Student/Teacher/Doctor example, the Person class can be an abstract class. Abstract classes mostly sit at the top of the hierarchy so that they can be inherited. Person can be abstract because in real life we say "he is a doctor", "he is a student" or "she is a teacher", not "he is a person".

Take another example of a Shape class with derived classes Circle, Line and Triangle. When we see a circle or a triangle in real life, we do not say "it is a shape"; we say "it is a circle" or "it is a triangle".

Another example: a class Vehicle (color, model, year, accelerate( ), brake( )) with derived classes Car (color, model, year, accelerate( ), brake( )) and Truck (color, model, year, accelerate( ), brake( )).

That is why we keep the abstract class at the top: it exhibits a general, common concept, like Shape, Person or Vehicle.

In contrast, the classes from which we can create objects are called concrete classes, like Teacher, Doctor and Student under Person, or Car, Truck and Cycle under Vehicle. We instantiate objects from concrete classes.
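In Python, an abstract class can be expressed with the abc module; the Shape and Circle classes below are illustrative:

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstract concept: meant only to be inherited, never instantiated."""
    @abstractmethod
    def draw(self):
        ...

class Circle(Shape):
    """Concrete class: we instantiate objects from this one."""
    def draw(self):
        return "drawing a circle"

# Shape() would raise TypeError: can't instantiate an abstract class.
print(Circle().draw())  # drawing a circle
```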

Multiple Inheritance:

A class inherited from multiple classes is said to use multiple inheritance. For example, if I have to model a mermaid, I have to inherit the properties of a human and a fish. So in the OO model, a Mermaid class will be derived from both a Human class and a Fish class.
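A Python sketch of the mermaid example (the behaviours chosen for each parent are made up):

```python
class Human:
    def talk(self):
        return "talking"

class Fish:
    def swim(self):
        return "swimming"

class Mermaid(Human, Fish):
    """Inherits behaviour from both parents: multiple inheritance."""
    pass

m = Mermaid()
print(m.talk(), m.swim())  # talking swimming
```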

Association:

Alex drives a car. This can be a 1..* association: Alex can drive his own car or any other car.

An association can be two-way: if Alex is a friend of Katty, then Katty is also a friend of Alex. This 1..1 relationship is a two-way association.

There can be ternary associations as well. For example: Student, Course and Teacher. Student *..1 Teacher, Course *..1 Teacher. One teacher can teach multiple courses to multiple students.

Composition:

An object may be composed of smaller objects.

The relationship between the part objects and the whole object is known as composition.

Composition is represented by a filled diamond on the side of the whole (composed) object. It is a strong relationship.

For example: Alex is composed of parts: 2 legs, 2 arms, 1 head and 1 body.

Aggregation:

An object may contain a collection (an aggregate) of other objects.

The relationship between the container and the contained objects is called aggregation. It is represented by an unfilled diamond on the side of the container object.

A room contains a bed, a cupboard and chairs, but it is not formed from these objects; it merely contains them.

A garden likewise contains many plants.

Aggregation is a weaker relationship because the container holds the items but is not formed from them. We call it weak because if we remove the room, we can still use the chair and the cupboard; and if we remove these items from the room, we can still use the room.

Polymorphism

Polymorphism refers to the existence of different forms of the same entity.

Diamond and coal are different forms of carbon.

It is a powerful tool for developing flexible and reusable systems.

For example: you open an editor and use the print option; whatever file is open, whether a PDF, PNG or Word file, it will print depending on the object.

We can add or create new objects depending on the requirement.

Let's take another example: we have a Shape class with derived classes Triangle, Circle and Line. Now we want to add a new derived class, Square. We can use the same draw( ) method; the system will call the right draw( ) depending on the object.
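A minimal Python sketch of this polymorphism; the draw( ) return values are illustrative:

```python
class Shape:
    def draw(self):
        raise NotImplementedError

class Triangle(Shape):
    def draw(self):
        return "triangle"

class Circle(Shape):
    def draw(self):
        return "circle"

class Square(Shape):  # a new derived class slots in without touching existing code
    def draw(self):
        return "square"

for shape in [Triangle(), Circle(), Square()]:
    # The same call behaves differently depending on the object.
    print(shape.draw())
```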

Let's now work through some examples in Python:

Creating Objects:

Instantiation:

The process of creating a new object of a class is called instantiation.

Consider a class as a blueprint and the object instance as a physical object.

When you create an object instance, space is allocated for that object in memory.

Python then calls the constructor, a special method written as __init__( ), which initializes the state of the newly created object. Think of the constructor as setting the initial stage. If we do not set the attributes in the constructor, we have to assign those attribute values manually every time.

For example:

# without constructor
class Robot:
    pass

robot = Robot()
robot.name = "Optimus"
robot.battery = 100

# with constructor
class Robot:
    def __init__(self, name, battery=100):
        self.name = name
        self.battery = battery

robot = Robot("Optimus")

Many of Python’s built-in classes support what is known as a literal form for designating new instances. For example, the command temperature = 98.6 results in the creation of a new instance of the float class; the term 98.6 in that expression is a literal form.

From a programmer’s perspective, yet another way to indirectly create a new instance of a class is to call a function that creates and returns such an instance. For example, Python has a built-in function named sorted that takes a sequence of comparable elements as a parameter and returns a new instance of the list class containing those elements in sorted order.

Functions:

When using a method of a class, it is important to understand its behavior. Some methods return information about the state of an object, but do not change that state. These are known as accessors. Other methods, such as the sort method of the list class, do change the state of an object. These methods are known as mutators or update methods.
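A short Python illustration of the accessor/mutator distinction using built-ins:

```python
numbers = [3, 1, 2]

# Accessors: return information, leave the object unchanged.
smallest = min(numbers)     # 1
in_order = sorted(numbers)  # a NEW list [1, 2, 3]
print(numbers)              # [3, 1, 2] -- still unchanged

# Mutator: changes the object's state in place.
numbers.sort()
print(numbers)              # [1, 2, 3]
```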

Python’s Built-In Classes

The table below provides a summary of commonly used built-in classes in Python; we take particular note of which classes are mutable and which are immutable. A class is immutable if each object of that class has a fixed value upon instantiation that cannot subsequently be changed. For example, the float class is immutable. Once an instance has been created, its value cannot be changed (although an identifier referencing that object can be reassigned to a different value).

Class      Description                         Immutable?
bool       Boolean value                       Immutable
int        integer (arbitrary magnitude)       Immutable
float      floating-point number               Immutable
list       mutable sequence of objects         Mutable
tuple      immutable sequence of objects       Immutable
str        character string                    Immutable
set        unordered set of distinct objects   Mutable
frozenset  immutable form of the set class     Immutable
dict       associative mapping (dictionary)    Mutable
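A quick Python demonstration of the mutable/immutable distinction from the table:

```python
values = [1, 2, 3]   # list: mutable
values[0] = 99       # fine -- the object itself changes
print(values)        # [99, 2, 3]

point = (1, 2, 3)    # tuple: immutable
try:
    point[0] = 99    # not allowed
except TypeError:
    print("tuples cannot be changed in place")
```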

Common Linux Commands Part 1:

ls – list current directory contents

pwd – prints the present working directory

cd – change directory

file – print the file type

less – views the contents of a file one page at a time

cp – copy files or folder

mkdir – make a directory

mv – rename or move files

rm – remove file or directory

cat – displays the contents of the file

wc – displays the number of lines, words and bytes in a file.

head – displays the first few lines of the file

tail – displays the last few lines of the file

clear – clears the screen

history – displays previously executed commands from the shell’s history list

chgrp – change the group ownership of a file

passwd – change user’s password

chmod – change the permission mode of the file

chown – change file owner / group

ps – displays details of all the current processes

jobs – displays a list of all the active jobs

ping – send echo request to the network hosts

traceroute – displays the route packets trace to network host

netstat – displays the network connections, routing tables, interface stats etc

wget – network downloader

ssh – the openSSH ssh client to connect to remote host

locate – find files by name

find – search for files in directories

gzip – compress or expand files

tar – archiving utility

zip – package and compress file

grep – searches for the text in a file according to the regular expression given

sort – sort lines of text files

uniq – omits repeated adjacent lines

cut – removes sections from each line of a file

diff – examine the changes or differences in the files

How to get details of SQL Server agent Jobs from Queries

All SQL Server Agent job details are logged in tables of the msdb database.

These tables contain information about each SQL Server Agent job, such as the job ID, session ID, run date, enabled status, owner, job steps, last run outcomes, etc. The commonly used tables are:

select * from [dbo].[sysjobactivity]
select * from [dbo].[sysjobhistory]
select * from [dbo].[sysjobs]
select * from [dbo].[sysjobschedules]
select * from [dbo].[sysjobservers]
select * from [dbo].[sysjobsteps]
select * from [dbo].[sysjobstepslogs]

Msg 8120, Level 16, State 1, Line 1 Column ” is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

We will be using the table ‘[HR].[Employees]’ from TSQL2012 database.

The error, as described in the title of the post:

"Msg 8120, Level 16, State 1, Line 1 Column 'HR.Employees.lastname' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

The error occurred when I executed the following query:

select min(empid) as minimum,lastname from [HR].[Employees] 

The reason for the error is that the GROUP BY clause is missing at the end. When we use an aggregate function along with other columns in the SELECT clause, we must use a GROUP BY clause.

select min(empid) as minimum,lastname from [HR].[Employees] group by lastname

If you use more than one column in the SELECT clause alongside an aggregate function, you must include all of those columns in the GROUP BY clause, except the column on which the aggregate function is applied. For instance, if you execute the query below, it will throw the same error:

select min(empid) as minimum,lastname,firstname from [HR].[Employees] group by lastname

So we must include the firstname column as well for the query to execute successfully, as below:

select min(empid) as minimum,lastname,firstname from [HR].[Employees] group by lastname,firstname

Msg 8152, Level 16, State 4, Line 1 String or binary data would be truncated.

This error is very common when we insert or update data.

It occurs when we try to insert data that is longer than the column into which we are inserting it.

For instance, let's see an example in our TSQL2012 database:

insert into [Production].Categories(categoryname,description) values ('No chocolate items','no chocolate')

It gave me an error :

Msg 8152, Level 16, State 4, Line 1
String or binary data would be truncated.
The statement has been terminated.

This happened because the categoryname column has the type nvarchar(15), which means it can hold at most 15 characters, while the value we tried to insert, ‘No chocolate items’, is 18 characters long.

To solve this problem, we can increase the column length to 18, or we can shorten the value to fit within 15 characters.

MS SQL script for creating the TSQL2012 database, which we will use.

We will be working with the attached script throughout our posts. You can copy and paste the script into Microsoft SQL Server 2012, as we will be working with the tables, views, stored procedures, functions and other objects in the script.

Copy, paste and execute the whole script in a New Query tab in SQL Server 2012 Management Studio to create the TSQL2012 database.

The script is attached below in the TSQL2012.sql file.

ETL(Extract , Transform and Load) – Part 1 -Introduction to Data warehouse

Analyzing data from relational (LOB) application databases is not easy. The normalized relational schema used by an LOB application can consist of hundreds of tables, and it is difficult to discover where the data you need for a report is located.
In addition, LOB applications do not track data over time (they perform many DML operations), though many analyses depend on historical data.

A data warehouse (DW) is a centralized database for an enterprise that contains merged, cleansed and historical data. DW data is better suited for reporting purposes. The logical schema design of a DW is called a Star schema or a Snowflake schema, consisting of dimension and fact tables.

The data in a DW comes from LOB/relational databases, and it arrives cleansed and transformed. Of course, on its way into the DW the data goes through changes so that the DW holds refreshed data. One way to refresh the DW with new data is through a nightly scheduled job; reports over the refreshed data can then be run.

The physical design of a DW is simpler than the relational database design, as it involves fewer table joins.

————————————————————————————————————————————-

CHAPTER 1 : Introduction to Star and Snowflake Schemas

First, let's look at why reporting is a problem with a normalized schema:

Let's take the AdventureWorks2012 sample database. Suppose the report should include the sales amount for Internet sales in different countries over multiple years. This will end up touching almost 10 tables. The AdventureWorks2012 database schema is highly normalized; it is intended as an example schema to support LOB applications.

The goal of normalization is to have a complete and non-redundant schema.
Every piece of information must be stored exactly once. This way, you can enforce data integrity.

So a query that joins 10 to 12 tables, as would be required for reporting sales by country and year, would not be very fast. It can cause performance issues, as it reads huge amounts of sales data over multiple years, and would thus interfere with the regular transactional work of inserting and updating data.

Another problem is that, in many cases, LOB databases are purged, typically after each new fiscal year starts. Even if you have all of the historical data for the sales transactions, you might have a problem showing the historical data correctly. For example, you might have only the latest customer address, which might prevent you from calculating historical sales by country correctly.

The AdventureWorks2012 sample database stores all data in a single database. However, in an enterprise, you might have multiple LOB applications, each of which might store data in its own database. You might also have part of the sales data in one database and part in another.
And you could have customer data in both databases, without a common identification. In such cases, you face the problems of how to merge all this data and how to identify which customer from one database is actually the same as a customer from another database.

Finally, data quality can be low. The old rule, “garbage in, garbage out,” applies to analyses as well. Parts of the data could be missing; other parts could be wrong. Even with good data, you could still have different representations of the same data in different databases. For example, gender in one database could be represented with the letters F and M, and in another database with the numbers 1 and 2. We can enforce a proper standard in our data warehouse to address this problem.

Star Schema

In the figure below, you can easily see why the Star schema is so named: it resembles a star. There is a single central table, called a fact table, surrounded by multiple tables called dimensions. One Star schema covers a particular business area; in this case, the schema covers Internet sales. An enterprise data warehouse covers multiple business areas and consists of multiple Star and Snowflake schemas.

[Figure: a Star schema]

The fact table is connected to all the dimensions by foreign keys. Usually, all the foreign keys taken together uniquely identify each row in the fact table, and thus together form a unique key, so you can use all the foreign keys as a composite primary key of the fact table. The fact table is on the “many” side of its relationships with the dimensions. If you were to form a proposition from a row in a fact table, you might express it with a sentence such as, “Customer CC purchased product BB on date DD in quantity QQ for amount SS.”

As we  know, a data warehouse consists of multiple Star schemas. From a business perspective, these Star schemas are connected. For example, you have the same customers in sales as in accounting. You deal with many of the same products in sales, inventory, and production. Of course, your business is performed at the same time over all the different business areas. To represent the business correctly, you must be able to connect the multiple Star schemas in your data warehouse. The connection is simple – you use the same dimensions for each Star schema. In fact, the dimensions should be shared among multiple Star schemas. Dimensions have foreign key relationships with multiple fact tables. Dimensions which have connections to multiple fact tables are called shared or conformed dimensions.

[Figure: shared (conformed) dimensions]

Snowflake Schema

You can imagine multiple dimensions designed in a similar normalized way, with a central fact table connected by foreign keys to dimension tables, which are connected with foreign keys to lookup tables, which are connected with foreign keys to their second-level lookup tables.

In this configuration, a star starts to resemble a snowflake. Therefore, a Star schema with normalized dimensions is called a Snowflake schema.

[Figure: a Snowflake schema]

In most long-term projects, you should design Star schemas. Because the Star schema is
simpler than a Snowflake schema, it is also easier to maintain. Queries on a Star schema are simpler and faster than queries on a Snowflake schema, because they involve fewer joins.

Hybrid Schema

In some cases, you can also employ a hybrid approach, using a Snowflake schema only for the first level of a dimension lookup table. In this type of approach, there are no additional levels of lookup tables; the first-level lookup table is denormalized. Figure 1-6 shows such a partially denormalized schema.

Quick Question : 

How do we connect multiple star schemas ?

Answer : 

Through shared dimensions .

Granularity Level

The number of dimensions connected with a fact table defines the granularity level.

Auditing and Lineage

A data warehouse may also contain auditing tables. For every update to a dimension or fact table in your DW, you should audit when the update was done, who performed it, and how many rows were affected. If you also audit how much time each load needed, you can track performance and take action if it slows down. You store this information in an auditing table.

You might also need to know where each row in a dimension and/or fact table came from and when it was added. In such cases, you must add appropriate columns to the dimension and fact tables. Such fine detailed auditing information is also called lineage.

ETL(Extract , Transform and Load) – Part 2 – Designing Dimensions

Dimensions give context to measures. Typical analysis includes pivot tables and pivot graphs. These pivot on one or more dimension columns used for analysis—these columns are called attributes in DW and OLAP terminology.

Columns with unique values identify rows. These columns are keys. In a data warehouse,
you need keys just like you need them in an LOB database. Keys uniquely identify entities. Therefore, keys are the second type of columns in a dimension.

Pivoting makes no sense if an attribute’s values are continuous, or if an attribute has too many distinct values. Imagine how a pivot table would look if it had 1,000 columns, or how a pivot graph would look with 1,000 bars. For pivoting, discrete attributes with a small number of distinct values are most appropriate. A bar chart with more than 10 bars becomes difficult to comprehend. Continuous columns, or columns with unique values such as keys, are not appropriate for analyses.

If you have a continuous column and you would like to use it in analyses as a pivoting attribute, you should discretize it. Discretizing means grouping or binning values to a few discrete groups. If you are using OLAP cubes, SSAS can help you. SSAS can discretize continuous attributes. However, automatic discretization is usually worse than discretization from a business perspective. Age and income are typical attributes that should be discretized from a business perspective. One year makes a big difference when you are 15 years old, and much less when you are 55 years old. When you discretize age, you should use narrower ranges for younger people and wider ranges for older people (these are used for Graph type pivoting).
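A toy Python sketch of business-driven discretization; the bin boundaries and labels below are invented for the example, with narrower ranges for younger ages:

```python
def age_group(age):
    """Map a continuous age to a discrete group (illustrative bins only)."""
    if age < 13:
        return "child"
    elif age < 18:
        return "teen"
    elif age < 25:
        return "young adult"
    elif age < 45:
        return "adult"
    else:
        return "senior"

print(age_group(15))  # teen
print(age_group(55))  # senior
```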

A customer typically has an address, a phone number, and an email address. You do not analyze data on these columns. You do not need them for pivoting. However, you often need information such as the customer’s address on a report. If that data is not present in a DW, you will need to get it from an LOB database, probably with a distributed query. It is much simpler to store this data in your data warehouse. In addition, queries that use this data perform better, because the queries do not have to include data from LOB databases. Columns used in reports as labels only, not for pivoting, are called member properties.

In addition to the types of dimension columns already defined for identifying, naming, pivoting, and labeling on a report, you can have columns for lineage information.

A dimension may contain the following types of columns:

■■ Keys: used to identify entities
■■ Name columns: used for human-readable names of entities
■■ Attributes: used for pivoting in analyses
■■ Member properties: used for labels in a report
■■ Lineage columns: used for auditing, and never exposed to end users

Hierarchies

Figure 1-9 shows the DimCustomer dimension of the AdventureWorksDW2012 sample database.

[Figure 1-9: the DimCustomer dimension]

In the figure, the following columns are attributes (columns used for pivoting):
■■ BirthDate (after calculating age and discretizing the age)
■■ MaritalStatus
■■ Gender
■■ YearlyIncome (after discretizing)
■■ TotalChildren
■■ NumberChildrenAtHome
■■ EnglishEducation (other education columns are for translations)
■■ EnglishOccupation (other occupation columns are for translations)
■■ HouseOwnerFlag
■■ NumberCarsOwned
■■ CommuteDistance

All these attributes are unrelated. Pivoting on MaritalStatus, for example, is unrelated to pivoting on YearlyIncome. None of these columns has any functional dependency on the others, and there is no natural drill-down path through these attributes. Now look at the DimDate columns, as shown in Figure 1-10.

[Figure 1-10: the DimDate dimension]

Some attributes of the DimDate dimension include the following (not in the order shown in the figure):
■■ FullDateAlternateKey (denotes a date in date format)
■■ EnglishMonthName
■■ CalendarQuarter
■■ CalendarSemester
■■ CalendarYear

You will immediately notice that these attributes are connected. There is a functional dependency among them, so they break third normal form. They form a hierarchy. Hierarchies are particularly useful for pivoting and OLAP analyses—they provide a natural drill-down path. You perform divide-and-conquer analyses through hierarchies.

Hierarchies have levels. When drilling down, you move from a parent level to a child level. For example, a calendar drill-down path in the DimDate dimension goes through the following levels: CalendarYear ➝ CalendarSemester ➝ CalendarQuarter ➝ EnglishMonthName ➝ FullDateAlternateKey

At each level, you have members. For example, the members of the month level are, of course, January, February, March, April, May, June, July, August, September, October, November, and December. In DW and OLAP jargon, rows on the leaf level—the actual dimension rows—are called members. This is why dimension columns used in reports for labels are called member properties.

In a Snowflake schema, lookup tables show you levels of hierarchies. In a Star schema, you need to extract natural hierarchies from the names and content of columns. Nevertheless, because drilling down through natural hierarchies is so useful and welcomed by end users, you should use them as much as possible.

Slowly Changing Dimensions

There is one common problem with dimensions in a data warehouse: the data in the dimension changes over time. This is usually not a problem in an OLTP application; when a piece of data changes, you just update it. However, in a DW, you have to maintain history. The question that arises is how to maintain it. Do you want to update only the changed data, as in an OLTP application, and pretend that the value was always the last value, or do you want to maintain both the first and intermediate values? This problem is known in DW jargon as the Slowly Changing Dimension (SCD) problem.

The problem is best explained in an example. Table 1-1 shows original source OLTP data
for a customer.

[Table 1-1: original OLTP customer data]

The customer lives in Vienna, Austria, and is a professional. Now imagine that the customer moves to Ljubljana, Slovenia. In an OLTP database, you would just update the City column, resulting in the values shown in Table 1-2.

[Table 1-2: OLTP customer data after the update]

If you create a report, all the historical sales for this customer are now attributed to the
city of Ljubljana, and (on a higher level) to Slovenia. The fact that this customer contributed to sales in Vienna and in Austria in the past would have disappeared.

In a DW, you can have the same data as in an OLTP database. You could use the same key,
such as the business key, for your Customer dimension. You could update the City column when you get a change notification from the OLTP system, and thus overwrite the history.

Type 1 SCD: Type 1 means overwriting the history for an attribute and for all higher levels of hierarchies to which that attribute belongs.

The problem arises when you want to preserve historical data. What if you want the transactions this customer made while still living in Vienna? The solution is to add a surrogate key (a DW key).

Type 2 SCD: Type 2 means adding a new row for every change, thereby preserving the history. When you implement Type 2 SCD, for the sake of simpler querying, you typically also add a flag to denote which row is current for a dimension member. Alternatively, you could add two columns showing the interval of validity of a value.

[Table 1-3: customer dimension with Type 2 SCD rows]

You could have a mixture of Type 1 and Type 2 changes in a single dimension. For example, in Table 1-3, you might want to maintain the history for the City column but overwrite the history for the Occupation column. That raises yet another issue. When you want to update the Occupation column, you may find that there are two (and maybe more) rows for the same customer. The question is, do you want to update the last row only, or all the rows? Table 1-4 shows a version that updates the last (current) row only, whereas Table 1-5 shows all of the rows being updated.

[Tables 1-4 and 1-5: Type 1 update of the current row only vs. all rows]
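The Type 2 change for the city move above can be sketched as two statements: expire the current row, then insert a new version with a fresh surrogate key. The column names (CustomerDwKey, IsCurrent, ValidFrom, ValidTo) and the business key value are illustrative assumptions, not taken from the tables:

```sql
-- Type 2 SCD: close the current row for the customer...
UPDATE dbo.DimCustomer
SET IsCurrent = 0,
    ValidTo = SYSDATETIME()
WHERE CustomerId = 17
  AND IsCurrent = 1;

-- ...then add the new version; the surrogate key CustomerDwKey
-- is generated by IDENTITY or a sequence.
INSERT INTO dbo.DimCustomer
  (CustomerId, FullName, City, Occupation, IsCurrent, ValidFrom, ValidTo)
VALUES
  (17, N'Customer 17', N'Ljubljana', N'Professional', 1, SYSDATETIME(), NULL);
```

Queries for the current state filter on IsCurrent = 1; queries for a historical point in time filter on the ValidFrom/ValidTo interval instead.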

Fact Table Column Types

Fact tables are collections of measurements associated with a specific business process. You store measurements in columns. Logically, this type of column is called a measure. Measures are the essence of a fact table. They are usually numeric and can be aggregated. They store values that are of interest to the business, such as sales amount, order quantity, and discount amount.

In addition to measures, a fact table includes foreign keys that associate its rows with dimension rows. All foreign keys together usually uniquely identify each row and can be used as a composite primary key.

For example, suppose you start building a sales fact table from an order details table in a source system, and then add foreign keys that pertain to the order as a whole from the Order Header table in the source system. Tables 1-7, 1-8, and 1-9 illustrate an example of such a design process.

Table 1-7 shows a simplified example of an Orders Header source table. The OrderId
column is the primary key for this table. The CustomerId column is a foreign key from the Customers table. The OrderDate column is not a foreign key in the source table; however, it becomes a foreign key in the DW fact table, for the relationship with the explicit date dimension. Note, however, that foreign keys in a fact table can be, and usually are, replaced with DW surrogate keys of DW dimensions.

[Table 1-7: Orders Header source table]

Table 1-8 shows the source Order Details table. The primary key of this table is a composite one and consists of the OrderId and LineItemId columns. In addition, the source Order Details table has the ProductId foreign key column. The Quantity column is the measure.

[Table 1-8: Order Details source table]

Table 1-9 shows the Sales Fact table created from the Orders Header and Order Details
source tables. The Order Details table was the primary source for this fact table. The OrderId, LineItemId, and Quantity columns are simply transferred from the source Order Details table.
The ProductId column from the source Order Details table is replaced with a surrogate DW ProductKey column. The CustomerId and OrderDate columns come from the source Orders Header table; these columns pertain to orders, not order details. In the fact table, they are replaced with the surrogate DW keys CustomerKey and OrderDateKey.

[Table 1-9: Sales Fact table]

You do not need the OrderId and LineItemId columns in this sales fact table. For analyses, you could create a composite primary key from the CustomerKey, OrderDateKey, and ProductKey columns.
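The fact table from Table 1-9 might be created like this (a sketch; the table name and data types are assumptions, the columns follow the example):

```sql
CREATE TABLE dbo.FactSales
(
  CustomerKey  INT NOT NULL,   -- surrogate FK to the customer dimension
  OrderDateKey INT NOT NULL,   -- surrogate FK to the date dimension
  ProductKey   INT NOT NULL,   -- surrogate FK to the product dimension
  Quantity     INT NOT NULL,   -- the measure
  CONSTRAINT PK_FactSales
    PRIMARY KEY (CustomerKey, OrderDateKey, ProductKey)
);
```

The composite primary key over the three surrogate keys reflects the grain of the fact table: one row per customer, order date, and product.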

Additivity of Measures

Additivity of measures is not exactly a data warehouse design problem. However, you should consider which aggregate functions you will use in reports for which measures, and which aggregate functions you will use when aggregating over which dimension.
The simplest types of measures are those that can be aggregated with the SUM aggregate
function across all dimensions, such as amounts or quantities. For example, if sales for product A were $200.00 and sales for product B were $150.00, then the total of the sales was $350.00.
If yesterday’s sales were $100.00 and sales for the day before yesterday were $130.00, then the total sales amounted to $230.00. Measures that can be summarized across all dimensions are called additive measures.

Some measures are not additive over any dimension. Examples include prices and percentages, such as a discount percentage. Typically, you use the AVERAGE aggregate function for such measures, or you do not aggregate them at all. Such measures are called non-additive measures.

For some measures, you can use SUM aggregate functions over all dimensions but time.
Some examples include levels and balances. Such measures are called semi-additive measures.
For example, if customer A has $2,000.00 in a bank account, and customer B has $3,000.00, together they have $5,000.00. However, if customer A had $5,000.00 in an account yesterday but has only $2,000.00 today, then customer A obviously does not have $7,000.00 altogether. You should take care how you aggregate such measures in a report. Over the time dimension, you can calculate the average value or use the last value as the aggregate.
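In a report, a semi-additive balance can be summed across customers but should take only the last value across time. A sketch, assuming a fact table dbo.FactAccountBalance with DateKey and Balance columns (illustrative names):

```sql
-- Sum the semi-additive balance across all customers,
-- but only for the latest date (last value over time):
SELECT SUM(Balance) AS TotalBalance
FROM dbo.FactAccountBalance
WHERE DateKey = (SELECT MAX(DateKey) FROM dbo.FactAccountBalance);
```

Summing over all dates as well would double-count the balances, which is exactly the mistake the customer A example warns about.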

Quick Question : 

■■ You are designing an accounting system. Your measures are debit, credit, and
balance. What is the additivity of each measure?

Answer: Debit and credit are additive measures, while balance is a semi-additive measure.

Summary : 

■■ Fact tables include measures, foreign keys, and possibly an additional primary key and lineage columns.
■■ Measures can be additive (for example, a sales amount that sums across all dimensions), non-additive (for example, prices or percentages), or semi-additive (for example, an account balance).

ETL (Extract, Transform, and Load) – Part 3 – Introduction to Data Warehouse

Creating a Data Warehouse Database

A DW contains transformed line-of-business (LOB) data. You load data into your DW on a schedule, mostly in an overnight job. The DW data is not online, real-time data. You do not need to back up the transaction log for your data warehouse as you would for an LOB database. Therefore, the recovery model for your data warehouse should be Simple.

■■ In the Full recovery model, all transactions are fully logged, with all associated data. You have to regularly back up the log. You can recover data to any arbitrary point in time. Point-in-time recovery is particularly useful when human errors occur.

■■ The Bulk Logged recovery model is an adjunct of the Full recovery model that permits
high-performance bulk copy operations. Bulk operations, such as index creation or
bulk loading of text or XML data, can be minimally logged. For such operations, SQL
Server can log only the Transact-SQL command, without all the associated data. You
still need to back up the transaction log regularly.

■■ In the Simple recovery model, SQL Server automatically reclaims log space for committed transactions. SQL Server keeps log space requirements small, essentially eliminating the need to manage the transaction log space.

In your data warehouse, large fact tables typically occupy most of the space. You can
optimize querying and managing large fact tables through partitioning. Table partitioning has management advantages and provides performance benefits. Queries often touch only subsets of partitions, and SQL Server can efficiently eliminate other partitions early in the query execution process.

A database can have multiple data files, grouped in multiple filegroups. There is no single best practice as to how many filegroups you should create for your data warehouse. However, for most DW scenarios, having one filegroup for each partition is the most appropriate. For the number of files in a filegroup, you should consider your disk storage. Generally, you should create one file per physical disk.
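A DW database with the Simple recovery model and one filegroup per expected partition might be sketched as follows. The database name, file names, and drive paths are all illustrative assumptions:

```sql
-- One filegroup per planned partition, each on its own drive.
CREATE DATABASE MyDw
ON PRIMARY
  (NAME = MyDw_Primary, FILENAME = 'D:\Data\MyDw.mdf'),
FILEGROUP FG2012
  (NAME = MyDw_2012, FILENAME = 'E:\Data\MyDw_2012.ndf'),
FILEGROUP FG2013
  (NAME = MyDw_2013, FILENAME = 'F:\Data\MyDw_2013.ndf')
LOG ON
  (NAME = MyDw_Log, FILENAME = 'G:\Log\MyDw.ldf');

-- Simple recovery: SQL Server reclaims log space automatically.
ALTER DATABASE MyDw SET RECOVERY SIMPLE;
```

Spreading filegroups across physical disks is what later lets partitioned queries parallelize their reads.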

Loading data from source systems is often quite complex. To mitigate the complexity,
you can implement staging tables in your DW. You can even implement staging tables and other objects in a separate database. You use staging tables to temporarily store source data before cleansing it or merging it with data from other sources. In addition, staging tables also serve as an intermediate layer between DW and source tables. If something changes in the source—for example if a source database is upgraded—you have to change only the query that reads source data and loads it to staging tables. After that, your regular ETL process should work just as it did before the change in the source system. The part of a DW containing staging tables is called the data staging area (DSA).

Implementing Dimensions

Implementing a dimension involves creating a table that contains all the needed columns. In addition to business keys, you should add a surrogate key to all dimensions that need Type 2 Slowly Changing Dimension (SCD) management. You should also add a column that flags the current row or two date columns that mark the validity period of a row when you implement Type 2 SCD management for a dimension.

You can use simple sequential integers for surrogate keys. SQL Server can autonumber them for you. You can use the IDENTITY property to generate sequential numbers.

A sequence is a user-defined, schema-bound object that is independent of any table. SQL Server uses sequences to generate a sequence of numeric values according to your specification. You can generate sequences in ascending or descending order, using a defined interval of possible values. You can even generate sequences that cycle (repeat).

As mentioned, sequences are independent objects, not associated with tables. You control the relationship between sequences and tables in your ETL application. With sequences, you can coordinate the key values across multiple tables.
You should use sequences instead of identity columns in the following scenarios:
■■ When you need to determine the next number before making an insert into a table.

■■ When you want to share a single series of numbers between multiple tables.

■■ When you need to restart the number series when a specified number is reached (that is, when you need to cycle the sequence).
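A sketch of a sequence used for dimension surrogate keys; the sequence, table, and column names are illustrative assumptions:

```sql
CREATE SEQUENCE dbo.SeqCustomerDwKey
  AS INT
  START WITH 1
  INCREMENT BY 1;

-- You can determine the next number before the insert...
DECLARE @key INT = NEXT VALUE FOR dbo.SeqCustomerDwKey;

-- ...or use it directly in the INSERT itself:
INSERT INTO dbo.DimCustomer (CustomerDwKey, CustomerId, City)
VALUES (NEXT VALUE FOR dbo.SeqCustomerDwKey, 17, N'Vienna');
```

Because the sequence is not tied to the table, the same sequence could also feed keys to a second dimension table, which an IDENTITY column cannot do.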

Implementing Fact Tables

After you implement dimensions, you need to implement fact tables in your data warehouse. You should always implement fact tables after you implement your dimensions. A fact table is on the “many” side of a relationship with a dimension, so the parent side must exist if you want to create a foreign key constraint.

You should partition a large fact table for easier maintenance and better performance.

Columns in a fact table include foreign keys and measures. Dimensions in your database
define the foreign keys. All foreign keys together usually uniquely identify each row of a fact table.

In production, you can remove foreign key constraints to achieve better load performance.
If the foreign key constraints are present, SQL Server has to check them during the load. However, we recommend that you retain the foreign key constraints during the development and testing phases. It is easier to create database diagrams if you have foreign keys defined. In addition, during the tests, you will get errors if constraints are violated. Errors inform you that there is something wrong with your data; when a foreign key violation occurs, it’s most likely that the parent row from a dimension is missing for one or more rows in a fact table. These types of errors give you information about the quality of the data you are dealing with.
If you decide to remove foreign keys in production, you should create your ETL process so that it is resilient when foreign key errors occur. In your ETL process, you should add a row to a dimension when an unknown key appears in a fact table. A row in a dimension added during fact table load is called an inferred member. Except for the key values, all other column values for an inferred member row are unknown at fact table load time, and you should set them to NULL. This means that dimension columns (except keys) should allow NULLs.
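Adding an inferred member can be sketched as a single insert, where only the business key is known and every other column stays NULL until the next dimension load fills it in (table and column names are assumptions):

```sql
-- A fact row referenced customer 942, which is missing from the
-- dimension: insert an inferred member with only the key known.
INSERT INTO dbo.DimCustomer (CustomerId, FullName, City, Occupation)
VALUES (942, NULL, NULL, NULL);
```

A real ETL process would also flag the row (for example, with an IsInferred column) so the dimension load knows to overwrite the NULLs later.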


Columnstore Indexes and Batch Processing

A columnstore index is just another nonclustered index on a table. The Query Optimizer considers using it during the query optimization phase just as it does any other index.
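Creating one follows the same DDL pattern as any nonclustered index; in SQL Server 2012 only the nonclustered form exists. The table and column names below are illustrative assumptions:

```sql
-- A nonclustered columnstore index covering the columns
-- that DW queries typically touch.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_FactSales
ON dbo.FactSales (CustomerKey, OrderDateKey, ProductKey, Quantity);
```

Note that in SQL Server 2012 the table becomes read only once this index exists, which is why the partition-switching techniques described later matter for loading.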

A columnstore index is often compressed even further than any data compression type
can compress the row storage—including page and Unicode compression. When a query
references a single column that is a part of a columnstore index, then SQL Server fetches only that column from disk; it doesn’t fetch entire rows as with row storage. This also reduces disk IO and memory cache consumption. Columnstore indexes use their own compression algorithm; you cannot use row or page compression on a columnstore index.

On the other hand, SQL Server has to return rows. Therefore, rows must be reconstructed when you execute a query. This row reconstruction takes some time and uses some CPU and memory resources. Very selective queries that touch only a few rows might not benefit from columnstore indexes. Columnstore indexes accelerate data warehouse queries but are not suitable for OLTP workloads.

The columnstore index is divided into units called segments. Segments are stored as large objects, and consist of multiple pages. A segment is the unit of transfer from disk to memory.
Each segment has metadata that stores the minimum and maximum value of each column for that segment. This enables early segment elimination in the storage engine. SQL Server loads only those segments requested by a query into memory.

SQL Server 2012 includes another important improvement for query processing. In batch mode processing, SQL Server processes data in batches rather than processing one row at a time. In SQL Server 2012, a batch represents roughly 1000 rows of data. Each column within a batch is stored as a vector in a separate memory area, meaning that batch mode processing is vector-based. Batch mode processing interrupts a processor with metadata only once per batch rather than once per row, as in row mode processing, which lowers the CPU burden substantially.

Loading and Auditing Loads

Loading large fact tables can be a problem. You have only a limited time window in which to do the load, so you need to optimize the load operation. In addition, you might be required to track the loads.

Using Partitions

Loading even very large fact tables is not a problem if you can perform incremental loads. However, this means that data in the source should never be updated or deleted; data should be inserted only. This is rarely the case with LOB applications. In addition, even if you have the possibility of performing an incremental load, you should have a parameterized ETL procedure in place so you can reload portions of data loaded already in earlier loads. There is always a possibility that something might go wrong in the source system, which means that you will have to reload historical data. This reloading will require you to delete part of the data from your data warehouse.

Deleting large portions of fact tables might consume too much time, unless you perform a minimally logged deletion. A minimally logged deletion operation can be done by using the TRUNCATE TABLE command; however, this command deletes all the data from a table—and deleting all the data is usually not acceptable. More commonly, you need to delete only portions of the data.

Inserting huge amounts of data could consume too much time as well. You can do a minimally logged insert, but as you already know, minimally logged inserts have some limitations. Among other limitations, a table must either be empty, have no indexes, or use a clustered index only on an ever-increasing (or ever-decreasing) key, so that all inserts occur on one end of the index. However, you would probably like to have some indexes on your fact table—at least a columnstore index. With a columnstore index, the situation is even worse—the table becomes read only.

You can resolve all of these problems by partitioning a table. You can even achieve better
query performance by using a partitioned table, because you can create partitions in different filegroups on different drives, thus parallelizing reads. You can also perform maintenance procedures on a subset of filegroups, and thus on a subset of partitions only. That way, you can also speed up regular maintenance tasks. Altogether, partitions have many benefits.

Although you can partition a table on any attribute, partitioning over dates is most common in data warehousing scenarios. You can use any time interval for a partition. Depending on your needs, the interval could be a day, a month, a year, or any other interval. You can have as many as 15,000 partitions per table in SQL Server 2012.

In addition to partitioning tables, you can also partition indexes. Partitioned table and
index concepts include the following:

■■ Partition function : This is an object that maps rows to partitions by using values
from specific columns. The columns used for the function are called partitioning columns. A partition function performs logical mapping.

■■ Partition scheme : A partition scheme maps partitions to filegroups. A partition
scheme performs physical mapping.

■■ Aligned index : This is an index built on the same partition scheme as its base table.
If all indexes are aligned with their base table, switching a partition is a metadata operation only, so it is very fast. Columnstore indexes have to be aligned with their base
tables. Nonaligned indexes are, of course, indexes that are partitioned differently than
their base tables.

■■ Partition elimination : This is a Query Optimizer process in which SQL Server accesses only those partitions needed to satisfy query filters.

■■ Partition switching  : This is a process that switches a block of data from one table or
partition to another table or partition. You switch the data by using the ALTER TABLE T-SQL command. You can perform the following types of switches:
■ Reassign all data from a nonpartitioned table to an empty existing partition of a partitioned table.
■ Switch a partition of one partitioned table to a partition of another partitioned table.
■ Reassign all data from a partition of a partitioned table to an existing empty nonpartitioned table.
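The logical and physical mapping described above might be sketched like this. The boundary values, filegroup names, and table name are assumptions:

```sql
-- Logical mapping: rows to partitions, by an integer date key.
-- RANGE LEFT with two boundaries yields three partitions.
CREATE PARTITION FUNCTION PfDate (INT)
AS RANGE LEFT FOR VALUES (20111231, 20121231);

-- Physical mapping: partitions to filegroups.
CREATE PARTITION SCHEME PsDate
AS PARTITION PfDate TO (FG2011, FG2012, FG2013);

-- A fact table built on the scheme, partitioned by the date key.
CREATE TABLE dbo.FactSalesPartitioned
(
  CustomerKey  INT NOT NULL,
  OrderDateKey INT NOT NULL,
  ProductKey   INT NOT NULL,
  Quantity     INT NOT NULL
)
ON PsDate (OrderDateKey);
```

Any index created ON PsDate (OrderDateKey) as well is aligned with the table, which keeps partition switching a metadata-only operation.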

Any time you create a large partitioned table, you should create two auxiliary nonindexed empty tables with the same structure, including constraints and data compression options: one without a check constraint, and one with a check constraint that limits its data to a single partition.

For minimally logged deletions of large portions of data, you can switch a partition from
the fact table to the empty table version without the check constraint. Then you can truncate that table. The TRUNCATE TABLE statement is minimally logged. Your first auxiliary table is prepared to accept the next partition from your fact table for the next minimally logged deletion.

For minimally logged inserts, you can bulk insert new data to the second auxiliary table,
the one that has the check constraint. In this case, the INSERT operation can be minimally
logged because the table is empty. Then you create a columnstore index on this auxiliary
table, using the same structure as the columnstore index on your fact table. Now you can
switch data from this auxiliary table to a partition of your fact table. Finally, you drop the columnstore index on the auxiliary table, and change the check constraint to guarantee that all of the data for the next load can be switched to the next empty partition of your fact table. Your second auxiliary table is prepared for new bulk loads again.
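The delete-and-load cycle just described can be sketched as follows. The partition numbers and the auxiliary table names (dbo.FactSales_Old, dbo.FactSales_New) are assumptions:

```sql
-- Minimally logged delete: switch the oldest partition out,
-- then truncate the auxiliary table (TRUNCATE is minimally logged).
ALTER TABLE dbo.FactSalesPartitioned
  SWITCH PARTITION 1 TO dbo.FactSales_Old;
TRUNCATE TABLE dbo.FactSales_Old;

-- Minimally logged insert: dbo.FactSales_New was bulk loaded while
-- empty and carries the check constraint matching partition 3.
-- Index it like the fact table, switch it in, then drop the index.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_Stage
  ON dbo.FactSales_New (CustomerKey, OrderDateKey, ProductKey, Quantity);
ALTER TABLE dbo.FactSales_New
  SWITCH TO dbo.FactSalesPartitioned PARTITION 3;
DROP INDEX CSI_Stage ON dbo.FactSales_New;
```

After the switch, the auxiliary table is empty again and, once its check constraint is adjusted to the next partition's boundary, ready for the next load.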

ETL (Extract, Transform, and Load) – Part 4 – Introduction to Data Warehouse

Creating SSIS Packages

Data movement represents an important part of data management. Data is transported
from client applications to the data server to be stored, and transported back from
the database to the client to be managed and used. In data warehousing, data movement
represents a particularly important element, considering the typical requirements of a data warehouse: the need to import data from one or more operational data stores, the need to cleanse and consolidate the data, and the need to transform data so that it can be stored and maintained appropriately in the data warehouse.

Based on the level of complexity, data movement scenarios can be divided into two
groups:
■■ Simple data movements, where data is moved from the source to the destination “as-is” (unmodified)
■■ Complex data movements, where the data needs to be transformed before it can be
stored, and where additional programmatic logic is required to accommodate the
merging of the new and/or modified data, arriving from the source, with existing data already present at the destination.

What constitutes a complex data movement? Three distinct elements can be observed in
any complex data movement process:

1. The data is extracted from the source (retrieved from the operational data store).

2. The data is transformed (cleansed, converted, reorganized, and restructured) to comply with the destination data model.

3. The data is loaded into the destination data store (such as a data warehouse).

Planning a Simple Data Movement

To determine whether the Import and Export Wizard is the right tool for a particular data movement, ask yourself a few simple questions:
■■ Will the data need to be transformed before it can be stored at the destination?
If no transformations are required, then the Import and Export Wizard might be the
right tool for the job.

■■ Is it necessary to merge source data with existing data at the destination?
If no data exists at the destination (for example, because the destination itself does not
yet exist), then using the Import and Export Wizard should be the right choice.

Quick Questions :

1. What is the SQL Server Import and Export Wizard?

Answer :

The Import and Export Wizard is a utility that provides a simplified interface
for developing data movement operations where data is extracted from a
source and loaded into a destination, without the need for any transformations.

2. What is the principal difference between simple and complex data movements?

Answer :

In simple data movements, data is copied from one data store into another one
unmodified, whereas in complex data movements, data is modified (transformed)
before being loaded into the destination data store.

Learning SSIS: Introducing Control Flow, Data Flow, and Connection Managers

■■ Connection managers Provide connections to data stores, either as data sources or
data destinations. Because the same data store can play the role of the data source as
well as the data destination, connection managers allow the connection to be defined
once and used many times in the same package (or project).

■■ Control flow Defines both the order of operations and the conditions under which
they will be executed. A package can consist of one or more operations, represented
by control flow tasks. Execution order is defined by how individual tasks are connected
to one another. Tasks that do not follow any preceding task as well as tasks that follow
the same preceding task are executed in parallel.

■■ Data flow Encapsulates the data movement components—the ETL:

■ One or more source components, designating the data stores from which the data will be extracted.
■ One or more destination components, designating the data stores into which the
data will be loaded.
■ One or more (optional) transformation components, designating the transformations
through which the data will be passed.

Quick Questions : 

1. What is a control flow?

Answer : 

In SSIS packages, the control flow defines the tasks used in performing data
management operations; it determines the order in which these tasks are executed
and the conditions of their execution.
2. What is a data flow?

Answer : 

In SSIS packages, the data flow is a special control flow task used specifically in
data movement operations and data transformations.

Implementing Control Flow :

SSIS supports a variety of data stores, such as files, (relational) database management systems, SQL Server Analysis Services databases, web servers, FTP servers, mail servers, and web services.

ADO connection manager : The ADO connection manager enables connections to ActiveX Data Objects (ADO) and is provided mainly for backward compatibility.

ADO.NET connection manager : The ADO.NET connection manager enables connections to data stores using a Microsoft .NET provider. It is compatible with SQL Server.

Analysis Services connection manager: The Analysis Services connection manager provides access to SSAS databases.

Excel connection manager : As the name suggests, the Excel connection manager provides access to data in Microsoft Excel workbooks.

Flat File connection manager : These connection managers provide access to flat files—delimited or fixed-width text files (such as comma-separated values files).

Connection Manager Scope :

SQL Server Data Tools (SSDT) support two connection manager definition techniques,
providing two levels of availability:
■■ Package-scoped connection managers are only available in the context of the SSIS
package in which they were created and cannot be reused by other SSIS packages in
the same SSIS project.

■■ Project-scoped connection managers are available to all packages of the project in
which they were created .

Planning a Complex Data Movement :

Typically, the transformation could be any or all of the following:
■■ Data cleansing :  Unwanted or invalid pieces of data are discarded or replaced with
valid ones. Many diverse operations fit this description—anything from basic cleanup
(such as string trimming or replacing decimal commas with decimal points) to quite
elaborate parsing (such as extracting meaningful pieces of data by using Regular
Expressions).

■■ Data normalization : In this chapter, we would like to avoid what could grow into a
lengthy debate about what exactly constitutes a scalar value, so the simplest definition
of normalization would be the conversion of complex data types into primitive
data types (for example, extracting individual atomic values from an XML document or
atomic items from a delimited string).

■■ Data type conversion : The source might use a different type system than the
destination. Data type conversion provides type-level translation of individual values
from the source data type to the destination data type (for example, translating a .NET
Byte[] array into a SQL Server VARBINARY(MAX) value).

■■ Data translation :  The source might use different domains than the destination.
Translation provides a domain-level replacement of individual values of the source domain with an equivalent value from the destination domain (for example, the character “F” designating a person’s gender at the source is replaced with the string “female” representing the same at the destination).

■■ Data validation : This is the verification and/or application of business rules against
individual values (for example, “a person cannot weigh more than a ton”), tuples (for
example, “exactly two different persons constitute a married couple”), and/or sets (for
example, “exactly one person can be President of the United States at any given time”).
■■ Data calculation and data aggregation : In data warehousing, specifically, a common requirement is to not only load individual values representing different facts or
measures, but also to load values that have been calculated (or pre-aggregated) from
the original values (for example, “net price” and “tax” exist at the source, but “price
including tax” is expected at the destination).
■■ Data pivoting and data unpivoting:  Source data might need to be restructured or
reorganized in order to comply with the destination data model (for example, data in
the entity-attribute-value (EAV) model might need to be restructured into columns, or vice versa).

Tasks

Data Preparation Tasks : These tasks, shown in Table 4-3, are used to prepare data sources for further processing; the preparation can be as simple as copying the source to the server, or as complex as profiling the data, determining its informational value, or even discovering what it actually is.

File System task:  This task provides operations on file system objects (files and folders), such as copying, moving, renaming, deleting objects, creating folders, and setting object attributes.

FTP task : This task provides operations on file system objects on a remote file store via the File Transfer Protocol (FTP), such as receiving, sending, and deleting files, as well
as creating and removing directories.

Web Service task :  This task provides access to web services; it invokes web service methods, receives the results, and stores them in an SSIS variable or writes them to a file
connection.

XML task : This task provides XML manipulation against XML files and XML data, such as validation (against a Document Type Definition or an XML Schema), transformations (using XSLT), and data retrieval (using XPath expressions). It also supports more advanced methods, such as merging two XML documents.

Data Profiling task : This task can be used in determining data quality and in data cleansing. It can be useful in the discovery of properties of an unfamiliar data set.

Workflow Tasks :

These tasks, shown in Table , facilitate workflow, which is the structure of the process in
terms of its relationships with the environment and related processes; these tasks automate the interaction between individual SSIS processes and/or the interaction between SSIS processes and external processes (processes that exist outside SQL Server).

Execute Package task : This task executes other SSIS packages, thus allowing the distribution of programmatic logic across multiple SSIS packages, which in turn increases the reusability of individual SSIS packages and enables a more efficient division of labor within the SSIS development team.

Execute Process task :  This task executes external processes (that is, processes external to SQL Server). The Execute Process task can be used to start any kind of Windows application.

Message Queue task : This task is used to send and receive messages to and from Microsoft Message Queuing (MSMQ) queues on the local server.

Send Mail task : This task allows the sending of email messages from SSIS packages by using the Simple Mail Transfer Protocol (SMTP).

Expression task : This task is used in the workflow to process variables and/or parameters and to assign the results to other variables used by the SSIS process.

CDC Control task :  This task controls the life cycle of SSIS packages that rely on the SQL Server 2012 Change Data Capture (CDC) functionality. It provides CDC information
from the data source to be used in CDC-dependent data flows.

Data Movement Tasks

Bulk Insert task : This task allows the loading of data from formatted text files into a SQL Server database table (or view); the data is loaded unmodified (because transformations are not supported), which means that the loading process is fast and efficient.
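The operation this task wraps corresponds to the T-SQL BULK INSERT statement; a sketch, with the table name, file path, and file format as assumptions:

```sql
-- Load a comma-delimited text file into a staging table, unmodified.
BULK INSERT dbo.StageSales
FROM 'C:\Loads\sales_20121231.txt'
WITH (
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n',
  TABLOCK   -- table lock helps the load qualify for minimal logging
);
```

Because no transformations run, the file layout must already match the staging table's columns.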

Execute SQL task: This task executes SQL statements or stored procedures against a supported data store. The task supports the following data providers: EXCEL, OLE DB, ODBC, ADO, ADO.NET, and SQLMOBILE, so keep this in mind when planning connection managers.
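As a rough analogy, the pattern the Execute SQL task implements (run a parameterized statement against a connection and capture a single-row result into a variable) can be sketched with Python's built-in sqlite3 module standing in for the providers listed above; the table and values are invented for illustration:

```python
import sqlite3

# Conceptual sketch only: sqlite3 stands in for the OLE DB/ODBC/ADO.NET
# providers the Execute SQL task actually uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")

# Parameterized statement; the Execute SQL task maps SSIS variables to
# statement parameters in a similar, provider-specific way.
conn.execute("INSERT INTO staging (name) VALUES (?)", ("row one",))
conn.commit()

# Single-row result set captured into a variable, as the task can do.
row_count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(row_count)  # 1
```

Note the parameter support here; as discussed later in this lesson, this is one of the capabilities that distinguishes the Execute SQL task from the much simpler Execute T-SQL Statement task.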

Data flow task: This task is essential to data movements, especially complex data movements, because it provides all the elements of ETL (extract-transform-load); the architecture of the data flow task allows all of the transformations to be performed in flight and in memory, without the need for temporary storage.
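The in-flight, in-memory character of a data flow can be sketched with Python generators, which pass rows through the pipeline one at a time without any intermediate storage. The component names in the comments are only conceptual stand-ins, not SSIS objects:

```python
# Conceptual sketch only: a data flow performs extract-transform-load with
# transformations applied in flight and in memory, row by row, without
# temporary storage. Python generators model that streaming behavior.

def extract():
    # Source rows (stand-in for a source component reading a file or table).
    yield {"name": " alice ", "amount": "10"}
    yield {"name": "bob",     "amount": "32"}

def transform(rows):
    # Derived-column-style transformation applied to each row in flight.
    for row in rows:
        yield {"name": row["name"].strip().title(),
               "amount": int(row["amount"])}

def load(rows):
    # Destination (stand-in for a destination component writing a table).
    return list(rows)

loaded = load(transform(extract()))
print(loaded)
```

Because each stage yields rows to the next as they are produced, no stage ever needs the complete data set at once; this is the same design choice that makes the data flow task memory-efficient for large data movements.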

SQL Server Administration Tasks

Transfer Database task: Use this task to copy or move a database from one SQL Server instance to another, or to create a copy of it on the same server. It supports two modes of operation:
■■ In online mode, the database is transferred by using SQL Server Management Objects (SMO), allowing it to remain online for the duration of the transfer.
■■ In offline mode, the database is detached from the source instance, copied to the destination file store, and attached at the destination instance; this takes less time than the online mode, but the database remains inaccessible for the entire duration of the transfer.

Transfer Error Messages task: Use this task to transfer user-defined error messages from one SQL Server instance to another; you can transfer all user-defined messages or specify individual ones.

Transfer Jobs task: Use this task to transfer SQL Server Agent jobs from one SQL Server instance to another; you can transfer all jobs or specify individual ones.

Transfer Logins task: Use this task to transfer SQL Server logins from one SQL Server instance to another; you can transfer all logins, logins mapped to users of one or more specified databases, or individual logins.

SQL Server Maintenance Tasks

Back Up Database task: Use this task in your maintenance plan to automate full, differential, or transaction log backups of one or more system and/or user databases. Filegroup-level and file-level backups are also supported.

Check Database Integrity task: Use this task in your maintenance plan to automate data and index page integrity checks in one or more system and/or user databases.

Execute SQL Server Agent Job task: Use this task in your maintenance plan to automate the invocation of SQL Server Agent jobs as part of the maintenance plan.

Execute T-SQL Statement task: Use this task in your maintenance plan to execute Transact-SQL scripts as part of the maintenance plan.

You should not confuse the very basic Execute T-SQL Statement task with the more advanced Execute SQL task described earlier in this lesson. The Execute T-SQL Statement task provides only a very basic interface, allowing you to select the connection manager and specify the statement to execute; parameters, for instance, are not supported in this task.

History Cleanup task: Use this task in your maintenance plan to automate the purging of historical data about backup and restore operations, as well as SQL Server Agent and maintenance plan operations on your SQL Server instance.

Maintenance Cleanup task: Use this task in your maintenance plan to automate the removal of files left over by maintenance plan executions; you can configure the task to remove old backup files or maintenance plan text reports.

Notify Operator task: Use this task in your maintenance plan to send email messages to SQL Server Agent operators.

Rebuild Index task: Use this task in your maintenance plan to automate index rebuilds for one or more databases and one or more objects (tables or indexed views).

Reorganize Index task: Use this task in your maintenance plan to automate index reorganizations for one or more databases and one or more objects (tables or indexed views).

Shrink Database task: Use this task in your maintenance plan to automate database shrink operations.

Update Statistics task: Use this task in your maintenance plan to automate updates of statistics for one or more databases and one or more objects (tables or indexed views).

The Script Task

This special task exposes the SSIS programming model via its .NET Framework implementation to provide extensibility to SSIS solutions. The Script task allows you to integrate custom data management operations with SSIS packages. Customizations can be provided by using any of the programming languages supported by the Microsoft Visual Studio Tools for Applications (VSTA) environment (such as Microsoft Visual C# 2010 or Microsoft Visual Basic 2010).
Typically, the Script task would be used to provide functionality that is not provided by
any of the standard built-in tasks, to integrate external solutions with the SSIS solution, or to provide access to external solutions and services through their application programming interfaces (APIs).

Containers

When real-world concepts are implemented in SSIS, the resulting operations can be composed of one or more tasks. To allow tasks that logically form a single unit to also behave as a single unit, SSIS introduces containers. Containers provide:
■■ Structure: tasks that represent the same logical unit can be grouped in a single container, both for improved readability and for manageability.
■■ Encapsulation: tasks enclosed in a loop container are executed repeatedly as a single unit.
■■ Scope: container-scoped resources can be accessed by the tasks placed in the same container, but not by tasks placed outside it.

For Loop container: This container executes the encapsulated tasks repeatedly, based on an expression; the looping continues while the result of the expression is true. It is based on the same concept as the For loop in most programming languages.

Foreach Loop container: This container executes the encapsulated tasks repeatedly, once per item of the selected enumerator; it is based on the same iterative concept as the For-Each loop in most contemporary programming languages. It is suited for executing a set of operations repeatedly based on an enumerable collection of items (such as files in a folder, a set of rows in a table, or an array of items).
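The looping semantics of both containers map directly onto familiar programming constructs. A minimal Python sketch follows; the counter expressions and file names are invented for illustration, and the SSIS expression syntax shown in the comments is only indicative:

```python
# Conceptual sketch only: the For Loop container keeps executing its tasks
# while an expression remains true; the Foreach Loop container executes its
# tasks once per item of an enumerator (here, a plain list stands in for
# the file, item, or row-set enumerators SSIS provides).

executed = []

# For Loop: initializer, condition expression, iteration expression.
counter = 0
while counter < 3:           # condition, e.g. @counter < 3
    executed.append(f"for-iteration {counter}")
    counter += 1             # iteration, e.g. @counter = @counter + 1

# Foreach Loop: one execution of the encapsulated tasks per enumerated item.
for item in ["a.csv", "b.csv"]:
    executed.append(f"processed {item}")

print(len(executed))  # 5
```

In both cases the body of the loop is executed as a single unit per iteration, which is exactly the encapsulation behavior the containers provide.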

Sequence container: This container has no programmatic logic other than providing structure; use it to encapsulate tasks that form a logical unit, to provide a scope for SSIS variables that should be accessible exclusively to a specific set of tasks, or to provide a transaction scope to a set of tasks.

There are three precedence constraint types, all of them equivalent in defining sequences but different in defining the conditions of execution:
■■ A success constraint allows the following operation to begin executing when the preceding operation has completed successfully (without errors).
■■ A failure constraint allows the following operation to begin executing only if the preceding operation has completed unsuccessfully (with errors).
■■ A completion constraint allows the following operation to begin executing when the preceding operation has completed, regardless of whether the execution was successful or not.
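The gating logic of the three constraint types can be modeled in a few lines of Python. This is only a conceptual sketch of the rules above, not how SSIS evaluates precedence constraints internally; the function names are invented for illustration:

```python
# Conceptual sketch only: a precedence constraint gates the following task
# on the outcome of the preceding one.

def run(task):
    # Execute a task and report its outcome.
    try:
        task()
        return "success"
    except Exception:
        return "failure"

def should_run(constraint, preceding_outcome):
    # A completion constraint ignores the outcome; success and failure
    # constraints require a matching outcome.
    return constraint == "completion" or constraint == preceding_outcome

outcome = run(lambda: None)              # preceding task succeeds
print(should_run("success", outcome))    # True
print(should_run("failure", outcome))    # False
print(should_run("completion", outcome)) # True
```

Swapping the preceding task for one that raises an exception would flip the first two results, which is how a failure constraint is typically used to route a workflow into error handling.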