Author: ahmed7

Normalization - Part 1

First Normal Form (1NF):

Each value must be atomic: there should be no repeating data in a single column, and a primary key should be assigned to the table.

For example, taking an example from Ramez Elmasri's book, consider a company whose departments have multiple locations in the USA.

[Figure: DEPARTMENT table with a multivalued DLocations column]

DNumber is the primary key. We cannot have multiple values in the DLocations column, such as {Bellaire, Sugarland, Houston}; the values should be atomic. To normalize the table to 1NF, we can apply the following technique:

  1. As we see, DNumber and DLocations together uniquely identify each row, so we give each location its own row and make {DNumber, DLocations} the combined primary key (see the DDL sketch below).

[Figure: the table in 1NF with composite key {DNumber, DLocations}]
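A minimal T-SQL sketch of the 1NF result; the table and column names follow the example above, the data types are assumed:

CREATE TABLE DEPT_LOCATIONS
(
  DNumber    INT         NOT NULL,
  DLocations VARCHAR(50) NOT NULL,  -- one atomic location per row
  CONSTRAINT PK_DEPT_LOCATIONS PRIMARY KEY (DNumber, DLocations)
);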

Second Normal Form (2NF):

For a table to be in 2NF, it must be in 1NF and it must not have any partial dependency: no non-key column may depend on only part of the primary key. In other words, every non-key column must be fully dependent on the whole primary key.

Consider the following table EMP_PROJ:

[Figure: EMP_PROJ table]

To bring it into 1NF, make {SSN, PNumber} the primary key.

But what about the functional dependencies? Recall that there should be no partial dependency on part of the primary key. Here EName depends only on SSN!

To further elaborate , consider the following table :

[Figure: EMP_PROJ with functional dependencies FD1, FD2, and FD3]

Consider {SSN, PNumber} as the primary key. In functional dependency FD1, Hours depends on both SSN and PNumber, which is fine according to the 2NF rule because it is fully dependent on the whole key. But look at FD2 and FD3: in FD2, EName depends only on SSN, which violates the 2NF rule of full dependency on the whole key, and in FD3, PName and PLocation depend only on PNumber, which also violates the 2NF rule.

To achieve 2NF, we break the table up as follows:

[Figure: the decomposed tables EP1, EP2, and EP3]

The first table above, EP1, consists of the FD1 column Hours, which was fully dependent on the whole primary key {SSN, PNumber}.

The EP2 table was made for FD2: EName, which depends only on SSN.

The EP3 table was made for FD3: PName and PLocation, which depend only on PNumber.

In this way we achieve 2NF. The EMP_PROJ data is now in 2NF (a DDL sketch of the decomposition follows).
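A minimal T-SQL sketch of the decomposition; the table and column names come from the example above, the data types are assumed:

-- EP1: Hours depends on the whole key {SSN, PNumber}
CREATE TABLE EP1
(
  SSN     CHAR(9)      NOT NULL,
  PNumber INT          NOT NULL,
  Hours   DECIMAL(4,1) NULL,
  CONSTRAINT PK_EP1 PRIMARY KEY (SSN, PNumber)
);

-- EP2: EName depends only on SSN
CREATE TABLE EP2
(
  SSN   CHAR(9)     NOT NULL PRIMARY KEY,
  EName VARCHAR(50) NOT NULL
);

-- EP3: PName and PLocation depend only on PNumber
CREATE TABLE EP3
(
  PNumber   INT         NOT NULL PRIMARY KEY,
  PName     VARCHAR(50) NOT NULL,
  PLocation VARCHAR(50) NULL
);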

 

Third Normal Form (3NF):

For a table to be in 3NF, it must be in 2NF and there should be no transitive dependency, which means that no non-key column may depend on the primary key transitively through another non-key column.

Transitive dependency: X -> Y and Y -> Z, so Z depends on X through Y.

Consider the following table:

[Figure: table with a transitive dependency (SSN -> DNumber -> DName, DMGRSSN)]

To explain, see the DNumber column, which depends on SSN. Consider SSN as X, DNumber as Y, and DName as Z. In other words, DName and DMGRSSN depend on DNumber, which is a non-key column.

To achieve 3NF, we can do the following:

[Figure: tables after removing the transitive dependency]

We separated DName and DMGRSSN into a separate table whose primary key is DNumber (a DDL sketch follows).
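A minimal sketch of the 3NF decomposition, again with assumed data types and only the columns mentioned above:

-- Employee table: DNumber stays as a reference to the department
CREATE TABLE ED1
(
  SSN     CHAR(9)     NOT NULL PRIMARY KEY,
  EName   VARCHAR(50) NOT NULL,
  DNumber INT         NOT NULL
);

-- Department table: DName and DMGRSSN now depend only on the key DNumber
CREATE TABLE ED2
(
  DNumber INT         NOT NULL PRIMARY KEY,
  DName   VARCHAR(50) NOT NULL,
  DMGRSSN CHAR(9)     NULL
);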

 

Thanks

 

What is SQL

SQL stands for : Structured Query Language.

SQL was designed to manage and maintain data in Relational Database Management Systems (RDBMS), which revolve around the relational model and predicate logic.

It has three types of statements:

DDL (Data Definition Language)

Statements include: CREATE, ALTER, DROP, TRUNCATE

DML (Data Manipulation Language)

Statements include: SELECT, INSERT, UPDATE, DELETE

DCL (Data Control Language)

Statements include: GRANT, REVOKE
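A few representative statements of each type, sketched with a hypothetical dbo.Student table and a hypothetical user SomeUser:

-- DDL: define objects
CREATE TABLE dbo.Student (StudentId INT PRIMARY KEY, FirstName VARCHAR(50));
ALTER TABLE dbo.Student ADD LastName VARCHAR(50);

-- DML: work with the data
INSERT INTO dbo.Student (StudentId, FirstName, LastName) VALUES (1, 'John', 'Doe');
SELECT StudentId, FirstName, LastName FROM dbo.Student;
UPDATE dbo.Student SET LastName = 'Smith' WHERE StudentId = 1;
DELETE FROM dbo.Student WHERE StudentId = 1;

-- DCL: control access
GRANT SELECT ON dbo.Student TO SomeUser;
REVOKE SELECT ON dbo.Student FROM SomeUser;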

 

What is a Database, Schema and Object

Consider a database as a main box that contains smaller boxes (schemas), and those schemas contain different objects such as tables, views, stored procedures, and functions.

  • Since objects are contained in a schema, the biggest advantage of the schema level is security: you can grant the different users of a database different sets of privileges/rights (see the sketch below).
  • For example:

– Allowing User A to use only SELECT statements.

– Allowing User B to see encrypted stored procedures, along with all other privileges.

– Not giving User C rights to delete, update, or execute any stored procedure.
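A minimal sketch of how such schema-level permissions could be granted, assuming a Sales schema and hypothetical users UserA, UserB, and UserC:

GRANT SELECT ON SCHEMA::Sales TO UserA;                 -- User A: read-only access
GRANT SELECT, EXECUTE ON SCHEMA::Sales TO UserB;        -- User B: broader rights
DENY DELETE, UPDATE, EXECUTE ON SCHEMA::Sales TO UserC; -- User C: no delete/update/execute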

The picture below illustrates the design of database, schema, and object. A schema can contain multiple objects.

[Figure: database, schema, and object hierarchy]

How SQL Processes Select Statements

In this post, we shall look at how SQL Server processes a SELECT statement, that is, how SQL sees, perceives, and processes the statement and produces the final result.

For Example :

=================================================

SELECT studentid, Firstname, Lastname,count(*) as Total

FROM dbo.Student

WHERE Studentid = 10

GROUP BY studentid, Firstname, Lastname

HAVING COUNT(*) > 1

ORDER BY studentid

 

===============================================

Now note the order in which the clauses of a SELECT statement are actually processed in SQL:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

Explanation:

1. FROM gets the rows from the table.
2. WHERE keeps only those rows where StudentId is 10.
3. GROUP BY groups the remaining rows by studentid, Firstname, Lastname.
4. HAVING keeps only the groups that have more than one row.
5. SELECT returns the requested columns for the surviving groups.
6. ORDER BY sorts the rows in the output by studentid.

 

The ‘CASE’ Expression

The CASE expression is similar to a switch statement in C/C#: it returns the result of the branch whose condition matches.

Take the following example: from the table StudentDetail, you want to display a teacher name derived from the teacherid column:

SELECT studentid, studentname, teacherid, teachername FROM StudentDetail

Using the case expression :

SELECT studentid, studentname, teacherid,
CASE teacherid
WHEN 1 THEN 'Teacher1'
WHEN 2 THEN 'Teacher2'
WHEN 3 THEN 'Teacher3'
WHEN 4 THEN 'Teacher4'
WHEN 5 THEN 'Teacher5'
WHEN 6 THEN 'Teacher6'
WHEN 7 THEN 'Teacher7'
WHEN 8 THEN 'Teacher8'
ELSE 'Unknown'
END AS teachername
FROM StudentDetail
ORDER BY studentid;
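CASE also has a "searched" form, in which each WHEN tests an arbitrary predicate instead of comparing one column to a value. A small sketch against the same StudentDetail table (the grouping logic here is made up for illustration):

SELECT studentid, studentname,
CASE
WHEN teacherid IS NULL THEN 'Not assigned'
WHEN teacherid <= 4 THEN 'Group A'
ELSE 'Group B'
END AS teachergroup
FROM StudentDetail;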

Table Expression

A table expression is a named query that acts like a relational table and can be used like other tables.

Microsoft SQL Server has four types of table expressions, and we will go through them one by one:

1 – Derived tables

2 – CTE (common table expressions)

3 – Views

4 – Inline table valued Function (TVF)

1 – Derived Tables :

Derived tables are queries that we write in the FROM clause of an outer query, and their existence depends on the outer query: once the outer query finishes, the derived table is gone.

As we are using the TSQL2012 sample database, we can execute the following:

select * from (select custid, companyname from Sales.Customers) as customer

select * from (select custid, companyname from Sales.Customers where city like '%Berlin%') as customer

select * from (select orderyear, count(distinct custid) as numcusts
from (select YEAR(orderdate) as orderyear, custid from Sales.Orders) as outer1
group by orderyear) as outer2
where outer2.numcusts > 70

Note: a query must meet a few requirements to be used as a table expression:

1. No guaranteed order: the table expression is supposed to represent a relational table, so the rows it returns have no guaranteed order.

2. All column names in the table expression must be unique. For instance, if we join two tables in the table expression and both tables have a column with the same name, we have a problem; the easiest fix is to give column aliases to the columns that share a name.

3. All columns must have names (assign column aliases). A big benefit of table expressions is that the outer query can refer to the column aliases defined in the inner query's SELECT clause. Recall that a column alias defined in the SELECT clause cannot be used in the WHERE or GROUP BY clause of the same query, because SELECT is logically processed after WHERE and GROUP BY.

For example, the following statement gives an error, because the alias orderyear defined in the SELECT clause is referenced in the GROUP BY clause of the same query:

SELECT YEAR(orderdate) AS orderyear, COUNT(DISTINCT custid) AS numcusts
FROM Sales.Orders
GROUP BY orderyear;

Msg 207, Level 16, State 1, Line 5
Invalid column name 'orderyear'.

We could solve this by repeating the expression YEAR(orderdate) in the GROUP BY (or WHERE) clause, but there may be many such expressions or nested queries. To keep the code readable, we can simply give the inner query a name and use the alias wherever needed.

SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
FROM (SELECT YEAR(orderdate) AS orderyear, custid
FROM Sales.Orders) AS D
GROUP BY orderyear;

Here we defined a derived table named D based on the query SELECT YEAR(orderdate) AS orderyear, custid, which returns the order year and customer ID of all rows. The outer query can refer to the orderyear alias of the inner query both in its GROUP BY clause and in its SELECT clause.

2 – Common Table Expressions (CTE):

CTEs are another type of table expression. They are defined with a WITH statement and have an inner query defining the CTE and an outer query that queries it.

For instance :

WITH USAcustomers as
(
select custid, companyname FROM sales.customers where country = 'USA'
)
select * from USAcustomers;

 

We can also refer to the column aliases defined in the inner query from the outer query.

WITH C AS
(
select year(orderdate) as orderyear, custid 
from sales.orders
)

select orderyear , count(distinct custid) as numcasts
from C 
Group by orderyear;

Use a semicolon to terminate the statement at the end of the outer query.

Multiple CTEs

CTEs have several advantages over derived tables. If you need to refer to one CTE from another, we don't end up nesting them the way we do with derived tables; instead, when we define multiple CTEs we separate them with commas under the same WITH statement.

with c1 as 
(
select year(orderdate) as orderyear, custid
from sales.orders
),
c2 as 
(
select orderyear, count(distinct custid) as numcasts
from c1 group by orderyear
)
select orderyear, numcasts from c2 where numcasts > 70

This approach improves the readability and maintainability of the code compared to nested derived tables.

 

Recursive CTEs:        

A recursive CTE has recursion capabilities. It is defined by two queries: one is called the anchor member and the other is called the recursive member.

WITH <CTE_Name>[(<target_column_list>)]
AS
(
<anchor_member>
UNION ALL
<recursive_member>
)
<outer_query_against_CTE>;

The anchor member returns a valid result table and is invoked only once .

The recursive member is a query which has reference to the CTE name.

The reference to the CTE name represents the result set returned by the previous execution. The first time the recursive member is invoked, the previous result set is whatever the anchor member returned.

In subsequent invocations, the result set referenced by the CTE name is the result of the previous invocation of the recursive member. The recursive member is invoked repeatedly until it returns an empty set.

Both queries must return the same number of columns with compatible types.

The reference to the CTE name in the outer query represents the unified result sets of the single invocation of the anchor member and all invocations of the recursive member.

with EmployeeCTE as
(
select empid, mgrid, firstname, lastname
from HR.Employees
where empid = 2

union all

select c.empid, c.mgrid, c.firstname, c.lastname
from EmployeeCTE as p
inner join HR.Employees as c
on c.mgrid = p.empid
)
select empid, mgrid, firstname, lastname
from EmployeeCTE;

The anchor member queries the HR.Employees table and simply returns the row for employee 2.
SELECT empid, mgrid, firstname, lastname
FROM HR.Employees
WHERE empid = 2

The recursive member joins the CTE—representing the previous result set—with the Employees table to return the direct subordinates of the employees returned in the previous result set.

SELECT C.empid, C.mgrid, C.firstname, C.lastname
FROM EmployeeCTE AS P
JOIN HR.Employees AS C
ON C.mgrid = P.empid

In other words, the recursive member is invoked repeatedly, and in each invocation it returns the next level of subordinates. The first time the recursive member is invoked, it returns the direct subordinates of employee 2—employees 3 and 5. The second time the recursive member is invoked, it returns the direct subordinates of employees 3 and 5—employees 4, 6, 7, 8, and 9. The third time the recursive member is invoked, there are no more subordinates; the recursive member returns an empty set, and therefore recursion stops.

MERGE Statement

We can modify data (INSERT, UPDATE, and DELETE) using the MERGE statement, based on conditional logic. The format is a little different, as follows:

We use the target table name in the MERGE clause.

And the source table in the USING clause.

We define the merge condition by specifying a predicate in the ON clause, just as we do in a join.

The merge condition defines which rows in the source table match rows in the target table.

In the WHEN MATCHED THEN clause we define the action to take when rows are matched.

And in the WHEN NOT MATCHED THEN clause we define the action to take when rows are not matched.

 

For example :

MERGE INTO dbo.targettable AS TGT
USING dbo.sourcetable AS SRC
ON TGT.personid = SRC.personid
WHEN MATCHED THEN
UPDATE SET
TGT.personname = SRC.personname,
TGT.phone = SRC.phone,
TGT.address = SRC.address
WHEN NOT MATCHED THEN
INSERT (personid, personname, phone, address)
VALUES (SRC.personid, SRC.personname, SRC.phone, SRC.address);

 

The MERGE statement defines an UPDATE action when a match is found, setting the target personname, phone, and address values to those of the matched rows from the source table.

The MERGE statement defines an INSERT action when a match is not found: it inserts the unmatched rows from the source into the target table.

 

Interview Questions

Q 1 : What are the types of system databases?

The system databases are:

master: The master database holds instance-wide metadata, server configuration, information about all databases in the instance, and initialization information. Its system views expose information about the system, hardware, indexes, columns, memory, and so on.

tempdb: The tempdb database is where SQL Server stores temporary data such as work tables, sort space, row versioning information, and so on. SQL Server allows you to create temporary tables for your own use, and the physical location of those temporary tables is tempdb.

Resource: The Resource database is a hidden, read-only database that holds the definitions of all system objects. When you query system objects in a database, they appear to reside in the sys schema of the local database, but in actuality their definitions reside in the Resource database. It contains information that was previously in the master database and was split out from master to make service pack upgrades easier to install.

model: The model database is used as a template for new databases. Every new database that you create is initially created as a copy of model. So if you want certain objects (such as data types) to appear in all new databases that you create, or certain database properties to be configured in a certain way in all new databases, you need to create those objects and configure those properties in the model database. Note that changes you apply to the model database do not affect existing databases.

msdb: The msdb database is where a service called SQL Server Agent stores its data. SQL Server Agent is in charge of automation, which includes entities such as jobs, schedules, and alerts. It is also the service in charge of replication. The msdb database also holds information related to other SQL Server features such as Database Mail, Service Broker, backups, and more.

Q 2 : What is a Filegroup ?

A database is made up of data files and transaction log files. The data files hold object data, and the log files hold information that SQL Server needs to maintain transactions. Data files are organized in logical groups called filegroups.

A filegroup is the target for creating an object, such as a table or an index. The object data will be spread across the files that belong to the target filegroup. Filegroups are your way of controlling the physical locations of your objects.

A database must have at least one filegroup called PRIMARY, and can optionally have other user filegroups as well. The PRIMARY filegroup contains the primary data file (which has an .mdf extension) for the database, and the database’s system catalog. You can optionally add secondary data files (which have an .ndf extension) to PRIMARY. User filegroups contain only secondary data files. You can decide which filegroup is marked as the default filegroup.

This technique can support two main strategies:

■ Using multiple filegroups can increase performance by separating heavily used tables or indexes onto different disk subsystems.

■ Using multiple filegroups can organize the backup and recovery plan by keeping static data in one filegroup and more active data in another. We can also spread a filegroup's secondary data files (.ndf) across multiple drives to increase performance (see the sketch below).
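A minimal sketch of adding a filegroup and placing a table on it, assuming a hypothetical SalesDB database and file path:

-- add a new filegroup and a secondary data file to it
ALTER DATABASE SalesDB ADD FILEGROUP FG_Archive;
ALTER DATABASE SalesDB
ADD FILE (NAME = N'SalesDB_Archive1', FILENAME = N'D:\Data\SalesDB_Archive1.ndf')
TO FILEGROUP FG_Archive;

-- create a table on the new filegroup
CREATE TABLE dbo.OrdersArchive
(
  orderid   INT  NOT NULL PRIMARY KEY,
  orderdate DATE NOT NULL
) ON FG_Archive;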

Q 3 : What is a Primary key?

A primary key constraint enforces uniqueness of rows and also disallows NULL marks in the constraint attributes.
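A minimal sketch, assuming a hypothetical dbo.Employees table (it is reused in the next few questions):

CREATE TABLE dbo.Employees
(
  empid     INT         NOT NULL,
  firstname VARCHAR(30) NOT NULL,
  lastname  VARCHAR(30) NOT NULL,
  ssn       CHAR(9)     NULL,
  mgrid     INT         NULL,
  salary    MONEY       NOT NULL,
  CONSTRAINT PK_Employees PRIMARY KEY (empid)   -- enforces uniqueness and disallows NULLs
);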

Q 4 : What is a unique constraint?

A unique constraint enforces the uniqueness of rows, allowing you to implement the concept of alternate keys from the relational model in your database. Unlike with primary keys, you can define multiple unique constraints within the same table. Also, a unique constraint is not restricted to columns defined as NOT NULL.
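Continuing the hypothetical dbo.Employees sketch, an alternate key on the ssn column could look like this:

ALTER TABLE dbo.Employees
ADD CONSTRAINT UNQ_Employees_ssn UNIQUE (ssn);   -- ssn is nullable, yet still an alternate key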

Q 5 : What is a foreign key constraint?

A foreign key enforces referential integrity. This constraint is defined on one or more attributes in what's called the referencing table and points to candidate key (primary key or unique constraint) attributes in what's called the referenced table. Note that NULL marks are allowed in the foreign key columns (for example, a mgrid column that references empid) even if there are no NULL marks in the referenced candidate key columns.
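In the same hypothetical dbo.Employees table, a self-referencing foreign key from mgrid to empid might be defined like this:

ALTER TABLE dbo.Employees
ADD CONSTRAINT FK_Employees_Employees
FOREIGN KEY (mgrid) REFERENCES dbo.Employees (empid);  -- mgrid may stay NULL (top manager)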

Q 6 : What is a check constraint?

A check constraint allows you to define a predicate that a row must meet to be entered into the table or to be modified. For example, the following check constraint ensures that the salary column in the Employees table will support only positive values.
ALTER TABLE dbo.Employees
ADD CONSTRAINT CHK_Employees_salary
CHECK(salary > 0.00);

Q 7 : What is the difference between where and having clause  ?

The WHERE clause is evaluated before rows are grouped, and therefore is evaluated
per row. The HAVING clause is evaluated after rows are grouped, and therefore
is evaluated per group.
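A short illustration against the TSQL2012 Sales.Orders table used earlier (the date filter is arbitrary):

SELECT custid, COUNT(*) AS numorders
FROM Sales.Orders
WHERE orderdate >= '20080101'   -- row filter, applied before grouping
GROUP BY custid
HAVING COUNT(*) > 10;           -- group filter, applied after grouping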

Q 8 : Why is it not allowed to refer to a column alias defined by the SELECT clause in the WHERE clause?

Because the WHERE clause is logically evaluated before the SELECT clause in query processing.

Q 9 : What are the performance benefits of the WHERE clause?

It reduces network traffic and, if properly used with indexes, can avoid full table scans.

Q 10 : What is the difference between self-contained and correlated subqueries?

Self-contained sub queries are independent of the outer query, whereas correlated
sub queries have a reference to an element from the table in the outer query.

Q 11 : What is the difference between the APPLY and JOIN operators?

With a JOIN operator, both inputs represent static relations. With APPLY, the
left side is a static relation, but the right side can be a table expression with
correlations to elements from the left table.

Q 12 : What are two requirements for the queries involved in a set operator ?

The number of columns in the two queries needs to be the same  and the corresponding
columns need to have compatible types.

Q 13 : What makes a query a grouped query?

When you use an aggregate function, a GROUP BY clause, or both.

Q 14 : What are the clauses that you can use to define multiple grouping sets in the
same query?

GROUPING SETS, CUBE, and ROLLUP.

Q 15  : What is the difference between PIVOT and UNPIVOT?

PIVOT rotates data from a state of rows to a state of columns; UNPIVOT rotates
the data from columns to rows.

Q 16 : Can you store indexes from the same full-text catalog to different filegroups?

Yes. A full-text catalog is a virtual object only; full-text indexes are physical objects.
You can store each full-text index from the same catalog to a different file group.

Q 17 : How do you search for synonyms of a word with the CONTAINS predicate?

You have to use the CONTAINS(FTcolumn, 'FORMSOF(THESAURUS, SearchWord1)') syntax.

Q 18 : Can a table or column name contain spaces, apostrophes, and other nonstandard characters?

Yes, as long as the identifier is delimited with square brackets or double quotes.

Q 19 : What types of table compression are available?

Page and row level compression.

Q 20 : How does SQL Server enforce uniqueness in both primary key and unique constraints?

SQL Server uses unique indexes to enforce uniqueness for both primary key
and unique constraints.

Q 21 : What type of data does an inline function return?

Inline functions return tables, and accordingly, are often referred to as inline
table-valued functions.

Q 22 : What is the difference between a view and an inline function?

An inline table-valued function can be thought of as a parameterized view, that is, a view that accepts parameters.
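A minimal sketch of such a "parameterized view", assuming the TSQL2012 Sales.Orders table; the function name is made up:

CREATE FUNCTION dbo.GetCustOrders (@custid INT)
RETURNS TABLE
AS
RETURN
(
  SELECT orderid, orderdate
  FROM Sales.Orders
  WHERE custid = @custid
);
GO

-- used like a table, but with a parameter
SELECT orderid, orderdate FROM dbo.GetCustOrders(1);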

Q 23 : What is the difference between SELECT INTO and INSERT SELECT?

SELECT INTO creates the target table and inserts into it the result of the query.
INSERT SELECT inserts the result of the query into an already existing table.

Q 24: Can we update rows in more than one table in one UPDATE statement?

No, we can use columns from multiple tables as the source, but update only
one table at a time.

Q 25 : How many columns with an IDENTITY property are supported in one table? And How do you obtain a new value from a sequence?

One.

We use the NEXT VALUE FOR function for it.

Q 26 : What is the purpose of the ON clause in the MERGE statement?

The ON clause determines whether a source row is matched by a target row,
and whether a target row is matched by a source row. Based on the result of
the predicate, the MERGE statement knows which WHEN clause to activate and
as a result, which action to take against the target.

Q 27 : What are the possible actions in the WHEN MATCHED clause?

UPDATE and DELETE.

Q 28 : How many WHEN MATCHED clauses can a single MERGE statement have?

Two—one with an UPDATE action and one with a DELETE action.

Q 29: Why is it important for SQL Server to maintain the ACID quality of
transactions?

To ensure that the integrity of database data will not be compromised.

Q 30 : How does SQL Server implement transaction durability?

By first writing all changes to the database transaction log before making changes permanently to the database data on disk.

Q 31 : How many ROLLBACKs must be executed in a nested transaction to roll it back?

Only one ROLLBACK. A ROLLBACK always rolls back the entire transaction, no
matter how many levels the transaction has.

Q 32 : How many COMMITs must be executed in a nested transaction to ensure that
the entire transaction is committed?

One COMMIT for each level of the nested transaction. Only the last COMMIT
actually commits the entire transaction.

Q 33 : Can readers block readers?

No, because shared locks are compatible with other shared locks.

Q 34: Can readers block writers?

Yes, even if only momentarily, because any exclusive lock request has to wait
until the shared lock is released.

Q 35 : If two transactions never block each other, can a deadlock between them
result?

No. In order to deadlock, each transaction must already have locked a resource the other transaction wants, resulting in mutual blocking.

Q 36 : Can a SELECT statement be involved in a deadlock?

Yes. If the SELECT statement locks some resource that keeps a second transaction
from finishing, and the SELECT cannot finish because it is blocked by the
same transaction, the deadlock results.

Q 37 : If your session is in the READ COMMITTED isolation level, is it possible for one of your queries to read uncommitted data?

Yes, if the query uses the WITH (NOLOCK) or WITH (READUNCOMMITTED) table hint; WITH (NOLOCK) simply ignores the locks. The session's isolation level does not change, only the read characteristics for that table.
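For example, a single table can be read without requesting shared locks like this (the rest of the session stays at its current isolation level):

SELECT orderid, custid
FROM Sales.Orders WITH (NOLOCK);  -- may return uncommitted (dirty) rows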

Q 38 : Is there a way to prevent readers from blocking writers and still ensure that
readers only see committed data?

Yes, that is the purpose of the READ COMMITTED SNAPSHOT option within the
READ COMMITTED isolation level. Readers see earlier versions of data changes
for current transactions, not the currently uncommitted data.
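The option is set at the database level; a sketch assuming a hypothetical SalesDB database (the statement needs exclusive access to the database to complete):

ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;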

Q 39 : What is the result of the parsing phase of query execution?

The result of this phase, if the query passed the syntax check, is a tree of logical
operators known as a parse tree.

Q 40 : How do we measure the amount of disk I/O a query is performing?

We use the SET STATISTICS IO command.

Q 41 : Which DMO gives you detailed text of queries executed?

You can retrieve the text of batches and queries executed from the sys.dm_exec_sql_text DMO.
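For example, a sketch that joins the currently executing requests to their statement text (the column list is trimmed):

SELECT r.session_id, r.status, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t;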

Q 42 : What are the two types of parameters for a T-SQL stored procedure?

A T-SQL stored procedure can have input parameters and output parameters.

Q 43 : Can a stored procedure span multiple batches of T-SQL code? 

No, a stored procedure can only contain one batch of T-SQL code.

Q 44 : What are the two types of DML triggers that can be created?

You can create AFTER and INSTEAD OF DML-type triggers.

Q 45 : If an AFTER trigger discovers an error, how does it prevent the DML command from completing?

An AFTER trigger issues a THROW or RAISERROR command to cause the transaction
of the DML command to roll back.

Q 46 : What are the two types of table-valued UDFS? And What type of UDF returns only a single value?

You can create inline or multistatement table-valued UDFs. A scalar UDF returns only a single value.

Q 47 : What kind of clustering key would you select for an OLTP environment?

For an OLTP environment, a short, unique, and sequential clustering key might be
the best choice.

Q 48 : Which clauses of a query should you consider supporting with an index?

The list of the clauses you should consider supporting with an index includes, but
is not limited to, the WHERE, JOIN, GROUP BY, and ORDER BY clauses.

Q 49 : How would you quickly update statistics for the whole database after an upgrade?

We should use the sys.sp_updatestats system procedure .

Q 50  : What are the commands that are required to work with a cursor?

DECLARE, OPEN, FETCH in a loop, CLOSE, and DEALLOCATE.
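A minimal cursor sketch against the TSQL2012 Sales.Customers table (the FAST_FORWARD option ties in with the next question); the per-row work is just a PRINT for illustration:

DECLARE @custid INT;

DECLARE cust_cursor CURSOR FAST_FORWARD FOR
  SELECT custid FROM Sales.Customers;

OPEN cust_cursor;
FETCH NEXT FROM cust_cursor INTO @custid;

WHILE @@FETCH_STATUS = 0
BEGIN
  PRINT CAST(@custid AS VARCHAR(10));   -- row-by-row work goes here
  FETCH NEXT FROM cust_cursor INTO @custid;
END;

CLOSE cust_cursor;
DEALLOCATE cust_cursor;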

Q 51 : When using the FAST_FORWARD option in the cursor declaration command,
what does it mean regarding the cursor properties?

It means that the cursor is read-only, forward-only.

Q 52 : How would you determine whether SQL Server used the batch processing mode for a specific iterator?

You can check the iterator’s Actual Execution Mode property.

Q 53 : Would you prefer using plan guides instead of optimizer hints?

With plan guides, you do not need to change the query text.

Q 54 : Why is the relational model called a set-based model?

The relational model is based on the concepts of mathematical set theory: SQL queries operate on tables and return their results as sets of rows.

Q 55 : Give an example of the iterative model.

The iterative model uses the same concept as loop iteration in high-level languages such as C or Python: it processes rows one by one. Iterative constructs are comparatively slower. Example: cursors.

Q 56 : What does a fast-forward cursor mean?

It means the cursor moves from the first row to the last and cannot go backward.

Q 57 : What are scopes of temporary tables ?

There are two types of temporary tables in SQL Server: local and global.

Local temp tables (named with a single #, such as #T1) are visible to the level that created them, across all batches at that level, and to all inner levels of the call stack.

Global temp tables (named with ##, such as ##T1) are visible to all sessions and are destroyed when the session that created them terminates and no other session is referencing them.

Table variables are named with an @ sign, such as @TV1. They are accessible only to the batch that declared them; they are not visible to other batches at the same level, nor to inner levels.
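A quick sketch of the three kinds of objects (the names are arbitrary):

-- local temp table: one leading #, visible to the creating session and inner levels
CREATE TABLE #T1 (id INT);

-- global temp table: two leading #, visible to all sessions while it exists
CREATE TABLE ##T2 (id INT);

-- table variable: visible only to the batch that declared it
DECLARE @TV1 TABLE (id INT);
INSERT INTO @TV1 (id) VALUES (1);
SELECT id FROM @TV1;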

Q 58: What is the difference between a temp table and a table variable?

Temp tables behave like regular database tables: data changes made to a temp table inside a transaction can be rolled back, whereas changes made to a table variable are not undone by rolling back the user transaction.

Another difference is performance: for temp tables SQL Server maintains statistics (histograms), so we can inspect them and tune queries, for instance by creating indexes on the columns used for filtering.

For table variables SQL Server does not maintain such statistics, so queries against them often end up scanning the whole table variable, which can hurt performance.

Q 59 : What does SET NOCOUNT ON do ?

SET NOCOUNT ON suppresses messages such as "2 rows affected". Putting SET NOCOUNT ON at the beginning of a stored procedure prevents it from returning that message to the client.
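A typical pattern, sketched with the hypothetical StudentDetail table used earlier and a made-up procedure name:

CREATE PROCEDURE dbo.usp_GetStudents
AS
BEGIN
  SET NOCOUNT ON;   -- suppress the "n rows affected" messages
  SELECT studentid, studentname FROM StudentDetail;
END;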

Q 60 : What does the GOTO statement do?

With GOTO you can jump from wherever you are to a particular label.

For instance :

PRINT 'First';
GOTO Label_1;
PRINT 'Second';
Label_1:
PRINT 'End now';

Q 61 : Can the stored procedure have multiple batches ?

No.

Q 62 : What does RETURN statement do ?

It exits the stored procedure and returns control to the calling statement or procedure.

Q 63: What does @@Rowcount does?

It returns the number of rows read or affected by the last SQL statement.

Q 64: Can the AFTER triggers be nested ?

Yes, they can be nested, meaning that a trigger on table T1 can insert rows into table T2 that also has a trigger on it, and so on.
The maximum number of nested trigger levels in SQL Server is 32.

Q 65 : What does SCHEMABINDING statement do?

WITH SCHEMABINDING means that a schema object (such as a view or function) is bound to the schema of the objects it references. For instance, you cannot change the structure of a table that is schema-bound to a view until you drop that view.

Q 66 : Can we use multiple select statements in a view ?

No. You can use only one SELECT statement, because a view is required to return a single result set.

However, you can combine two or more SELECT statements with UNION, since the final result is still one result set.

Q 67 : Can data be modified in a table with a view ? If yes then what precautions are there?

Yes, we can modify data in a table through a view instead of modifying the table directly.

There are a few precautions and restrictions to keep in mind:

  • The DML statement must target only one underlying table, even if the view references multiple tables.
  • The view column being modified must not be the result of an aggregate function, or come from a view that uses GROUP BY, DISTINCT, or HAVING.
  • We cannot modify through a view that uses TOP or OFFSET-FETCH together with the WITH CHECK OPTION.
  • A view column cannot be modified if it is derived from UNION, UNION ALL, INTERSECT, or CROSS JOIN.

Q 68 : What are partitioned views?

We can partition a large table in SQL Server with the help of views, on one server or across several servers. Simply put, you UNION the partitioned tables and create a view over the union; this is called a partitioned view. If the tables are spread across multiple SQL Server instances, it is called a distributed partitioned view.

Q 69 : What is an inline table-valued function?

It is also a type of view, often called a parameterized view. The difference is that it can take parameters to filter the rows it returns.

Q 70 : What is identity?

It is a property of a numeric column. We typically use an identity column to generate surrogate keys, which are generated automatically by the system when you insert data. It has two values: the seed, which is the first value, and the step, which is the increment. We define both of these values in the column definition.
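A minimal sketch with a hypothetical dbo.Orders2 table (seed 1, step 1); the identity value is generated automatically on insert:

CREATE TABLE dbo.Orders2
(
  orderid INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- seed = 1, step = 1
  custid  INT NOT NULL
);

INSERT INTO dbo.Orders2 (custid) VALUES (5);  -- orderid is generated by SQL Server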

Q 71 : What is SET IDENTITY_INSERT?

We can specify our own values for an identity column in an INSERT by setting SET IDENTITY_INSERT <table> ON. But we cannot update an identity column value.
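Continuing the hypothetical dbo.Orders2 sketch:

SET IDENTITY_INSERT dbo.Orders2 ON;
INSERT INTO dbo.Orders2 (orderid, custid) VALUES (100, 5);  -- explicit identity value
SET IDENTITY_INSERT dbo.Orders2 OFF;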

Q 72 : How can we find last Identity value?

SCOPE_IDENTITY returns the last identity value generated in the session in the current scope (batch, procedure, or function).

@@IDENTITY returns the last identity value generated in the session, regardless of scope.

IDENT_CURRENT takes a table name as input and returns the last identity value generated in that table, by any session.
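Using the hypothetical dbo.Orders2 table again, the three can be compared in one query:

SELECT SCOPE_IDENTITY()              AS last_in_current_scope,
       @@IDENTITY                    AS last_in_session,
       IDENT_CURRENT('dbo.Orders2')  AS last_in_table;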

Q 73 : What will happen if we use Scope_Identity , @@Identity and Ident_current in different sessions?

SCOPE_IDENTITY and @@IDENTITY will return NULL (nothing has been generated in the new session), but IDENT_CURRENT will return the last value of the identity column regardless of the session.

Q 74 : What is a sequence?

A sequence is an independent object in SQL Server. It is quite like an identity column.

A sequence accepts all numeric types, just as identity does; the default type is BIGINT.

It has a number of properties that identity does not have:

INCREMENT BY: the increment value. The default is 1.

MINVALUE: the minimum value the sequence supports.

MAXVALUE: the maximum value the sequence supports.

CYCLE | NO CYCLE: whether the sequence is allowed to cycle. The default is NO CYCLE.

START WITH: the value the sequence starts with.
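A minimal sketch creating a sequence with these properties; the name dbo.SeqOrderIDs is made up and reused in the next question:

CREATE SEQUENCE dbo.SeqOrderIDs AS INT
  START WITH 1
  INCREMENT BY 1
  MINVALUE 1
  NO CYCLE
  CACHE 100;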

Q 75 : How can we request the next value of a sequence?

To request a new value from the sequence, run the following code (using the sequence created above):
SELECT NEXT VALUE FOR dbo.SeqOrderIDs;

Q 76 : Can we change the datatype of sequence ? And can we change properties and values ?

No, we cannot change the data type, but yes, we can change the other properties and values.

Q 77 : What is cache in sequence ?

This property controls how often the sequence writes its current value to disk. For instance, CACHE 100 means SQL Server writes to disk only once for every 100 values; with NO CACHE it has to write to disk on every request for a new value. With caching, performance is better.

Q78 : What is APPLY operator ? 

The APPLY operator works on two input tables, the second of which can be a table expression.
We refer to them as the "left" and "right" tables; the right table is usually a derived table or an inline table-valued function.

The APPLY operator applies the right table expression to each row from the left table and produces a result table with the unified result sets.

We will discuss this in separate post.

Q78 : What is the difference between Cross and outer apply?

The APPLY operator has two forms:

CROSS APPLY doesn't return left rows that get an empty set back from the right side.

OUTER APPLY preserves the left side, and therefore does return left rows even when the right side returns an empty set; NULLs are used as placeholders for the right-side columns in those outer rows.
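A common sketch against the TSQL2012 sample tables: return the three most recent orders for each customer; swapping CROSS APPLY for OUTER APPLY would also keep customers that have no orders:

SELECT C.custid, A.orderid, A.orderdate
FROM Sales.Customers AS C
CROSS APPLY
(
  SELECT TOP (3) orderid, orderdate
  FROM Sales.Orders AS O
  WHERE O.custid = C.custid        -- correlation to the left table
  ORDER BY orderdate DESC, orderid DESC
) AS A;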

Q79 : What is Transaction ?

A transaction is a unit of work that can include multiple activities, such as querying data and changing data or data definitions.

Q80 : What is the IMPLICIT_TRANSACTIONS option?

This session option is OFF by default in SQL Server. When it is ON, you do not have to start a transaction with a BEGIN statement, but you do have to end it explicitly with COMMIT TRAN or ROLLBACK TRAN.

Q 81 : Define the 4 properties of a transaction.

Atomicity: either all changes in the transaction take place or none of them do. If the system fails before a COMMIT, or an error occurs, the transaction is rolled back.

Consistency: the state of the data that the database exposes to users. The isolation level is part of consistency, and consistency also refers to the integrity rules the data follows (primary key, foreign key, and unique constraints, etc.).

Isolation: isolation levels control how concurrently running transactions access data and ensure the required level of consistency. There are two models: locking and row versioning.

Durability: the data is durable. Whenever data is changed, the change is first written to the database's transaction log before it is written to the data files on disk. Once the transaction is committed and hardened in the log, it is considered safe even if the system fails.
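A minimal sketch of atomicity, assuming a hypothetical dbo.Accounts table: either both updates are committed or, on an error, both are rolled back:

BEGIN TRY
  BEGIN TRAN;

  UPDATE dbo.Accounts SET balance = balance - 100 WHERE accountid = 1;
  UPDATE dbo.Accounts SET balance = balance + 100 WHERE accountid = 2;

  COMMIT TRAN;
END TRY
BEGIN CATCH
  IF @@TRANCOUNT > 0
    ROLLBACK TRAN;   -- undo both updates if anything failed
END CATCH;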

Q 82 : What is redo and undo ?

If the system fails, then when it restarts SQL Server reads the transaction log and replays the work: committed transactions whose changes had not yet reached the data files are rolled forward (redo).

Uncommitted transactions are rolled back to their previous state once the system is up again (undo).

Q 83 : What is a lock ?

A lock is a control resource obtained by a transaction to protect data and prevent conflicting changes or access by other transactions.

Q 84 : What is an Exclusive Lock ? 

When we want to modify data, the transaction requests an exclusive lock on the data resource. Once it acquires the exclusive lock, no other transaction is allowed to access or modify that data until the lock is released.

In a single-statement transaction, the lock is held until the statement completes.

In a multi-statement transaction, the lock is held until all the statements have executed and the transaction is ended by COMMIT TRAN or ROLLBACK TRAN.

Exclusive locks are incompatible with every other lock mode: a transaction cannot acquire an exclusive lock on a resource while another transaction holds any lock on it, and while an exclusive lock is held no other lock can be granted on that resource.

Q 85 : What is shared lock ?

When a transaction reads data, it acquires a shared lock on the data resource. Multiple transactions can hold shared locks on the same resource simultaneously.

Q 86 : What is row versioning ? What is READ COMMITTED SNAPSHOT Isolation level ?

In Azure SQL database Read Committed Snapshot is the default isolation level.

Instead of locking, this isolation level works with row versioning, so a reading transaction does not wait to acquire a shared lock.

In this READ COMMITTED SNAPSHOT isolation level , if the transaction modifies a row and another transaction tries to read the row , it will read the last committed state of the row that was available before the start of the statement (optimistic Concurrency) .

This case is very different in READ COMMITTED isolation level , if the transaction is modifying the row , another transaction cannot read the same rows of data until the first transaction completes (Pessimistic Concurrency ) .

Q 87 : What are lockable resources ? 

Lockable resources include RID (a row in a heap), KEY (a row in an index), PAGE, EXTENT, OBJECT (for example, a table), and the DATABASE itself.

Q 88 : What is higher level of Granularity ? 

To obtain a lock on a resource, your transaction must first obtain intent locks of the same mode on higher levels of granularity. For example, to get an exclusive lock on a row, your transaction must first acquire an intent exclusive lock on the page where the row resides and an intent exclusive lock on the object that owns the page.

To get a shared lock on a certain level of granularity, your transaction first needs to acquire intent shared locks on higher levels of granularity.

The purpose of intent locks is to efficiently detect incompatible lock requests on higher levels of granularity and prevent the granting of those.

For instance , if a transaction holds a lock on a row and another asks for an incompatible lock mode on the whole page(higher level) or table where that row resides, it is easy for SQL Server to identify the conflict because of the intent locks that the first transaction acquired on the page and table. Intent locks do not interfere with requests for locks on lower levels of granularity. For example, an intent lock on a page doesn’t prevent other transactions from acquiring incompatible lock modes on rows within the page.

Q 89 : What is Blocking ?

When one transaction holds a lock on a data resource and another transaction requests an incompatible lock on the same resource, the requester is blocked and enters a wait state.

Q 90 : What is the READ UNCOMMITTED isolation level?

It is the lowest available isolation level. In this isolation level, a reader doesn't ask for a shared lock. A reader that doesn't ask for a shared lock can never be in conflict with a writer that is holding an exclusive lock. This means that the reader can read uncommitted changes (also known as dirty reads). It also means that the reader won't interfere with a writer that asks for an exclusive lock. In other words, a writer can change data while a reader that is running under the READ UNCOMMITTED isolation level reads data.
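The level is set per session; for example (reading Sales.Orders from the sample database):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT orderid, custid
FROM Sales.Orders;   -- may read uncommitted (dirty) rows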

Q 91 : What is the READ COMMITTED isolation level?

It does not allow dirty reads (uncommitted data). The reader must acquire a shared lock, which prevents it from reading uncommitted data.

This means that if a writer is holding an exclusive lock, the reader’s shared lock request will be in conflict with the writer, and it has to wait.

As soon as the writer commits the transaction, the reader can get its shared lock, but what it reads are necessarily only committed changes.

Q 92 : What is the isolation level REPEATABLE READ ?

In this level, not only does a reader need a shared lock to be able to read, but it also holds the lock until the end of the transaction. This means that as soon as the reader has acquired a shared lock on a data resource to read it, no one can obtain an exclusive lock to modify that resource until the reader ends the transaction.

In other words, if a transaction running under REPEATABLE READ holds a shared lock and has not yet ended, no other transaction can update the data it has read until the first transaction completes.

Q 93 : What is the SERIALIZABLE isolation level?

One problem with REPEATABLE READ is: what if another transaction inserts new rows that satisfy the first transaction's query filter? This phenomenon is called phantom reads. To overcome it, we use SERIALIZABLE, which blocks other transactions from inserting such new rows until the reading transaction ends.

Database Modeling

1- Entity: an entity in a database is like an object in an object-oriented program. Here we take the example of a movie database. In E-R diagrams, entity sets are represented by rectangles.

In the movie database the entities are: movies, stars, and studios.

2 - Attributes: they are like properties of an entity. For example, the movie entity can have attributes such as title and length. They are represented by ovals.

3- Relationships: two entities, such as movies and stars, can have a relationship, a connection with each other. A relationship between two entities is represented by a diamond.

[Figure: E/R diagram for Movies, Stars, and Studios]

Note here that the arrow toward Studios represents that a studio owns a movie.

1 – One-to-One Relationship: remember that an arrow means "at most one".

[Figure: one-to-one relationship between Studios and Presidents]

Here a president can run only one studio and a studio has only one president, so this relationship is one-one, as indicated by the two arrows, one entering each entity set.

2 – Many-to-One Relationships:

[Figure: many-to-one Contracts relationship among Stars, Movies, and Studios]

We have an arrow pointing to entity set Studios, indicating that for a particular star and movie, there is only one studio with which the star has contracted for that movie. However, there are no arrows pointing to entity sets Stars or Movies. A studio may contract with several stars for a movie, and a star may contract with one studio for more than one movie.

Another relationship can be is a relationship Sequel-of between the entity set Movies and itself. Each relationship is between two movies, one of which is the sequel of the other. To differentiate the two movies in a relationship, one line is labeled by the role Original and one by the role Sequel, indicating the original movie and its sequel, respectively. We assume that a movie may have many sequels, but for each sequel there is only one original movie.

[Figure: Sequel-of relationship on Movies with roles Original and Sequel]

 

Attributes on Relationships

Sometimes it is convenient, or even essential, to associate attributes with a relationship, rather than with any one of the entity sets that the relationship connects. For example, consider the relationship of Fig. 4.4, which represents contracts between a star and a studio for a movie. We might wish to record the salary associated with this contract. However, we cannot associate it with the star; a star might get different salaries for different movies. Similarly, it does not make sense to associate the salary with a studio (they may pay different salaries to different stars) or with a movie (different stars in a movie may receive different salaries).

However, we can associate a unique salary with the (star, movie, studio)
triple in the relationship set for the Contracts relationship .

Salaries becomes the fourth entity set of the relationship Contracts. The whole diagram is shown in the figure below.

Notice that there is an arrow into the Salaries entity set in Fig. 4.8. That
arrow is appropriate, since we know that the salary is determined by all the other
entity sets involved in the relationship. In general, when we do a conversion
from attributes on a relationship to an additional entity set, we place an arrow
into that entity set.

[Figure: Contracts relationship with the Salaries entity set (Fig. 4.8)]

 

Subclasses in the E/R Model :

Often, an entity set contains certain entities that have special properties not associated with all members of the set. If so, we find it useful to define certain special-case entity sets, or subclasses, each with its own special attributes and/or relationships. We connect an entity set to its subclasses using a relationship called isa (i.e., “an A is a B” expresses an “isa” relationship from entity set A to entity set B).

An isa relationship is a special kind of relationship, and to emphasize that
it is unlike other relationships, we use a special notation: a triangle. One side
of the triangle is attached to the subclass, and the opposite point is connected
to the superclass. Every isa relationship is one-one, although we shall not draw
the two arrows that are associated with other one-one relationships.

[Figure: isa subclasses in the E/R model]

 

Weak Entity Sets :

It is possible for an entity set’s key to be composed of attributes, some or all
of which belong to another entity set. Such an entity set is called a weak entity
set.

Causes of Weak Entity Sets : There are two principal reasons we need weak entity sets. First, sometimes entity sets fall into a hierarchy based on classifications unrelated to the “isa hierarchy” .

Example 4.20: A movie studio might have several film crews. The crews might be designated by a given studio as crew 1, crew 2, and so on. However, other studios might use the same designations for crews, so the attribute number is not a key for crews. Rather, to name a crew uniquely, we need to give both the name of the studio to which it belongs and the number of the crew. The situation is suggested by Fig. 4.20. The double rectangle indicates a weak entity set, and the double diamond indicates a many-one relationship that helps provide the key for the weak entity set. The notation will be explained further in Section 4.4.3. The key for the weak entity set Crews is its own number attribute and the name attribute of the unique studio to which the crew is related by the many-one Unit-of relationship.

 

[Figure: Crews as a weak entity set related to Studios by Unit-of (Fig. 4.20)]

 

Two or more genera may have species with the same species name. Thus, to
designate a species uniquely we need both the species name and the name of the
genus to which the species is related by the Belongs-to relationship, as suggested
in Fig. 4.21. Species is a weak entity set whose key comes partially from its
genus.

[Figure: Species as a weak entity set related to Genus by Belongs-to (Fig. 4.21)]

 

Associations :

A binary relationship between classes is called an association. There is no analog of multiway relationships in UML; rather, a multiway relationship has to be broken into binary relationships, which, as we suggested in Section 4.1.10, can always be done. The interpretation of an association is exactly what we described for relationships in Section 4.1.5 on relationship sets. The association is a set of pairs of objects, one from each of the classes it connects.

[Figure: UML association between two classes]

 

 

Example 4.37: The UML diagram of Fig. 4.37 is intended to mirror the E/R diagram of Fig. 4.18. Here, we see assumptions somewhat different from those in Example 4.36, about the numbers of movies and studios that can be associated. The label 1..* at the Movies end of Owns says that each studio must own at least one movie (or else it isn’t really a studio). There is still no upper limit on how many movies a studio can own.
At the Studios end of Owns, we see the label 1..1. That label says that a movie must be owned by one studio and only one studio. It is not possible for a movie not to be owned by any studio, as was possible in Fig. 4.36. The label 1..1 says exactly what the rounded arrow in E/R diagrams says.
We also see the association Runs between studios and presidents. At the Studios end we see the label 1..1; that is, a president must be the president of one and only one studio. That label reflects the same constraint as the rounded arrow from Presidents to Studios in Fig. 4.18. At the other end of association Runs is the label 0..1. That label says that a studio can have at most one president, but it might have no president at some point in time. This constraint is exactly the constraint of a pointed arrow.

[Figure: UML diagram with the Owns and Runs associations (Fig. 4.37)]

 

Self-Associations:

An association can have both ends at the same class; such an association is
called a self-association .

Example 4.38: Figure 4.38 represents the relationship "sequel-of" on movies. We see one association with each end at the class Movies. The end with role TheOriginal points to the original movie, and it has label 0..1. That is, for a movie to be a sequel, there has to be exactly one movie that was the original. However, some movies are not sequels of any movie. The other role, TheSequel, has label 0..*. The reasoning is that an original can have any number of sequels. Note we take the point of view that there is an original movie for any sequence of sequels, and a sequel is a sequel of the original, not of the previous movie in the sequence. For instance, Rocky II through Rocky V are sequels of Rocky. We do not assume Rocky IV is a sequel of Rocky III, and so on.

[Figure: self-association "sequel-of" on Movies (Fig. 4.38)]

 

Aggregations and Compositions :

An aggregation is a line between two classes that ends in an open diamond at one end. The implication of the diamond is that the label at that end must be 0..1, i.e., the aggregation is a many-one association from the class at the opposite end to the class at the diamond end. Although the aggregation is an association, we do not need to name it, since in practice that name will never be used in a relational implementation.

A composition is similar to an association, but the label at the diamond end must be 1..1. That is, every object at the opposite end from the diamond must be connected to exactly one object at the diamond end. Compositions are distinguished by making the diamond solid black.

[Figure: aggregation and composition notation]

 

Link for more examples on aggregation and composition:

https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-aggregation-vs-composition/

 

Cardinality : 

The cardinality of the relationship describes the number of tuples (rows) on each side of the relationship.
Either side of the relationship may be restricted to allow zero, one, or multiple tuples.
The type of key enforces the restriction of multiple tuples. Primary keys are by definition unique and enforce the single-tuple restriction, whereas foreign keys permit multiple tuples.

[Figure: cardinality notation]

 

One-to-many pattern : By far the most common relationship is a one-to-many relationship; this is the classic parent-child relationship. Several tuples (rows) in the secondary entity relate to a single tuple in the primary entity. The relationship is between the primary entity’s primary key and the secondary entity’s foreign key .

For instance : each base camp may have several tours that originate from it. Each tour may originate from only one base camp, so the relationship is modeled as one base camp relating to multiple tours. The relationship is made between the BaseCamp’s primary key and the Tour entity’s BaseCampID foreign key, as diagrammed in Figure 3-5. Each Tour’s foreign key attribute contains a copy of its BaseCamp’s primary key.

The one-to-many relationship relates zero to many tuples (rows) in the secondary entity to a single tuple in the primary entity.

[Figure: one-to-many relationship between BaseCamp and Tour (Figure 3-5)]

 

One-to-one pattern :

One-to-one relationships connect two entities with primary keys at both entities. Because a primary key must be unique, each side of the relationship is restricted to one tuple .

 

Many-to-many pattern : 

In a many-to-many relationship, both sides may relate to multiple tuples (rows) on the other side of the relationship. The many-to-many relationship is common in reality, as shown in the following examples:

The many-to-many logical model shows multiple tuples on both ends of the relationship. Many-to-many relationships are nearly always optional. For example, the many customers-to-many events relationship is optional because the customer and the tour/event are each valid without the other .
To implement a many-to-many relationship in SQL DDL, a third table, called an associative table (sometimes called a junction table) is used, which artificially creates two one-to-many relationships between the two entities (see Figure 3-8).

Figure shows the associative entity with data to illustrate how it has a foreign key to each of the two many-to-many primary entities. This enables each primary entity to assume a one-to-many relationship with the other entity.

[Figure: associative entity Customer_mm_Event between Customers and Events]

In the associative entity (Customer_mm_Event), each customer can be represented multiple times, which creates an artificial one-event-to-many-customers relationship. Likewise, each event can be listed multiple times in the associative entity, creating a one-customer-to-many-events relationship.

[Figure: many-to-many data in the associative entity]

 

 

Supertype/subtype pattern

One of my favorite design patterns, that I don’t see used often enough, is the supertype/subtype pattern.
It supports generalization, and I use it extensively in my designs. The supertype/subtype pattern is also perfectly suited to modeling an object-oriented design in a relational database.
The supertype/subtype relationship leverages the one-to-one relationship to connect one supertype entity with one or more subtype entities. This extends the supertype entity with what appears to be flexible attributes.
The textbook example is a database that needs to store multiple types of contacts. All contacts have basic contact data such as name, location, phone number, and so on. Some contacts are customers with customer attributes (credit limits, loyalty programs, etc.). Some contacts are vendors with vendor-specific data.

[Figure: supertype/subtype contact model]

 

 

Reference: A First Course in Database Systems by J. D. Ullman.

Working with Indexing - Part 2

There are two major types of indexes that SQL Server uses: clustered and nonclustered indexes.

Before diving into the indexes, we need to understand the internal organization and data structures used for indexing.

SQL Server stores data in 8 KB units called pages; a page is also the smallest unit of read and write operations on data.

 HEAPS and Balanced Trees:

Pages are physical-level objects. SQL Server organizes pages into logical-level structures.

A table organized using the balanced tree data structure is called a clustered index, or a clustered table.

A table organized without that ordering structure, as a heap, is simply called a heap.

Heaps

A heap is just a bunch of pages and extents with no logical ordering structure. SQL Server keeps track of which pages and extents belong to which object (table, index, and so on) through a page allocation system called Index Allocation Map (IAM) pages.

Every table or index has at least one IAM page, called the first IAM.

A single IAM page can point to approximately 4 GB of space. Large objects can have more than one IAM page. IAM pages for an object are organized as a doubly linked list; each page has pointers to its previous and next pages.

SQL Server finds data in a heap only by scanning the entire heap, and it uses the IAM pages to scan the heap in physical order. Even if your query wants to retrieve only a single row, SQL Server has to scan the entire heap.

SQL Server stores new rows anywhere in a heap. It can store a new row in an existing page if the page is not full, or allocate a new page or extent for the object where you are inserting the new row. Of course, this means that heaps can become very fragmented after a period of time.

1- Clustered Index

There can be only one clustered index on a table, because the data in a table can be stored in only one order. The data is stored in order of the clustering key values; if the table does not have a clustered index, its rows are stored unsorted in a heap.

With a clustered index, the table is stored as a balanced tree. Every balanced tree has one root page and one or more leaf pages, and the data rows are stored at the leaf level.

The data is stored in order of the clustering key (the key can be a single column or up to 16 columns); if it consists of more than one column it is called a composite key.

As mentioned earlier, the clustered index uses logical-level ordering. SQL Server still uses the IAM pages to reach the physical pages, and the tree can have zero or more intermediate levels.

Pages above the leaf level point to leaf-level pages. A row in a page above the leaf level contains a clustering key value and a pointer to the page where that value starts in the logically ordered leaf level.
If more than one page is needed to point to the leaf-level pages, SQL Server creates a first intermediate level of pages, which point to the leaf-level pages. The root page rows then point to the intermediate-level pages. If the root page cannot point to all first-level intermediate pages, SQL Server creates a new intermediate level.

Pages on the same level are organized as a doubly linked list; therefore, SQL Server can find the previous and the next page in logical order for any specific page. In addition to the balanced tree pages, SQL Server still uses IAM pages to track the physical allocation of the balanced tree pages.
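A minimal sketch with a hypothetical test table; the clustered index defines the logical order of the rows by id:

CREATE TABLE dbo.TestStructure
(
  id     INT       NOT NULL,
  col1   INT       NOT NULL,
  filler CHAR(200) NOT NULL
);

CREATE CLUSTERED INDEX idx_cl_id ON dbo.TestStructure (id);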

2- Non Clustered Index

We can have up to 999 nonclustered indexes on a single table.

The structure of a nonclustered index is almost the same as that of a clustered index. It has a root level and intermediate levels; only the leaf level is different, because it may or may not contain the data, depending on whether the table is structured as a heap or as a balanced tree.

The leaf level of a nonclustered index contains the index keys and row locators; a row locator points to the actual row in the table. If the table is a heap, the row locator is called a row identifier (RID).

In order to seek for a row, SQL Server needs to traverse the index to the leaf level, then read the appropriate page from the heap and retrieve the row from that page. The operation of retrieving the row from the heap is called a RID lookup.

If the table is organized as a balanced tree, the row locator is the clustering key, which means that when SQL Server seeks for a row it first traverses the levels of the nonclustered index and then also the levels of the clustered index. This operation is called a key lookup.
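Continuing the hypothetical dbo.TestStructure sketch: a nonclustered index on col1 stores col1 plus the clustering key (id) in its leaf level, so a query that also needs filler has to perform a key lookup (whether the optimizer actually chooses the seek plus lookup plan depends on statistics):

CREATE NONCLUSTERED INDEX idx_nc_col1 ON dbo.TestStructure (col1);

-- seek on idx_nc_col1, then key lookup into the clustered index to fetch filler
SELECT id, col1, filler
FROM dbo.TestStructure
WHERE col1 = 5;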