There are two major types of indexes which SQL servers uses : clustered and Non clustered Indexes.
Before diving into the indexes ,we need to understand the internal organization and data structures used for indexing.
The SQL server stores the data in the memory called pages. Each page is of 8KB unit size and page is also the smallest unit for reading and writing operations of data.
HEAPS and Balanced Trees:
Pages are physical level objects . SQL servers organizes the pages in logical level structures.
The table organized using the balanced tree data structure is called clustered index or clustered table.
And the table organized using the heaps data structure is called non clustered index .
Heaps
A heap is just a bunch of pages and extents which does not use logical structure. SQL Server keeps the track of which pages and extents belong to which object (table or index etc) through the page allocation system called Index Allocation Map (IAM) pages.
Every table or index has at least one IAM page, called first IAM.
A single IAM page can point to approximately 4 GB of space. Large objects can have more than one IAM page. IAM pages for an object are organized as a doubly linked list; each page has a pointer to its previous or next page.
SQL Server finds data in a heap only by scanning the entire heap and it uses IAM pages to scan heaps in physical order. Even if your query wants to retrieve only a single row, SQL Server has to scan the entire heap.
SQL Server stores new rows anywhere in a heap. It can store a new row in an existing page if the page is not full, or allocate a new page or extent for the object where you are inserting the new row. Of course, this means that heaps can become very fragmented after a period of time.
1- Clustered Index
There can be only one clustered index on a table because the data in a table can only be stored in only one order. The data is stored in order of their key values and if the table is not having the clustered index then it will be stored in non clustered index which stores rows in unsorted form called heap.
In clustered index the table is stored in balanced tree form . Every balanced tree of table has a one root page and at least one or more leaf pages and the data is stored in leaf levels.
The data is stored in order of the clustering key (a key can be a single column or multiple columns up to 16) and if it is more than a single column it is called composite key.
As told earlier , the clustered index uses logical level ordering . SQL still uses the IAM to go to the physical page and It can have zero or more intermediate level
Pages above leaf level point to leaf-level pages. A row in a page above leaf level contains
a clustering key and a pointer to a page where this value starts in logically ordered
leaf level.
If more than one page is needed to point to leaf-level pages, SQL Server creates the
first intermediate-level pages, which point to leaf-level pages. The root page rows point to intermediate-level pages. If the root page cannot point to all first-level intermediate pages, SQL Server creates a new intermediate level.
Pages on the same level are organized as a doubly linked list; therefore, SQL Server can find the previous and the next page in logical order for any specific page. In addition to balanced tree pages . Still , SQL Server uses IAM pages to track physical allocation of the balanced tree pages.
2- Non Clustered Index
We can have 999 non clustered indexes on a single table.
The structure of the non clustered index seems allmost same as clustered index. It has a root level and the intermediate level. Only the leaf level structure is different as it may or may not contain data and it depends whether table is structured as heap or balanced Tree .
The leaf level of a non clustered index contains the index keys and row locators and row locator points to actual row in the table. If the table is a heap, then the row locator is called row identifier (RID).
In order to seek for a row, SQL Server needs to traverse the index to the leaf level, and
then read the appropriate page from the heap and retrieve the row from the page. The operation of retrieving the row from the heap is called RID lookup.
If the table is organized as balanced tree, then the row locator is the clustering key which
means that when SQL seeks for a row, it has to first go to the levels on a non clustered index and then also all levels of a clustered index. This operation is called a key lookup.
