Monday, September 13, 2010

T-SQL Tuesday #010: Little Known Index Facts

T-SQL TuesdayThis blog entry is participating in T-SQL Tuesday #010, hosted this month by Michael J. Swart.

You are invited to visit his blog and join the party and read more blogs participating in this month’s theme: Indexes.

I have a few interesting tidbits on indexes to talk about, but before I get on with my post, I wanted to address something that Michael brought up. He wanted to stir up a little controversy by asking the question: Are you Team Indexes or Team Indices?

Well, that’s a no-brainer for me… Indexes, of course!

Indices? Gimme a break! That’s Latin, not English!

And besides, I don’t like the inconsistency. Why do some purists insist that we have plural nouns like indices, vertices, matrices, and vortices, when we don’t have plurals like complices, duplices, reflices, or sices? If I need to blow my nose, do I ask someone for a few Kleenices?

NO!

In fact, why complicate our lifes unnecessarily with irregular plurals? A plural should just be formed by adding an -s or -es at the end of a noun. Period!

English is far less complicated than any other European language because of its lack of difficult verb conjugations, declensions and case markings, and gendered nouns. (All of this thanks to the Vikings invading England in the 8th and 9th centurys*). Well, I think the time has finally come again to simplify the language even more!

Are you with me?

Come on! Get up out of your chairs, ladys and gentlemans. Get on your foots right now, and grit your tooths, and go to the window and shout, “I’m mad as hell, and I’m not going to take it anymore!”

Alert the mediums! Let’s spread the word to all mans and womans and childs of the English-speaking world! Many persons will resist, but we must fight! We can make a difference that can greatly simplify the lifes of millions for milleniums to come!

Why do you add chlorine to your backyard pool? To prevent the growth of algas!

What do you order for dinner at an Italian restaurant? Spaghettos and Meatballs! Linguinas and Clams!

(What are the characters surrounding this sentence?) Parenthesises!

When do you---

Er… uh… ahem…

Sorry… I guess I got carried away. Forgive me.

Ahem.

Okay, now that I got that out of my system, let’s talk about a few little known interesting facts regarding indexes (not indices!) in our datumsbases.



FILLed With Joy

As you may know, when you create an index, you can specify a fill factor value, which designates the percentage of space to be filled on each leaf page of the index as it’s created. So, for example, specifying a fill factor of 80 will fill each leaf page up to 80% capacity, thereby leaving 20% of empty space for future data to be added:

create index IX_CarrierTrackingNumber
on AdventureWorks.Sales.SalesOrderDetail (CarrierTrackingNumber)
with (fillfactor=80)
A fill factor of 0 is exactly the same as specifying a value of 100, which indicates that the pages should be filled to full capacity.

What if you don’t specify a fill factor when you create an index? What is the default? According to dozens of books and blogs, and also according to the Books Online entry for CREATE INDEX, (I’m quoting here), “the default is 0”.

Terribly sorry… that’s not at all correct.

The real default fill factor value depends on the server. When you first install a server, its default fill factor is 0, but then it can be changed to any value you like. If you look at the Server Properties dialog, specifically in the Database Settings section, you can specify the server-wide default index fill factor there:

Default Fill Factor

You can also change the server-wide default value in code:

exec sys.sp_configure 'fill factor',90
go
reconfigure
go
Note that for any change to take effect (either through the dialog or through code), you have to restart the server.

So, if you want to play a trick on a fellow developer, wait until he steps away from his computer, and then change the default fill factor on his development server to 1% (and restart it). Then sit back and enjoy the hilarity that ensues when all the indexes he creates (without specifying a fill factor) are pretty much worthless.

That’s even more fun that changing his Batch Separator command from GO to SELECT. But remember, you didn’t hear any of that stuff here.

We Can REBUILD Him

Eventually over time, an index can become fragmented due to page splits that occur when new rows are added to full index pages. And so we can use ALTER INDEX to REBUILD it, and we can optionally specify a fill factor for that rebuilding if we wish:

alter index IX_CarrierTrackingNumber
on AdventureWorks.Sales.SalesOrderDetail
rebuild with (fillfactor=80)
So, once again, just like in CREATE INDEX, we can either specify a fill factor, or we can just let the default value come into play. Right?

Well… yes… kind of.

It’s just that the default fill factor for ALTER INDEX is not the same as the default for CREATE INDEX. Once again, according to the Books Online entry for ALTER INDEX, (I’m quoting here), “the default is 0”.

Wrong-o.

In this case, the default for rebuilding is simply going to be the same fill factor that was put into place when the index was originally CREATEd. In other words, it’s rebuilt using the same “recipe” for creation… the default is acquired kind of like this:

select fill_factor
from sys.indexes
where [object_id]=object_id('Sales.SalesOrderDetail')
and name='IX_CarrierTrackingNumber'
/*
fill_factor
-----------
80
*/
So let’s review…

When you CREATE or ALTER/REBUILD an index, you can use a default fill factor (albeit different “flavors” of default) or you can supply your own specific percent value.

Repeat after me: Default or percent. Default or percent. Default or percent.

Got it? Good.

So now that you have that drilled into your brain, you start deciding to look into Maintenance Plans, specifically the Rebuild Index Task of a Maintenance Plan. At first glance at the dialog window, you see what’s familiar to you already: Use the Default or specify a Percent:

Rebuild Index Dialog

You decide to set up this task to rebuild all your indexes with a fill factor of 80 and so you type the number 80 in the Percent box. And you set the task to run tonight at midnight.

The next morning you come in and everything has slowed to a crawl.

Why?

Look more closely at the dialog. It’s not asking for a Fill Factor Percent… it’s asking for a Free Space Percent! It’s the exact opposite of everything you’ve learned! You’ve just rebuilt all your indexes with 80% free space and only 20% filled… and therefore all your indexes are 4 times bigger than they should be because most of the space in their leaf pages is empty.

Don’t laugh. I saw this happen. I was called into a company that had poor performance in their system, and, thanks to my utility for generating DDL Hyperlinks, I happened to noticed that every one of their indexes had a fill factor of 20. Apparently an external contract DBA had come in to set up some admin-related tasks and he entered the number 80 in the Rebuild Index Task Dialog of a Maintenance Plan (presumably thinking he was entering a fill factor value), thereby bloating the database with tons of empty space.

So beware of that Rebuild Index Task Dialog. Whoever put that backwards dialog together must have been evil. Microsoft should check their payroll for a person whose name is Lucifer or Beelzebub or Satan.

CLUSTER’s Last Stand

Now on to something interesting about Clustered Index Keys…

Let’s say you have a table with a Clustered Index. The leaf nodes of the index contain the data of the table itself. Now you create a Nonclustered Index on the table. As you may know, the column(s) that make up the Clustered Index Key are incorporated into the Nonclustered Index so that the key can be used as a pointer back to the base table data (via a Key Lookup).

But how the Clustered Index Key is incorporated depends on how you define the index.

Let’s look at an extreme example. The following code will create a 200,000-row table with a Clustered Index on 16 columns (the maximum allowed) of 56 bytes each (bringing us close to the maximum key size allowed of 900 bytes).

use TempDB
go

/* Create the table */
create table Obnoxious
(
A char(56) not null
,B char(56) not null
,C char(56) not null
,D char(56) not null
,E char(56) not null
,F char(56) not null
,G char(56) not null
,H char(56) not null
,I char(56) not null
,J char(56) not null
,K char(56) not null
,L char(56) not null
,M char(56) not null
,N char(56) not null
,O char(56) not null
,P char(56) not null
,X int not null
)
go

/* Populate it with 200,000 rows */
with CTE as
(
select RowNum=row_number() over (order by (select 0))
from sys.objects a, sys.objects b, sys.objects c
)
insert Obnoxious
select newid(),newid(),newid(),newid(),newid(),newid(),newid(),newid()
,newid(),newid(),newid(),newid(),newid(),newid(),newid(),newid()
,RowNum
from CTE
where RowNum<=200000
go

/* Create a Clustered Index on the character columns */
create unique clustered index CIX_ABCDEFGHIJKLMNOP
on Obnoxious (A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P)
go
Great. Now let’s create a Nonclustered Index on that little integer column called X:

create nonclustered index IX_X
on Obnoxious (X)
That command doesn’t simply create the index with only the column X. It really creates an index on X plus all the keys making up the Clustered Index. So in reality, it’s equivalent to doing this (note, though, that you cannot actually execute this because it specifies more than the 16 allowable maximum columns):

create nonclustered index IX_X
on Obnoxious (X,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P)
Those 16 huge character keys totaling 16*56=896 bytes are incorporated into the index in both the intermediate levels and the leaf levels.

On the other hand, if you create a UNIQUE index…

create UNIQUE nonclustered index UQ_X
on Obnoxious (X)
…then the keys making up the Clustered Index only manifest themselves at the leaf levels. So in reality, it’s the same as doing this:

create UNIQUE nonclustered index UQ_X
on Obnoxious (X)
include
(A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P)
To verify this, we can use the sys.dm_db_index_physical_stats DMV to get a look at the physical structure of the 3 indexes we created. (Note that I would ordinarily accomplish this via a CROSS APPLY rather than a loop, but that particular DMV will not allow it… Read this post by Paul Randal for an explanation as to why):

declare @i int = 0
while @i<3
begin
set @i=@i+1
select i.name
,s.index_depth
,[level]=s.index_level
,s.page_count
,s.record_count
,s.avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(db_id()
,object_id('Obnoxious')
,@i
,null
,'detailed') s
cross join sys.indexes i
where i.[object_id]=object_id('Obnoxious')
and i.index_id=@i
end
/*
name index_depth level page_count record_count avg_record_size_in_bytes
-------------------- ----------- ----- ---------- ------------ ------------------------
CIX_ABCDEFGHIJKLMNOP 6 0 25002 200000 909
CIX_ABCDEFGHIJKLMNOP 6 1 3128 25002 903
CIX_ABCDEFGHIJKLMNOP 6 2 392 3128 903
CIX_ABCDEFGHIJKLMNOP 6 3 52 392 903
CIX_ABCDEFGHIJKLMNOP 6 4 8 52 903
CIX_ABCDEFGHIJKLMNOP 6 5 1 8 903

name index_depth level page_count record_count avg_record_size_in_bytes
-------------------- ----------- ----- ---------- ------------ ------------------------
IX_X 6 0 25001 200000 901
IX_X 6 1 3126 25001 907
IX_X 6 2 393 3126 907
IX_X 6 3 51 393 907
IX_X 6 4 8 51 907
IX_X 6 5 1 8 907

name index_depth level page_count record_count avg_record_size_in_bytes
-------------------- ----------- ----- ---------- ------------ ------------------------
UQ_X 3 0 25000 200000 901
UQ_X 3 1 44 25000 11
UQ_X 3 2 1 44 11
*/
Notice that the (non-unique) Nonclustered Index IX_X has 6 levels in its b-tree and the intermediate nodes have an average record size of 907 bytes and it takes up a total of 28580 pages. On the other hand, the Unique Nonclustered Index UQ_X only has 3 levels and its intermediate nodes have an average record size of only 11 bytes and it is over 3500 pages smaller.

Again, this is an extreme example, but it does give you a little extra knowledge about the fact that unique indexes can be smaller than their non-unique cousins… especially when the Clustered Key is a really fat one. This is yet another argument in favor of creating unique indexes if you have the opportunity. As Rob Farley demonstrated in Chapter 40 of MVP Deep Dives, unique indexes have some nice hidden benefits that you may not realize.

Note that with a multiple-column Clustered Index Key, the engine is “smart enough” to not duplicate the key columns of the index if you specify them as part of the index definition. In other words, if we create a Nonclustered Index (both non-unique and unique for demonstration purposes) on the integer column X plus a few of the character columns…

create nonclustered index IX_XMEN
on Obnoxious (X,M,E,N)
go
create
UNIQUE nonclustered index UQ_XMEN
on Obnoxious (X,M,E,N)
go
They would be equivalent to doing the following:

create nonclustered index IX_XMEN
on Obnoxious (X,M,E,N,A,B,C,D,F,G,H,I,J,K,L,O,P)
go
create
UNIQUE nonclustered index UQ_XMEN
on Obnoxious (X,M,E,N)
include (A,B,C,D,F,G,H,I,J,K,L,O,P)
go
Note how the columns M, E, and N are not duplicated… only the remaining columns of the Clustered Index that we did not specify are tacked on.

And thus ends our talk today on Little Known Facts about Indexes (NOT Indices!).



*If you’re curious about the Viking invasions of England and how they greatly affected the English language (and other interesting information about how the language evolved), I highly recommend reading Our Magnificent Bastard Tongue: The Untold History of English by John McWhorter. He has a fun tongue-in-cheek style of presenting the material. Another very entertaining read is Bill Bryson's The Mother Tongue: English and How It Got That Way. Enjoy!