Monday, April 26, 2010

Looking Under The Hood

Looking Under The HoodI came across a couple of incidents in the last few weeks at the MSDN T-SQL Forum that are good illustrations of why it’s important to look under the hood and investigate query plans and see what’s really going on with the optimizer.

Both of these incidents involve the way that the optimizer translates certain language elements.

The first of these, involving the COALESCE() function, is fairly straightforward, and it has been talked about in other blogs.

But the behavior in the second incident, involving multiple NOT EXISTS predicates, came as a big surprise to me and others, and it uncovered other unexpected phenomena.

COALESCE() ==> CASE Expression

In a message in the forum a few weeks ago, someone wanted to demonstrate how to calculate running totals. Here’s an example of the type of query he posted, calculating running totals of the OrderQty column by SalesOrderID in the SalesOrderDetail table in AdventureWorks:

select SalesOrderID
,SalesOrderDetailID
,ProductID
,OrderQty
,QtyRunTot=OrderQty
+coalesce((select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=sod.SalesOrderID
and SalesOrderDetailID<sod.SalesOrderDetailID)
,0)
from Sales.SalesOrderDetail sod
So, for each row, the running total is calculated by taking the current row’s OrderQty amount and adding it to the result of a scalar subquery which totals the OrderQty values of all previous rows for that order (when sorted by SalesOrderDetailID). If the row happens to actually be the very first row of the order, then performing the SUM() in the subquery would produce a NULL value (because there were no previous rows), and therefore he used COALESCE() to ensure that a zero non-NULL value would be produced.

The logic is sound; however, if you ever looked under the hood and investigated any query plan involving COALESCE(), you would notice that the optimizer simply translates it into a CASE Expression. For example, this simple COALESCE() query…

select ProductID
,Color=coalesce(Color,'None')
from Production.Product
…is translated by the optimizer into the following:

select ProductID
,Color=case when Color is not null then Color else 'None' end
from
Production.Product
So what does that mean for our running totals query? It’s translated into this:

select SalesOrderID
,SalesOrderDetailID
,ProductID
,OrderQty
,QtyRunTot=OrderQty
+case
when (select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=sod.SalesOrderID
and SalesOrderDetailID<sod.SalesOrderDetailID) is not null
then (select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=sod.SalesOrderID
and SalesOrderDetailID<sod.SalesOrderDetailID)
else 0
end
from
Sales.SalesOrderDetail sod
And that means that, for each row, a subquery is going to be run to check for NULL-ness, and then, if it doesn’t produce a NULL value, then the same subquery is going to be run again to actually produce the mathematical SUM. Take a look at the actual execution plan for the COALESCE() query (click on the picture for to see an enlarged image):

Plan for COALESCE() on a Scalar Subquery

For each of the 121,317 Rows scanned in #6, the SUM() subquery is calculated by operators #7, #8, and #9. If that value is not NULL, then the SUM() subquery is calculated again by operators #10, #11, and #12. You can see that the SEEK operator #9 is executed 121,317 times… once for each row in the SalesOrderDetail table. The SEEK operator #12 is executed 89,852 times… that comes out to 121317 - 89852 = 31465 times that it was NOT executed because #7, #8, #9 returned a NULL value. Only the first line item in an order would return a NULL value, and, sure enough, there are 31465 orders in the system (and therefore 31465 “first line items”).

But what a waste of resources, repeating the SEEKing and aggregation for the majority of the line items! And the really ironic part about this whole thing is what is behind the Compute Scalar operators #7 and #10. If you look at them, they are actually calculating this:

case when count(*)=0 then null else sum(OrderQty) end
Yep, if no line items were processed by the Stream Aggregate operators (#8 and #11), then the Compute Scalar operators #7 and #10 purposely set the SUM() equal to NULL. And ironically, we have to check for that NULL and COALESCE() it back into a zero. It’s like we’re working at cross purposes.

Anyway, we can eliminate the repetition and waste in the query plan in a couple of ways. The easiest way is to use T-SQL’s proprietary ISNULL() function instead of (the ANSI-standard function) COALESCE():

select SalesOrderID
,SalesOrderDetailID
,ProductID
,OrderQty
,QtyRunTot=OrderQty
+isnull((select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=sod.SalesOrderID
and SalesOrderDetailID<sod.SalesOrderDetailID)
,0)
from Sales.SalesOrderDetail sod
Or you could re-write the query so that the SUM() subquery is inclusive of the current row… Note how the subquery below now has a less-than-or-equals sign rather than a less-than sign… This way no checking for NULL is required because the subquery will always process at least one row and therefore return a non-NULL value:

select SalesOrderID
,SalesOrderDetailID
,ProductID
,OrderQty
,QtyRunTot=(select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=sod.SalesOrderID
and SalesOrderDetailID<=sod.SalesOrderDetailID)
from Sales.SalesOrderDetail sod
So now that you know how COALESCE() is translated behind the scenes, you know that it is most likely not a good idea to apply it to scalar subqueries.

NOT EXISTS ==> LEFT (ANTI SEMI) JOIN

Someone recently posted a message on the forum regarding multiple NOT EXISTS predicates in his WHERE clause. He noticed that if he moved a NOT EXISTS predicate from the beginning of the WHERE clause to the end of the clause (after other NOT EXISTS predicates), the query would improve its performance considerably.

This seemed strange. It shouldn’t matter what order you specify your conditions in a WHERE clause, because the optimizer should figure out the most cost-effective way to apply the various predicates.

Right?

Well… er… no.

What he said was absolutely true, which I can demonstrate here.

First, create 3 tables… a Customer table, and two child tables called CustHugeRowSize and CustTeenyRowSize. All three tables have clustered indexes with CustomerID as the primary column of the index:

if object_id('TempDB..#Customers','U') is not null drop table #Customers
go
create
table #Customers
(
CustomerID int primary key
,Name varchar(30)
)
go

if object_id('TempDB..#CustHugeRowSize','U') is not null drop table #CustHugeRowSize
go
create
table #CustHugeRowSize
(
CustomerID int
,SeqID int identity(1,1)
,Filler char(3500) default replicate('*',3500)
)
go
create
unique clustered index PK_CustHuge on #CustHugeRowSize(CustomerID,SeqID)
go

if object_id('TempDB..#CustTeenyRowSize','U') is not null drop table #CustTeenyRowSize
go
create
table #CustTeenyRowSize
(
CustomerID int
,SeqID int identity(1,1)
,Junk bit default 0
)
go
create
unique clustered index PK_CustTeeny on #CustTeenyRowSize(CustomerID,SeqID)
go
Now populate the Customer table with 100,000 rows, and populate the two child tables with 3 rows for each of those customers… with the exception of CustomerID 98765:

with 
L0
(c) as (select 0 union all select 0 union all select 0) --3 Rows
,L1(c) as (select 0 from L0 a cross join L0 b cross join L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a cross join L1 b cross join L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a cross join L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
insert #Customers (CustomerID,Name)
select n,'Customer Number '+ltrim(str(n))
from NN
where n<=100000

insert #CustHugeRowSize (CustomerID)
select CustomerID
from #Customers
cross join (select 1 union all select 2 union all select 3) X(N)
where CustomerID<>98765

insert #CustTeenyRowSize (CustomerID)
select CustomerID
from #Customers
cross join (select 1 union all select 2 union all select 3) X(N)
where CustomerID<>98765
Make sure the statistics are completely up-to-date in both child tables:

update statistics #CustHugeRowSize with fullscan
update
statistics #CustTeenyRowSize with fullscan
And now let’s try two queries… both have two NOT EXISTS predicates, and the only difference between them is which NOT EXISTS predicate is specified first:

select *
from #Customers c
where not exists (select * from #CustHugeRowSize where CustomerID=c.CustomerID)
and not exists (select * from #CustTeenyRowSize where CustomerID=c.CustomerID)
/*
CustomerID Name
---------- ---------------------
98765 Customer Number 98765
*/


select *
from #Customers c
where not exists (select * from #CustTeenyRowSize where CustomerID=c.CustomerID)
and not exists (select * from #CustHugeRowSize where CustomerID=c.CustomerID)
/*
CustomerID Name
---------- ---------------------
98765 Customer Number 98765
*/

These two plans certainly produce the same result; however, as you can see in comparing their execution plans, the behavior (and performance) of the two queries is vastly different (click on the picture to see an enlarged image):

Multiple NOT EXISTS Plans

Did you see that comparison in cost? It’s a whopping 98% compared to 2%! And, sure enough, when you compare the two queries in the Profiler, you get these statistics:

/*
Description CPU Reads Duration
------------------------------------
Query #1 1265ms 151464 28509ms
Query #2 161ms 1160 447ms
*/
Wow! Why the huge difference?

As I’ve mentioned in a previous blog post, when we look under the hood, we can see that a NOT EXISTS predicate is translated into a LEFT ANTI SEMI JOIN, which is a LEFT JOIN that only outputs rows from the LEFT side (SEMI) if there are no matching rows (ANTI) in the right side.

In the first query, a MERGE JOIN is used between the Customers table and the CustHugeRowSize table to do the LEFT ANTI SEMI JOIN to find the rows in Customers that do NOT EXIST in the CustHugeRowSize table. That means the engine must SCAN both tables (or, more accurately, SCAN the clustered indexes of both tables). It takes a while to SCAN the entire CustHugeRowSize table, because it consists of 299,997 rows, each of which are over 3500 bytes in size… only two rows can fit on a 8192-byte page. So, the engine has to scan through 299997 / 2 = 149,999 (leaf) pages, which is over a gigabyte of data.

In the second query, on the other hand, the MERGE JOIN is done between the Customers table and the CustTeenyRowSize table. Even though the CustTeenyRowSize table also has 299,997 rows, it takes no time at all to SCAN it, because its row size is only 16 bytes, so over 500 rows can fit on a single data page, and so the engine only has to scan through approximately 299997 / 500 = 600 pages = 4.8 megabytes. That’s 250 times smaller than the CustHugeRowSize table.

Well, that’s very interesting, and it explains why the first query is so much slower, but why in the heck would the order of NOT EXISTS predicates, and therefore, the order of the LEFT JOINs, make a difference?

The answer is: “I don’t know”… it shouldn’t make any difference… especially when the ON condition of the multiple JOINs refer solely back to the table in the FROM clause and not to each other.

For example, take a look at the following 3 queries. They each have the same 3 LEFT JOINs, specified in different orders, with each JOIN referring back to the FROM table in the ON condition and not referring to each other at all:

select c.CustomerID
from Sales.Customer c
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID

select c.CustomerID
from Sales.Customer c
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID

select c.CustomerID
from Sales.Customer c
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
The queries are functionally the same… they produce the same result… however, each of the queries produces a completely different query plan! In each case, the FROM table and the first specified LEFT JOIN table are always processed together in a Merge Join. And the costs of the three queries are different as well: 35% : 33% : 31%. And, running them in Profiler shows that the first query is, in fact, slowest and the third query is fastest:

/*
Description CPU Reads Duration
------------------------------------
Query #1 126ms 256 421ms
Query #2 114ms 253 351ms
Query #3 95ms 296 291ms
*/
It seems to me that the optimizer should figure out which is the best approach and choose the most cost-effective plan possible… in other words, it should ignore the order of my multiple outer joins and come up with the best, most cost-effective plan (i.e. the same plan) for each of those three queries.

I’ve inquired about this behavior with Microsoft but have not yet heard anything. Conor Cunningham did respond publicly in his blog last week, but, unless I misunderstood what he wrote, he indicates that the order of LEFT JOINs does not matter, which is the exact opposite of what I demonstrated above.

So, for the time being, it seems to me that if you have multiple NOT EXISTS predicates in your WHERE clause, or if you have multiple OUTER JOINs in a query, you may want to tinker with their order to make sure you come up with the most effective execution plan possible.

Monday, April 12, 2010

usp_DrawTree

Here is the source code for the usp_DrawTree procedure that was highlighted in my last blog post that was part of T-SQL Tuesday #005: Reporting.

You can also download this code from my SkyDrive.

Update on Jun13,2012: The code below has been updated since 2010. The code unfortunately made an assumption that the data in the temp table #TreeResult would be pulled out (via SELECT) in the same order it was INSERTed. That's incorrect. So I added an IDENTITY KEY to the table and an ORDER BY to the SELECT.

use TempDB /* Set to whatever database you wish */
go

if object_id('usp_DrawTree','P') is not null drop procedure usp_DrawTree
go

create procedure usp_DrawTree
@ID
int = null /* Desired ID value */
,@HangStyle bit = 0 /* Draw in Hanging Style? (vs. Left/Right Balanced) */
,@Direction char(1) = 'T' /* Branch Direction: R=Right L=Left M=Middle T=Top */
,@Level int = 1 /* Level to generate */
as

set nocount on

/* Make sure the Direction is valid */
if @Direction not in ('R','L','M','T')
begin
raiserror('Invalid @Direction... Must be R or L or M or T.',16,1)
return -1
end

/* Make sure the Level is valid */
if @Direction='T' set @Level=1
if @Level<1
begin
raiserror('Invalid @Level value.',16,1)
return -1
end

/*
If we are at Level 1 (the top Level), then...
Make sure our input table exists and that it has the required columns.
Also create some temp tables that will be needed as we build the tree.
*/
if @Level=1
begin
if object_id('tempdb..#TreeData','U') is null
begin
raiserror('You must have a #TreeData table prepared.',16,1)
return -1
end

declare @TreeDataColumns table
(
ColumnName nvarchar(128)
,DataType varchar(30)
)

insert @TreeDataColumns
select Name
,type_name(System_Type_ID)
from TempDB.sys.columns
where Object_ID=object_id('TempDB..#TreeData','U')

if not exists(select *
from @TreeDataColumns
where ColumnName='ID' and DataType='int')
begin
raiserror('#TreeData must contain an int column called ID.',16,1)
return -1
end
if not exists(select *
from @TreeDataColumns
where ColumnName='ParentID' and DataType='int')
begin
raiserror('#TreeData must contain an int column called ParentID.',16,1)
return -1
end
if not exists(select *
from @TreeDataColumns
where ColumnName='DataForBox')
begin
raiserror('#TreeData must contain a column called DataForBox.',16,1)
return -1
end
if not exists(select *
from @TreeDataColumns
where ColumnName='ExtraInfo')
begin
raiserror('#TreeData must contain a column called ExtraInfo.',16,1)
return -1
end
if not exists(select *
from @TreeDataColumns
where ColumnName='SortColumn')
begin
raiserror('#TreeData must contain a column called SortColumn.',16,1)
return -1
end

/* This table houses information about the branches at each level */
if object_id('TempDB..#BranchInfo','U') is not null drop table #BranchInfo
create table #BranchInfo
(
TreeLevel int primary key /* The Level 1..n */
,BoxWidth int /* The Width of the Box at that level */
,InProcess bit /* Is branch in process at that level? */
)
/* This table houses our final result that we will return to the user */
if object_id('TempDB..#TreeResult','U') is not null drop table #TreeResult
create table #TreeResult
(
TreeID int identity(1,1)
,TreeContent nvarchar(max)
)

end


/*
If the desired ID is null, then initialize it to whichever ID
in the table has a null ParentID, which will be the head honcho.
Note that TOP 1 is used just in case there is more than 1
row in the table with a ParentID equal to null
*/
if @ID is null
begin
set @ID=(select top 1 ID
from #TreeData
where ParentID is null)
end

/* Acquire the DataForBox and ExtraInfo columns into variables */
declare @DataForBox nvarchar(max)
,@ExtraInfo nvarchar(max)
select @DataForBox=cast(DataForBox as nvarchar(max))
,@ExtraInfo=cast(coalesce(ExtraInfo,'') as nvarchar(max))
from #TreeData
where ID=@ID

/* Define some more variables */
declare @MaxWidth int /* Maximum Width of Box */
,@NumBoxLines int /* Number of Lines of Data in Box */
,@PrimaryBoxLine int /* The Box Line from which we sprout a branch */
,@NumSubords int /* Quantity of Subordinates */
,@NumOnRight int /* Number of Subordinates on Right Side */
,@Counter int /* All-purpose counting variable */
,@Output nvarchar(max) /* String that we build for each line of output */
,@BoxData nvarchar(max) /* String containing line of data to put in Box */

declare @IDToProcess int /* The ID to process when we call recursively */
,@DirectionToProcess char(1) /* And the Direction to process */
,@LevelToProcess int /* And the Level to process */

/*
The @DataForBox variable contains the data we want to put in a box.
Optional vertical bars (|) can be used to indicate multiple lines to
display in the box.
The following @BoxLines table variable will split the @DataForBox
variable into rows.
*/
declare @BoxLines table
(
SeqNo int identity(1,1) primary key
,BoxData nvarchar(max)
)
insert @BoxLines
select XMLNode.value('(./text())[1]','nvarchar(max)')
from (select XMLString='<i>'+replace(@DataForBox,'|','</i><i>')+'</i>') X
cross apply (select XMLList=cast(XMLString as xml).query('.')) F1
cross apply XMLList.nodes('i') F2(XMLNode)

/*
What's the maximum width of a line in the box?
How many lines are there?
And which is the primary line (i.e. middle line) from which we sprout a child branch?
*/
select @MaxWidth=max(len(BoxData))
,@NumBoxLines=count(*)
,@PrimaryBoxLine=ceiling(1.0*count(*)/2)
from @BoxLines

/* Create an entry in #BranchInfo for the current Level if it doesn't exist */
if not exists (select * from #BranchInfo where TreeLevel=@Level)
insert #BranchInfo (TreeLevel) values (@Level)

/*
Populate #BranchInfo with information about the Level's BoxWidth
and the status of the Level in terms of Branches
*/
update #BranchInfo
set InProcess=case when @Direction in ('M','L') then 1 else 0 end
,BoxWidth=@MaxWidth+4+1 /* 4=LeftBorder+Space+Space+RightBorder, 1=BranchOutput */
where TreeLevel=@Level

/*
Collect all the subordinates of @ID into a table variable.
Order them by the SortColumn.
*/
declare @Subordinates table
(
SeqNo int identity(1,1) primary key
,ID int
)
insert @Subordinates
select ID
from #TreeData
where ParentID=@ID
order by SortColumn

/*
How many subordinates are there?
And How many on the right side?
*/
select @NumSubords=count(*)
,@NumOnRight=case
when @HangStyle=1
then 0
else count(*)/2
end
from
@Subordinates

/*
Before we draw our box, we must recursively process
all children on our right side
*/
set @Counter=0
while @Counter<@NumOnRight
begin
set @Counter=@Counter+1
select @IDToProcess=(select ID from @Subordinates where SeqNo=@Counter)
,@DirectionToProcess=case when @Counter=1 then 'R' else 'M' end
,@LevelToProcess=@Level+1
exec usp_DrawTree @IDToProcess
,@HangStyle
,@DirectionToProcess
,@LevelToProcess
end

/* Draw all branches in process for previous levels */
set @Output=''
select @Output=@Output+case
when InProcess=1
then N'│ '
else N' '
end
+space(BoxWidth)
from #BranchInfo
where TreeLevel<@Level

/*
And now draw...
1) Any in-process branch for our current level
2) The top border of our box
3) Any branch that might connect to right-hand children
*/
set @Output=@Output+case
when @Direction in ('L','M')
then N'│ '
else N' '
end
+N'┌'+replicate(N'─',@MaxWidth+2)+N'┐'
+case
when @NumSubords=0 then N' '
when @NumOnRight=0 then N' '
else N' │'
end

/* Save that to our result table */
insert #TreeResult (TreeContent)
select @Output

/*
Now it's time to output the data within the the box itself.
We will perform a loop for each line to be written in the box.
*/
set @Counter=0
while @Counter<@NumBoxLines
begin
set @Counter=@Counter+1

/* Bet the @BoxData for the line to be written */
select @BoxData=ltrim(rtrim(BoxData))
from @BoxLines
where SeqNo=@Counter

/* Pad it with spaces to center it in the box */
set @BoxData=space((@MaxWidth-len(@BoxData))/2)+@BoxData
set @BoxData=@BoxData+space(@MaxWidth-len(@BoxData))


/* Draw all branches in process for previous levels */
set @Output=''
select @Output=@Output+case
when InProcess=1
then N'│ '
else N' '
end
+space(BoxWidth)
from #BranchInfo
where TreeLevel<@Level

/*
And now draw...
1) Any in-process branch for our current level and the left border of our box
2) The line in the box (@BoxData) padded with leading and trailing space
3) The right border of our box and any any branch that might connect to children
4) Any Extra Info to appear to the right of the box
*/
set @Output=@Output+case
when @Counter<@PrimaryBoxLine
then case
when @Direction='R' then N' │'
when @Direction='L' then N'│ │'
when @Direction='M' then N'│ │'
when @Direction='T' then N' │'
end
when @Counter=@PrimaryBoxLine
then case
when @Direction='R' then N'┌─┤'
when @Direction='L' then N'└─┤'
when @Direction='M' then N'├─┤'
when @Direction='T' then N' │'
end
else case
when @Direction='R' then N'│ │'
when @Direction='L' then N' │'
when @Direction='M' then N'│ │'
when @Direction='T' then N' │'
end
end
+N' '+@BoxData+N' '
+case
when @Counter<@PrimaryBoxLine
then case
when @NumSubords=0 then N'│'
when @NumOnRight=0 then N'│'
else N'│ │'
end
when @Counter=@PrimaryBoxLine
then case
when @NumSubords=0 then N'│'
when @NumOnRight=0 then N'├─┐'
else N'├─┤'
end
else case
when @NumSubords=0 then N'│'
when @NumOnRight=0 then N'│ │'
else N'│ │'
end
end
+case
when @ExtraInfo<>'' and @Counter=@PrimaryBoxLine
then ' '+@ExtraInfo
else ''
end

/* Save that to our result table */
insert #TreeResult (TreeContent)
select @Output

end

/* Draw all branches in process for previous levels */
set @Output=''
select @Output=@Output+case
when InProcess=1
then N'│ '
else N' '
end
+space(BoxWidth)
from #BranchInfo
where TreeLevel<@Level

/*
And now draw...
1) Any in-process branch for our current level
2) The bottom border of our box
3) Any branch that might connect to left-hand children
*/
set @Output=@Output+case
when @Direction in ('R','M')
then N'│ '
else N' '
end
+N'└'+replicate(N'─',@MaxWidth+2)+N'┘'
+case
when @NumSubords=0 then N' '
when @NumOnRight=0 then N' │'
else N' │'
end

/* Save that to our result table */
insert #TreeResult (TreeContent)
select @Output

/*
Populate #BranchInfo with information about the current
Level's status in terms of Branches
*/
update #BranchInfo
set InProcess=case when @Direction in ('R','M') then 1 else 0 end
where
TreeLevel=@Level

/*
Now that we've finished drawing our box, we may recursively
process all children on our left side
*/
set @Counter=@NumOnRight
while @Counter<@NumSubords
begin
set @Counter=@Counter+1
select @IDToProcess=(select ID from @Subordinates where SeqNo=@Counter)
,@DirectionToProcess=case when @Counter=@NumSubords then 'L' else 'M' end
,@LevelToProcess=@Level+1
exec usp_DrawTree @IDToProcess
,@HangStyle
,@DirectionToProcess
,@LevelToProcess
end

/*
If we are at this point with Level 1, that means we have
successfully finished everything. So return our result
and clean up our temporary tables that we created.
*/
if @Level=1
begin
select TreeContent from #TreeResult order by TreeID
drop table #BranchInfo
drop table #TreeResult
end