Both of these incidents involve the way that the optimizer translates certain language elements.
The first of these, involving the COALESCE() function, is fairly straightforward, and it has been talked about in other blogs.
But the behavior in the second incident, involving multiple NOT EXISTS predicates, came as a big surprise to me and others, and it uncovered other unexpected phenomena.
COALESCE() ==> CASE Expression
In a message in the forum a few weeks ago, someone wanted to demonstrate how to calculate running totals. Here’s an example of the type of query he posted, calculating running totals of the OrderQty column by SalesOrderID in the SalesOrderDetail table in AdventureWorks:
select SalesOrderID
      ,SalesOrderDetailID
      ,ProductID
      ,OrderQty
      ,QtyRunTot=OrderQty
                +coalesce((select sum(OrderQty)
                           from Sales.SalesOrderDetail
                           where SalesOrderID=sod.SalesOrderID
                             and SalesOrderDetailID<sod.SalesOrderDetailID)
                         ,0)
from Sales.SalesOrderDetail sod
So, for each row, the running total is calculated by taking the current row’s OrderQty amount and adding it to the result of a scalar subquery that totals the OrderQty values of all previous rows for that order (when sorted by SalesOrderDetailID). If the row happens to be the very first row of the order, the SUM() in the subquery produces a NULL value (because there are no previous rows), and therefore he used COALESCE() to ensure that a non-NULL value of zero would be produced.
The logic is sound; however, if you ever looked under the hood and investigated any query plan involving COALESCE(), you would notice that the optimizer simply translates it into a CASE Expression. For example, this simple COALESCE() query…
select ProductID
      ,Color=coalesce(Color,'None')
from Production.Product
…is translated by the optimizer into the following:
select ProductID
      ,Color=case when Color is not null then Color else 'None' end
from Production.Product
So what does that mean for our running totals query? It’s translated into this:
select SalesOrderID
      ,SalesOrderDetailID
      ,ProductID
      ,OrderQty
      ,QtyRunTot=OrderQty
                +case
                   when (select sum(OrderQty)
                         from Sales.SalesOrderDetail
                         where SalesOrderID=sod.SalesOrderID
                           and SalesOrderDetailID<sod.SalesOrderDetailID) is not null
                   then (select sum(OrderQty)
                         from Sales.SalesOrderDetail
                         where SalesOrderID=sod.SalesOrderID
                           and SalesOrderDetailID<sod.SalesOrderDetailID)
                   else 0
                 end
from Sales.SalesOrderDetail sod
And that means that, for each row, a subquery is going to be run to check for NULL-ness, and, if it doesn’t produce a NULL value, the same subquery is going to be run again to actually produce the mathematical SUM. Take a look at the actual execution plan for the COALESCE() query (click on the picture to see an enlarged image):
For each of the 121,317 rows scanned in #6, the SUM() subquery is calculated by operators #7, #8, and #9. If that value is not NULL, then the SUM() subquery is calculated again by operators #10, #11, and #12. You can see that the SEEK operator #9 is executed 121,317 times… once for each row in the SalesOrderDetail table. The SEEK operator #12 is executed 89,852 times… that comes out to 121,317 - 89,852 = 31,465 times that it was NOT executed, because operators #7, #8, and #9 returned a NULL value. Only the first line item in an order would return a NULL value, and, sure enough, there are 31,465 orders in the system (and therefore 31,465 “first line items”).
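If you want to verify that 31,465 figure for yourself, a quick count of the distinct orders in AdventureWorks will do it:

--Each distinct order contributes exactly one "first line item":
select count(distinct SalesOrderID)
from Sales.SalesOrderDetail
--should return 31465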
But what a waste of resources, repeating the SEEKing and aggregation for the majority of the line items! And the really ironic part about this whole thing is what is behind the Compute Scalar operators #7 and #10. If you look at them, they are actually calculating this:
case when count(*)=0 then null else sum(OrderQty) end
Yep, if no line items were processed by the Stream Aggregate operators (#8 and #11), then the Compute Scalar operators #7 and #10 purposely set the SUM() equal to NULL. And ironically, we have to check for that NULL and COALESCE() it back into a zero. It’s like we’re working at cross purposes.
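That NULL isn’t the optimizer being perverse, by the way… it’s standard aggregate behavior. A scalar SUM() over zero rows returns NULL rather than zero, which a trivial query demonstrates:

--No rows have a SalesOrderID of -1, so the SUM() aggregates an empty set
--and returns NULL (not zero):
select sum(OrderQty)
from Sales.SalesOrderDetail
where SalesOrderID=-1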
Anyway, we can eliminate the repetition and waste in the query plan in a couple of ways. The easiest is to use T-SQL’s proprietary ISNULL() function instead of the ANSI-standard COALESCE()… unlike COALESCE(), ISNULL() is not expanded into a CASE expression, so the subquery is only evaluated once:
select SalesOrderID
      ,SalesOrderDetailID
      ,ProductID
      ,OrderQty
      ,QtyRunTot=OrderQty
                +isnull((select sum(OrderQty)
                         from Sales.SalesOrderDetail
                         where SalesOrderID=sod.SalesOrderID
                           and SalesOrderDetailID<sod.SalesOrderDetailID)
                       ,0)
from Sales.SalesOrderDetail sod
Or you could re-write the query so that the SUM() subquery is inclusive of the current row… note how the subquery below now has a less-than-or-equals sign rather than a less-than sign. This way no checking for NULL is required, because the subquery will always process at least one row and therefore return a non-NULL value:
select SalesOrderID
      ,SalesOrderDetailID
      ,ProductID
      ,OrderQty
      ,QtyRunTot=(select sum(OrderQty)
                  from Sales.SalesOrderDetail
                  where SalesOrderID=sod.SalesOrderID
                    and SalesOrderDetailID<=sod.SalesOrderDetailID)
from Sales.SalesOrderDetail sod
So now that you know how COALESCE() is translated behind the scenes, you know that it is most likely not a good idea to apply it to scalar subqueries.
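By the way, you don’t have to squint at a graphical plan to catch the optimizer in the act. One quick way to see the translation for yourself (a sketch, again assuming AdventureWorks) is to request the textual showplan and inspect the Compute Scalar’s definition:

set showplan_text on
go
select ProductID
      ,Color=coalesce(Color,'None')
from Production.Product
go
set showplan_text off
go

The Compute Scalar in the output should contain an expression along the lines of CASE WHEN [Color] IS NOT NULL THEN [Color] ELSE 'None' END… the COALESCE() is nowhere to be found.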
NOT EXISTS ==> LEFT (ANTI SEMI) JOIN
Someone recently posted a message on the forum regarding multiple NOT EXISTS predicates in his WHERE clause. He noticed that if he moved a NOT EXISTS predicate from the beginning of the WHERE clause to the end (after the other NOT EXISTS predicates), the query’s performance improved considerably.
This seemed strange. It shouldn’t matter in what order you specify the conditions in a WHERE clause, because the optimizer should figure out the most cost-effective way to apply the various predicates.
Right?
Well… er… no.
What he said was absolutely true, which I can demonstrate here.
First, create 3 tables… a Customer table, and two child tables called CustHugeRowSize and CustTeenyRowSize. All three tables have clustered indexes with CustomerID as the primary column of the index:
if object_id('TempDB..#Customers','U') is not null drop table #Customers
go
create table #Customers
(
  CustomerID int primary key
 ,Name varchar(30)
)
go
if object_id('TempDB..#CustHugeRowSize','U') is not null drop table #CustHugeRowSize
go
create table #CustHugeRowSize
(
  CustomerID int
 ,SeqID int identity(1,1)
 ,Filler char(3500) default replicate('*',3500)
)
go
create unique clustered index PK_CustHuge on #CustHugeRowSize(CustomerID,SeqID)
go
if object_id('TempDB..#CustTeenyRowSize','U') is not null drop table #CustTeenyRowSize
go
create table #CustTeenyRowSize
(
  CustomerID int
 ,SeqID int identity(1,1)
 ,Junk bit default 0
)
go
create unique clustered index PK_CustTeeny on #CustTeenyRowSize(CustomerID,SeqID)
go
Now populate the Customer table with 100,000 rows, and populate the two child tables with 3 rows for each of those customers… with the exception of CustomerID 98765:
with
  L0(c) as (select 0 union all select 0 union all select 0)        --3 rows
 ,L1(c) as (select 0 from L0 a cross join L0 b cross join L0 c)    --27 rows (3x3x3)
 ,L2(c) as (select 0 from L1 a cross join L1 b cross join L1 c)    --19,683 rows (27x27x27)
 ,L3(c) as (select 0 from L2 a cross join L2 b)                    --387,420,489 rows (19683x19683)
 ,NN(n) as (select row_number() over (order by (select 0)) from L3)
insert #Customers (CustomerID,Name)
select n,'Customer Number '+ltrim(str(n))
from NN
where n<=100000

insert #CustHugeRowSize (CustomerID)
select CustomerID
from #Customers
cross join (select 1 union all select 2 union all select 3) X(N)
where CustomerID<>98765

insert #CustTeenyRowSize (CustomerID)
select CustomerID
from #Customers
cross join (select 1 union all select 2 union all select 3) X(N)
where CustomerID<>98765
Make sure the statistics are completely up-to-date in both child tables:
update statistics #CustHugeRowSize with fullscan
update statistics #CustTeenyRowSize with fullscan
And now let’s try two queries… both have two NOT EXISTS predicates, and the only difference between them is which NOT EXISTS predicate is specified first:
select *
from #Customers c
where not exists (select * from #CustHugeRowSize where CustomerID=c.CustomerID)
  and not exists (select * from #CustTeenyRowSize where CustomerID=c.CustomerID)
/*
CustomerID Name
---------- ---------------------
98765      Customer Number 98765
*/
select *
from #Customers c
where not exists (select * from #CustTeenyRowSize where CustomerID=c.CustomerID)
  and not exists (select * from #CustHugeRowSize where CustomerID=c.CustomerID)
/*
CustomerID Name
---------- ---------------------
98765      Customer Number 98765
*/
These two queries certainly produce the same result; however, as you can see in comparing their execution plans, the behavior (and performance) of the two queries is vastly different (click on the picture to see an enlarged image):
Did you see that comparison in cost? It’s a whopping 98% compared to 2%! And, sure enough, when you compare the two queries in the Profiler, you get these statistics:
/*
Description    CPU     Reads   Duration
-----------  ------   ------   --------
Query #1     1265ms   151464    28509ms
Query #2      161ms     1160      447ms
*/
Wow! Why the huge difference? As I’ve mentioned in a previous blog post, when we look under the hood, we can see that a NOT EXISTS predicate is translated into a LEFT ANTI SEMI JOIN, which is a LEFT JOIN that only outputs rows from the LEFT side (SEMI) if there are no matching rows (ANTI) on the right side.
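In plain T-SQL terms, you can think of the first NOT EXISTS above as behaving like this familiar left-join pattern (a sketch; it’s equivalent here because CustomerID is never NULL in these tables):

--Output a customer only if the left join finds no match in the child table:
select c.CustomerID, c.Name
from #Customers c
left join #CustHugeRowSize h on h.CustomerID=c.CustomerID
where h.CustomerID is null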
In the first query, a MERGE JOIN is used between the Customers table and the CustHugeRowSize table to do the LEFT ANTI SEMI JOIN to find the rows in Customers that do NOT EXIST in the CustHugeRowSize table. That means the engine must SCAN both tables (or, more accurately, SCAN the clustered indexes of both tables). It takes a while to SCAN the entire CustHugeRowSize table, because it consists of 299,997 rows, each of which is over 3,500 bytes in size… only two rows can fit on an 8,192-byte page. So, the engine has to scan through 299,997 / 2 = 149,999 (leaf) pages, which is over a gigabyte of data.
In the second query, on the other hand, the MERGE JOIN is done between the Customers table and the CustTeenyRowSize table. Even though the CustTeenyRowSize table also has 299,997 rows, it takes no time at all to SCAN it, because its row size is only 16 bytes, so over 500 rows can fit on a single data page, and the engine only has to scan through approximately 299,997 / 500 = 600 pages = 4.8 megabytes. That’s 250 times smaller than the CustHugeRowSize table.
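If you want to sanity-check those page counts, one way (a sketch using a DMV available in SQL Server 2005 and later) is to peek at the temp tables’ partition stats in tempdb:

--Compare row counts and in-row page counts of the two child tables:
select o.name
      ,p.row_count
      ,p.in_row_used_page_count
from tempdb.sys.dm_db_partition_stats p
join tempdb.sys.objects o on o.object_id=p.object_id
where p.object_id in (object_id('tempdb..#CustHugeRowSize')
                     ,object_id('tempdb..#CustTeenyRowSize'))

If the math above holds, the huge table should weigh in at roughly 150,000 pages and the teeny one at around 600.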
Well, that’s very interesting, and it explains why the first query is so much slower, but why in the heck would the order of the NOT EXISTS predicates, and therefore the order of the LEFT JOINs, make a difference?
The answer is: “I don’t know”… it shouldn’t make any difference… especially when the ON conditions of the multiple JOINs refer solely back to the table in the FROM clause and not to each other.
For example, take a look at the following 3 queries. They each have the same 3 LEFT JOINs, specified in different orders, with each JOIN referring back to the FROM table in the ON condition and not referring to each other at all:
select c.CustomerID
from Sales.Customer c
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID

select c.CustomerID
from Sales.Customer c
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID

select c.CustomerID
from Sales.Customer c
left join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
left join Sales.SalesOrderHeader soh on c.CustomerID=soh.CustomerID
left join Sales.StoreContact sc on c.CustomerID=sc.CustomerID
The queries are functionally the same… they produce the same result… however, each of the queries produces a completely different query plan! In each case, the FROM table and the first specified LEFT JOIN table are always processed together in a Merge Join. And the costs of the three queries are different as well: 35% : 33% : 31%. And, running them in Profiler shows that the first query is, in fact, the slowest and the third query the fastest:
/*
Description    CPU    Reads   Duration
-----------  ------   -----   --------
Query #1      126ms     256      421ms
Query #2      114ms     253      351ms
Query #3       95ms     296      291ms
*/
It seems to me that the optimizer should ignore the order of my multiple outer joins and figure out the most cost-effective plan possible… in other words, it should come up with the same best plan for each of those three queries. I’ve inquired about this behavior with Microsoft but have not yet heard anything. Conor Cunningham did respond publicly in his blog last week, but, unless I misunderstood what he wrote, he indicated that the order of LEFT JOINs should not matter, which is the exact opposite of what I demonstrated above.
So, for the time being, it seems to me that if you have multiple NOT EXISTS predicates in your WHERE clause, or if you have multiple OUTER JOINs in a query, you may want to tinker with their order to make sure you come up with the most effective execution plan possible.
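One final aside of my own (this goes beyond what the original forum thread covered): once you’ve found an ordering that performs well, the documented FORCE ORDER query hint asks the optimizer to preserve the join order implied by the query text, which at least makes the good plan explicit and repeatable. As with all hints, treat it as a last resort:

--The cheap (teeny) table is tested first, and FORCE ORDER requests that
--the optimizer keep that order:
select *
from #Customers c
where not exists (select * from #CustTeenyRowSize where CustomerID=c.CustomerID)
  and not exists (select * from #CustHugeRowSize where CustomerID=c.CustomerID)
option (force order)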