Monday, October 26, 2009

Query Optimizer: Genius or Bonehead?

Sometimes I marvel at the ingenuity of the Query Optimizer in generating query plans. In past blog entries (here and here), I’ve thrown some things at it that it handled quite cleverly.

On the other hand, sometimes it does some pretty boneheaded things that don’t seem to make much sense.

Let’s take a look at some examples… from one side of the spectrum to the other.

(Important Note: All the boneheaded behavior I talk about in this article is found in SQL2005. SQL2008, on the other hand, does not exhibit any of the boneheaded behavior.)

Take the following query, which we’ll call Query #1:

/* Query #1 */
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where 1=2
and power(2,sqrt(0))<>1
and char(-1) is not null
and sin(pi()/2)<>1
and soh.SalesOrderID is null
What goes through the Optimizer’s head when it constructs this query?

SQL ServerHmmm… I can see right away that the first 4 predicates of the WHERE clause will evaluate to False. And since I know that SalesOrderID is a non-nullable column, it’s impossible for it to be NULL, so the 5th predicate is also False. Since the WHERE clause can never be True, this query will produce no rows. So, heck, why waste my time doing any SCANning or SEEKing of any tables? I don’t have to do anything! I can just sit back and relax.

This is the query plan that it puts together for the above query. It just plops out an empty set:

Query #1 Plan

Excellent.

The following query (Query #2) is exactly like Query #1, except all the predicates are negated so that they will all evaluate to True:

/* Query #2 */
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where 1<>2
and power(2,sqrt(0))=1
and char(-1) is null
and sin(pi()/2)=1
and soh.SalesOrderID is not null
This is the plan that is generated:

Query #2 Plan

The Optimizer’s thinking is:

SQL ServerHmmm… I can tell ahead of time that the WHERE clause is going to evaluate to True, so I can just throw it out the window… it’s unnecessary. So, it looks like I’ll have to join ALL rows from both tables. Okay, as far as the SalesOrderHeader table is concerned, all I need is the CustomerID and SalesOrderID, and I can just get both of those from the CustomerID nonclustered index and it will be faster to SCAN that than it would be to SCAN the entire clustered index, so I’ll attack that table that way. Similarly, I’ll use the ProductID nonclustered index from the SalesOrderDetail table to pick up the ProductID and SalesOrderID from that table. Then I’ll use a Hash Match to JOIN them together, and finally do a SELECT DISTINCT CustomerID, ProductID.

So it knows the WHERE clause is not needed, and it decides to SCAN the smaller-sized nonclustered indexes to get the data it needs. Brilliant! The Optimizer should be applauded.

What do our judges think of the plans for Query #1 and Query #2?

Two Einsteins

They awarded the plans Two Einsteins! Very impressive.

Now take a look at this query (Query #3). It’s exactly the same as Query #2, only with a different WHERE clause. Again, though, all the predicates evaluate to True… at least you and I can easily see that:

/* Query #3 */
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where SalesOrderDetailID=SalesOrderDetailID
and ProductID=ProductID
and SpecialOfferID=SpecialOfferID
and OrderQty=OrderQty
and UnitPrice=UnitPrice
But the Optimizer doesn’t see it that way. Here’s the query plan for that query:

Query #3 Plan

Note the arrows between the operators are a lot thinner than in the previous query, and note that it is doing a SCAN of the SalesOrderDetail clustered index and a SEEK into the SalesOrderHeader table. Here is apparently what’s going on inside the Optimizer’s head as it constructs this plan:

SQL ServerHmmm… All of the predicates in the WHERE clause refer to columns in the SalesOrderDetail table, which has 121,317 rows. If any of the predicates were referring to a constant value (like WHERE ProductID=777 for example), then I could use statistics to calculate the selectivity of each predicate. But all 5 predicates are comparing columns to some unknown value, so I will assume a selectivity of 10% for each one. Therefore my estimate is that I will choose 121,317 * 10% * 10% * 10% * 10% * 10% = 1.21317 rows from SalesOrderDetail. Since that’s such a small number of rows, I’ll SCAN the clustered index of the SalesOrderDetail table (since I have to get to the columns in the WHERE clause to test each one of them), and then, for each row found, I’ll do a Seek into the Clustered Index of the SalesOrderHeader table to get the CustomerID.

You can see the estimate of 1.21317 rows here in the properties of the Clustered Index Scan:

Estimated Rows in Query #3

This doesn’t make much sense. Why does the Optimizer think that it’s comparing columns to some unknown value? The only time something isn’t equal to itself is when it has a value of NULL, but all 5 columns referred to in the WHERE clause are non-nullable columns. It’s surprising that the Optimizer can’t see that each predicate is True.

And the plan that it did come up with performs terribly. Look at a comparison in performance between Query #2 and Query #3:

/*           CPU   #Reads
-------------------------
Query #2: 210ms 299
Query #3: 695ms 365,458
*/
Yikes! Look at the number of Reads! Think about it… in executing Query #3, for each one of the 121,317 rows in SalesOrderDetail, it does a SEEK into SalesOrderHeader. That’s really inefficient.

I don’t think our panel of judges will be thrilled with this, do you? Let’s see what they thought of the Query #3 plan:

Four Curlys

Four Curlys! That’s gotta hurt. But you have to admit... It was a boneheaded plan.

(Important Note: The above phenomenon occurs in SQL2005. It does not behave the same in SQL2008. In SQL2008, Query #2 and Query #3 have identical query plans.)

Here’s another example in inconsistent approaches by the Optimizer. Look at the following query (Query #4), which uses window functions in OVER clauses in a CTE:

/* Query #4 */
with cte as
(
select SalesOrderID,ProductID
,RNum=row_number() over (partition by ProductID order by SalesOrderID)
,Rnk=rank() over (partition by SpecialOfferID order by CarrierTrackingNumber)
,DRnk=dense_rank() over (partition by UnitPrice order by RowGUID)
from AdventureWorks.Sales.SalesOrderDetail
)
select SalesOrderID,ProductID
from cte
What does the Optimizer think when it constructs the plan for that query?

SQL ServerHmmm… Looks like there are a few window functions in there, creating columns called Rnum, Rnk, and DRnk. But look! Those columns aren’t referred to in the SELECT list of the main query at all! So heck, I’m not going to waste my time executing those window functions if they aren’t needed. So I’ll put together a plan that simply spits out SalesOrderID and ProductID from the SalesOrderDetail table. And I’ll minimize my time in acquiring those by doing a SCAN of the ProductID nonclustered index instead of SCANning the entire table’s data in the clustered index.

And that’s what it does, as we can see from the following plan:

Query #4 Plan

That’s a very sensible plan. What do the judges say?

Three Doc Browns

Three Emmett Browns, huh?. Sure, he’s a bit quirky, and he’s not as impressive as Einstein or Plato or Galileo or Leonardo or Mozart, but hey, he invented time travel… that’s pretty cool.

Now look at the following query (Query #5), which is exactly like Query #4, except that it uses aggregates in OVER clauses in the CTE rather than window functions:

/* Query #5 */
with cte as
(
select SalesOrderID,ProductID
,Cnt=count(*) over (partition by ProductID)
,TotQty=sum(OrderQty) over (partition by SpecialOfferID)
,AvgLine=avg(LineTotal) over (partition by UnitPrice)
from AdventureWorks.Sales.SalesOrderDetail
)
select SalesOrderID,ProductID
from cte
Wouldn’t you think that the Optimizer would come up with the exact same plan as Query #4? But nooooo… here’s what it’s thinking:

SQL ServerOh boy! Aggregates! I love aggregates! Any chance I can, I’ll calculate aggregates! I can’t get enough! And this query has 3 of them! Hot dog! I get to do a COUNT(*) and I get to do a SUM and an AVG too! And they each use different PARTITIONs, so I get to do three different SORTs. This is great! Wow! I feel tingly all over! Whoopee! Oh Happy Day!

That kind of sounds like my dog when I pull out the bag of dog food every single night. Sure enough, here’s the plan:

Query #5 Plan

Look at all those operators! All those aggregations and sorts and everything, and it’s all a waste of time because none of the calculations are in the final result set.

Look at the comparison in performance between Query #4 and Query #5:

/*           CPU   #Reads
-------------------------
Query #4: 39ms 236
Query #5: 3059ms 811,834
*/
Incredible. Query #5 had over 3400 times the number of reads as Query #4 and almost 100 times the CPU time.

Judges? Your verdict?

Nine SpongeBobs

Ouch!! Nine SpongeBobs! The judges definitely thought that was an atrocious, putrid, abysmal, boneheaded plan.

(Important Note: The above phenomenon occurs in SQL2005. It does not behave the same in SQL2008. In SQL2008, Query #4 and Query #5 have identical query plans.)

Of course, none of the above queries are necessarily ones you’d actually put together in the real world (though I did stumble upon the behavior of Query #4 and #5 while answering a question in the MSDN T-SQL Forum). And I don’t mean to put down the Optimizer. I guess “bonehead” is kind of a strong word. Perhaps the Optimizer is sometimes just a little bit naïve.

I guess the moral of the story, as always, is to look carefully at your query plans. In the words of another person bordering on genius and naïveté: “Query Plans are like a box of chocolates… You never know what you’re gonna get.”

4 comments:

  1. Great post Brad! I have a few comments, as usual 8^).

    The optimizer is basing cardinality decisions off what it believes to be true, which will vary from system to system depending on the distribution of data and the validity of the stastics. On my box query 3 gave the same query plan as query 2. If you notice the optimizer is making bad decisions, it is usually a good indicator that you are missing indexes or statistics.

    One other key thing of note is that the optimizer typically evaluates the where clause in one of the last phases of query processing, which means the optimizer has to go through all the other stages to create the query plan. Typically, you cannot "short circuit" a query even if you validate/invalidate the where clause by setting all filters to true/false.... Now that is the typical scenario, but there is an exception. The exception is called a "foldable" expression, http://technet.microsoft.com/en-us/library/ms175933.aspx. Essentially the optimizer is able to validate constant deterministic expressions early in query plan generation to give a better query plan. A foldable expression can be used to short circuit a query to avoid work, as shown in the example below.

    select distinct
    CustomerID,
    ProductID
    from
    AdventureWorks.Sales.SalesOrderDetail sod
    join
    AdventureWorks.Sales.SalesOrderHeader soh
    on sod.SalesOrderID = soh.SalesOrderID
    WHERE
    1=2
    and SalesOrderDetailID = SalesOrderDetailID
    and ProductID = ProductID
    and SpecialOfferID = SpecialOfferID
    and OrderQty = OrderQty
    and UnitPrice = UnitPrice

    Again great post. Keep'em coming.

    ReplyDelete
  2. Ha! Four Curly's was the bit for me!

    ReplyDelete
  3. @Adam:

    Thanks for the link on foldable expressions. I didn't realize there was documentation on it (or a term for it)... that's good to know. My "SELECT * vs SELECT 1" blog entry and "Down For the COUNT(*)" entry, as well as this one, had those foldable expressions in there (I especially like the SIN(PI()/2)=1 predicate).

    I'm glad you tested the queries out on your system as well. I'll bet that the reason Query #2 and #3 generated the same plan on your system (and not on mine) was because you did it on SQL2008 (which I don't currently have) and I did mine on SQL2005.

    I'll bet 2008 made an improvement there. I kind of suspect (though I searched the web and couldn't find confirmation on this) that adding a predicate of Table1.ColumnName = Table1.ColumnName to a query was an "old-fashioned" way of giving a hint to the optimizer that you wanted to "force" the selectivity of Table1 to be lower (i.e. 10%) in order to trick the optimizer into using Table1 as the driving table of the query. I wonder if this roundabout hinting approach was deprecated in SQL2008.

    I remember reading a book (I won't mention the title... but it was NOT one from Microsoft Press... in fact, I don't remember the exact title... I must have Wrox in my head) that mentioned "helping the query" by adding such a predicate to the WHERE clause, and I remember thinking that was nuts. Because it didn't help the query at all, but made it worse, as I demonstrated in this article.

    Thanks, as always, for the input. Much appreciated.

    ReplyDelete