On the other hand, sometimes it does some pretty boneheaded things that don’t seem to make much sense.
Let’s take a look at some examples… from one side of the spectrum to the other.
(Important Note: All the boneheaded behavior I talk about in this article is found in SQL2005. SQL2008, on the other hand, does not exhibit any of the boneheaded behavior.)
Take the following query, which we’ll call Query #1:
/* Query #1 */What goes through the Optimizer’s head when it constructs this query?
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where 1=2
and power(2,sqrt(0))<>1
and char(-1) is not null
and sin(pi()/2)<>1
and soh.SalesOrderID is null
![SQL Server](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiidGjBa-zwRkTqQPbzRUW7SvKnPIaQ1VpHVtmXGq_UzpJW7gn3K8OXBNYbhnW3YqfKFvZWBWL7gBpuxwR4mcx6GG4T8fjHadm9gl56h_7JblJxWPWHGyqXcuHZcdHIXahHdx0xWYlu5FWX/s400/ss2008logo.jpg)
This is the query plan that it puts together for the above query. It just plops out an empty set:
![Query #1 Plan](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS664oouvx-U8fsX9eJN6TH3Lc40dwggZ_tZJsAaoa0RR5U1ZCM3J3v7SRBro5ybSA5gMwdRYryAQhbpi1gleeSsTSIk4MrGCPmJwBn3K1UC_Ade05gboXFTCSZaK5_CrKf9Os1PTjEM9H/s400/allfalsequeryplan.jpg)
Excellent.
The following query (Query #2) is exactly like Query #1, except all the predicates are negated so that they will all evaluate to True:
/* Query #2 */This is the plan that is generated:
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where 1<>2
and power(2,sqrt(0))=1
and char(-1) is null
and sin(pi()/2)=1
and soh.SalesOrderID is not null
![Query #2 Plan](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbhCSHZC-MCvs0qU2Yf9Ts9hPpVL_95KviQH-KajzeyNJrxip6cBSy7eey7tTR5orinMHVXcxNrOzWIkNpCrJfKJVXiw7vN_-MCT0diItFy1SRWvtx2BkfvPSFPZjmgrJviTLIPsS8KiQU/s400/alltruequeryplan.jpg)
The Optimizer’s thinking is:
![SQL Server](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiidGjBa-zwRkTqQPbzRUW7SvKnPIaQ1VpHVtmXGq_UzpJW7gn3K8OXBNYbhnW3YqfKFvZWBWL7gBpuxwR4mcx6GG4T8fjHadm9gl56h_7JblJxWPWHGyqXcuHZcdHIXahHdx0xWYlu5FWX/s400/ss2008logo.jpg)
So it knows the WHERE clause is not needed, and it decides to SCAN the smaller-sized nonclustered indexes to get the data it needs. Brilliant! The Optimizer should be applauded.
What do our judges think of the plans for Query #1 and Query #2?
![Two Einsteins](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDM74_D6Gu_Sps17XXJ17RnLTxNQW49q7MPIRcxfDbcTdl7G1vQ9BuetPRNCumni5l4SaYa-je-Nm8ewVM3IAAAqPpYXcSKxsSGOKMrqSHIOty_sJxZqz9vZYt6KcGgFRnRiRz4xoTXnpt/s400/2Einstein.jpg)
They awarded the plans Two Einsteins! Very impressive.
Now take a look at this query (Query #3). It’s exactly the same as Query #2, only with a different WHERE clause. Again, though, all the predicates evaluate to True… at least you and I can easily see that:
/* Query #3 */But the Optimizer doesn’t see it that way. Here’s the query plan for that query:
select distinct CustomerID,ProductID
from AdventureWorks.Sales.SalesOrderDetail sod
join AdventureWorks.Sales.SalesOrderHeader soh on sod.SalesOrderID=soh.SalesOrderID
where SalesOrderDetailID=SalesOrderDetailID
and ProductID=ProductID
and SpecialOfferID=SpecialOfferID
and OrderQty=OrderQty
and UnitPrice=UnitPrice
![Query #3 Plan](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjw81RT4F8LWsYUZCxwqBv1U9bonGNXKtVKKl9LXHt6HAN39wpAavx2dPr-8dIHIOgm7GHMAaciowQ1npyF8mgDUqvi4v8Z9_9cZ9kIxkNXAldHSkPvC7V5LpFCYN1zgOgPzy4VofMngXQU/s400/equaltoselfqueryplan.jpg)
Note the arrows between the operators are a lot thinner than in the previous query, and note that it is doing a SCAN of the SalesOrderDetail clustered index and a SEEK into the SalesOrderHeader table. Here is apparently what’s going on inside the Optimizer’s head as it constructs this plan:
![SQL Server](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiidGjBa-zwRkTqQPbzRUW7SvKnPIaQ1VpHVtmXGq_UzpJW7gn3K8OXBNYbhnW3YqfKFvZWBWL7gBpuxwR4mcx6GG4T8fjHadm9gl56h_7JblJxWPWHGyqXcuHZcdHIXahHdx0xWYlu5FWX/s400/ss2008logo.jpg)
You can see the estimate of 1.21317 rows here in the properties of the Clustered Index Scan:
![Estimated Rows in Query #3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9V_otRvfyl8fHFk1-bXd2119sfzDd08TQUIgBCr71XrbmJYZbU5jHeNphiWCG8THqxDkkZSm9Lt9Op2URps-G0F8PpzR3j5ozvYVFnmhVLawjoUVdKWwnWUZZ8kUCG6quzAYZx9TaXAl3/s400/EstRows.jpg)
This doesn’t make much sense. Why does the Optimizer think that it’s comparing columns to some unknown value? The only time something isn’t equal to itself is when it has a value of NULL, but all 5 columns referred to in the WHERE clause are non-nullable columns. It’s surprising that the Optimizer can’t see that each predicate is True.
And the plan that it did come up with performs terribly. Look at a comparison in performance between Query #2 and Query #3:
/* CPU #Reads
-------------------------
Query #2: 210ms 299
Query #3: 695ms 365,458
*/
Yikes! Look at the number of Reads! Think about it… in executing Query #3, for each one of the 121,317 rows in SalesOrderDetail, it does a SEEK into SalesOrderHeader. That’s really inefficient.I don’t think our panel of judges will be thrilled with this, do you? Let’s see what they thought of the Query #3 plan:
![Four Curlys](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWlQcNnJ-qf5F60TRiYf-kE6EDedmHQnUgXAZOP7EALRHg5CF7z2UImXHlgTt1-K01z8d5aMXFeTsmSyCy78ukbJMRdLU4kGCtZR0eojEcP_zCbVOxo8Ewe8IFXLvniooW62CMIDFwz0LH/s400/4Curly.jpg)
Four Curlys! That’s gotta hurt. But you have to admit... It was a boneheaded plan.
(Important Note: The above phenomenon occurs in SQL2005. It does not behave the same in SQL2008. In SQL2008, Query #2 and Query #3 have identical query plans.)
Here’s another example in inconsistent approaches by the Optimizer. Look at the following query (Query #4), which uses window functions in OVER clauses in a CTE:
/* Query #4 */What does the Optimizer think when it constructs the plan for that query?
with cte as
(
select SalesOrderID,ProductID
,RNum=row_number() over (partition by ProductID order by SalesOrderID)
,Rnk=rank() over (partition by SpecialOfferID order by CarrierTrackingNumber)
,DRnk=dense_rank() over (partition by UnitPrice order by RowGUID)
from AdventureWorks.Sales.SalesOrderDetail
)
select SalesOrderID,ProductID
from cte
![SQL Server](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiidGjBa-zwRkTqQPbzRUW7SvKnPIaQ1VpHVtmXGq_UzpJW7gn3K8OXBNYbhnW3YqfKFvZWBWL7gBpuxwR4mcx6GG4T8fjHadm9gl56h_7JblJxWPWHGyqXcuHZcdHIXahHdx0xWYlu5FWX/s400/ss2008logo.jpg)
And that’s what it does, as we can see from the following plan:
![Query #4 Plan](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiABn47Z4H6pbKAGf7YXU3xymtuks5C6Hy5ZlOY7z6bRSqWQ-DLglc4BtpZfPKOk2QrCRH2FvKUGicGIpEi8C4tj-QqztWYcjaJa-jvBfjk0WfrGiOZkeKlIrr8Vd3ZcdCxUniZSXq-Pn8f/s400/windowfuncplan.jpg)
That’s a very sensible plan. What do the judges say?
![Three Doc Browns](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyCvGMW7UzN2npULpcPqxtHtXm-jYabGhvsIVSo6JfIG_-GvDCDufZYLrMyEwZuwGOrg5efEs6ajphSvxePmQb9-94kppLrE24Ye5AG61Iv6oqsC26-fEdXrtWoVsFKbfEawOjLjq78qWa/s400/3DocBrown.jpg)
Three Emmett Browns, huh?. Sure, he’s a bit quirky, and he’s not as impressive as Einstein or Plato or Galileo or Leonardo or Mozart, but hey, he invented time travel… that’s pretty cool.
Now look at the following query (Query #5), which is exactly like Query #4, except that it uses aggregates in OVER clauses in the CTE rather than window functions:
/* Query #5 */Wouldn’t you think that the Optimizer would come up with the exact same plan as Query #4? But nooooo… here’s what it’s thinking:
with cte as
(
select SalesOrderID,ProductID
,Cnt=count(*) over (partition by ProductID)
,TotQty=sum(OrderQty) over (partition by SpecialOfferID)
,AvgLine=avg(LineTotal) over (partition by UnitPrice)
from AdventureWorks.Sales.SalesOrderDetail
)
select SalesOrderID,ProductID
from cte
![SQL Server](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiidGjBa-zwRkTqQPbzRUW7SvKnPIaQ1VpHVtmXGq_UzpJW7gn3K8OXBNYbhnW3YqfKFvZWBWL7gBpuxwR4mcx6GG4T8fjHadm9gl56h_7JblJxWPWHGyqXcuHZcdHIXahHdx0xWYlu5FWX/s400/ss2008logo.jpg)
That kind of sounds like my dog when I pull out the bag of dog food every single night. Sure enough, here’s the plan:
![Query #5 Plan](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMFrWwGcF7qcvR3Cgp58_RhNGXoU_UsndupPaRsEZTAkIrfALEKKDMthJpIeH0r-sHsFFfq8BLyJQ_aqlwY5cxdl5jiQNzbl8LdIm6cnXAgieGOL6K0sqjXI4pzqw-73yoPRlywnbLeAAL/s400/aggregatequeryplan.jpg)
Look at all those operators! All those aggregations and sorts and everything, and it’s all a waste of time because none of the calculations are in the final result set.
Look at the comparison in performance between Query #4 and Query #5:
/* CPU #Reads
-------------------------
Query #4: 39ms 236
Query #5: 3059ms 811,834
*/
Incredible. Query #5 had over 3400 times the number of reads as Query #4 and almost 100 times the CPU time.Judges? Your verdict?
![Nine SpongeBobs](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1vGrqOgOJxo9ZFdsFw8Q_ANpmD9fHlWnQkvFNLyt8T4YXzvibontioYpoSQzZFFJNecu7JXbYrHCwMo8ZQPQ3_mD1BWQcJ9vMMAkcgbGHQlXHkC0ZSGrlpMVngBPzllys0-r9fY60Q32G/s400/9SpongeBob.jpg)
Ouch!! Nine SpongeBobs! The judges definitely thought that was an atrocious, putrid, abysmal, boneheaded plan.
(Important Note: The above phenomenon occurs in SQL2005. It does not behave the same in SQL2008. In SQL2008, Query #4 and Query #5 have identical query plans.)
Of course, none of the above queries are necessarily ones you’d actually put together in the real world (though I did stumble upon the behavior of Query #4 and #5 while answering a question in the MSDN T-SQL Forum). And I don’t mean to put down the Optimizer. I guess “bonehead” is kind of a strong word. Perhaps the Optimizer is sometimes just a little bit naïve.
I guess the moral of the story, as always, is to look carefully at your query plans. In the words of another person bordering on genius and naïveté: “Query Plans are like a box of chocolates… You never know what you’re gonna get.”
Great post Brad! I have a few comments, as usual 8^).
ReplyDeleteThe optimizer is basing cardinality decisions off what it believes to be true, which will vary from system to system depending on the distribution of data and the validity of the stastics. On my box query 3 gave the same query plan as query 2. If you notice the optimizer is making bad decisions, it is usually a good indicator that you are missing indexes or statistics.
One other key thing of note is that the optimizer typically evaluates the where clause in one of the last phases of query processing, which means the optimizer has to go through all the other stages to create the query plan. Typically, you cannot "short circuit" a query even if you validate/invalidate the where clause by setting all filters to true/false.... Now that is the typical scenario, but there is an exception. The exception is called a "foldable" expression, http://technet.microsoft.com/en-us/library/ms175933.aspx. Essentially the optimizer is able to validate constant deterministic expressions early in query plan generation to give a better query plan. A foldable expression can be used to short circuit a query to avoid work, as shown in the example below.
select distinct
CustomerID,
ProductID
from
AdventureWorks.Sales.SalesOrderDetail sod
join
AdventureWorks.Sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID
WHERE
1=2
and SalesOrderDetailID = SalesOrderDetailID
and ProductID = ProductID
and SpecialOfferID = SpecialOfferID
and OrderQty = OrderQty
and UnitPrice = UnitPrice
Again great post. Keep'em coming.
Ha! Four Curly's was the bit for me!
ReplyDelete@Adam:
ReplyDeleteThanks for the link on foldable expressions. I didn't realize there was documentation on it (or a term for it)... that's good to know. My "SELECT * vs SELECT 1" blog entry and "Down For the COUNT(*)" entry, as well as this one, had those foldable expressions in there (I especially like the SIN(PI()/2)=1 predicate).
I'm glad you tested the queries out on your system as well. I'll bet that the reason Query #2 and #3 generated the same plan on your system (and not on mine) was because you did it on SQL2008 (which I don't currently have) and I did mine on SQL2005.
I'll bet 2008 made an improvement there. I kind of suspect (though I searched the web and couldn't find confirmation on this) that adding a predicate of Table1.ColumnName = Table1.ColumnName to a query was an "old-fashioned" way of giving a hint to the optimizer that you wanted to "force" the selectivity of Table1 to be lower (i.e. 10%) in order to trick the optimizer into using Table1 as the driving table of the query. I wonder if this roundabout hinting approach was deprecated in SQL2008.
I remember reading a book (I won't mention the title... but it was NOT one from Microsoft Press... in fact, I don't remember the exact title... I must have Wrox in my head) that mentioned "helping the query" by adding such a predicate to the WHERE clause, and I remember thinking that was nuts. Because it didn't help the query at all, but made it worse, as I demonstrated in this article.
Thanks, as always, for the input. Much appreciated.
@Michael:
ReplyDeleteNyuk-nyuk-nyuk.
IT companies can optimize the advantages of outsourcing services by following certain best practices. The first step is to select a service provider with the appropriate skills and experience. It is also crucial to establish clear expectations and goals for the project, including timelines, deliverables, and quality standards. Successful outsourcing requires effective communication and collaboration, so regular check-ins and status updates with the service provider should be established. To learn more more helpful hints visit the website by clicking on the provided link.
ReplyDelete