Friday, November 18, 2011

T-SQL Tuesday #024: The Roundup

Thanx to everyone for their contributions to T-SQL Tuesday #024: Prox ‘n’ Funx.

The roundup is finally here… Well, shux, I’ve been so busy that I almost let it fall through the crax, but before weex and weex went by, I finally read through the stax of submissions. I certainly had a lot of kix reading them… you could hear shriex of delight coming out of my office window.

(Okay, I admit it… The roundup was late because I spent a lot of time looking up words that end with a -ks sound… It loox like you can count me among the ranx of freax ‘n’ geex ‘n’ punx ‘n’ jerx ‘n’ dorx ‘n’ quax who carry something a little too far… It stinx, doesn’t it?).

Methinx it was a success! The SQL Community rox!

But don’t take it from me… Read the submissions for yourself. Click on the linx below for some good reading.

declare @Subject varchar(100)='T-SQL Tuesday #024: Prox ''n'' Funx';

insert [Brad Schulz].Blog
select Subject='T-SQL Tuesday #024 Roundup'
,Content
from
(
select Content from [Rob Farley].Blog where Subject=@Subject
union all
select Content from [Noel McKinney].Blog where Subject=@Subject
union all
select Content from [Greg Lucas].Blog where Subject=@Subject
union all
select Content from [Michael J. Swart].Blog where Subject=@Subject
union all
select Content from [Kent Chenery].Blog where Subject=@Subject
union all
select Content from [Steve Wales].Blog where Subject=@Subject
union all
select Content from [Merrill Aldrich].Blog where Subject=@Subject
union all
select Content from [Aaron Bertrand].Blog where Subject=@Subject
union all
select Content from [Pinal Dave].Blog where Subject=@Subject
union all
select Content from [Rich Brown].Blog where Subject=@Subject
union all
select Content from [Bob Pusateri].Blog where Subject=@Subject
union all
select Content from [Kerry Tyler].Blog where Subject=@Subject
union all
select Content from [Jes Schultz Borland].Blog where Subject=@Subject
union all
select Content from [Thomas Rushton].Blog where Subject=@Subject
union all
select Content from [Jason Brimhall].Blog where Subject=@Subject
union all
select Content from [Nancy Hidy Wilson].Blog where Subject=@Subject
union all
select Content from [David Howard].Blog where Subject=@Subject
union all
select Content from [Brad Schulz].Blog where Subject=@Subject
) Contributions
Rob Farley contributes a wonderfully detailed post comparing inline functions and non-inline functions, filled with lots of examples and query plans. Rob categorically states that “if it’s not inline, it’s rubbish.” I couldn’t agree more, but then again, I’m always a sucker for any blog post that uses the word “rubbish”.

Noel McKinney talks about a situation concerning the abuse and overuse of stored procedures. Imagine a database that is completely devoid of data and yet consumes truckloads of disk space… because it houses 27,000 stored procedures! A good story to keep in the back of your mind in designing solutions.

Greg Lucas gives us a detailed explanation of a useful ExceptionHandler procedure that is designed to be used in the CATCH block of a TRY…CATCH construct, logging detailed information about the error in a table. It encapsulates and standardizes error handling so that the developer can save time in coding.

Michael J. Swart talks about some interesting numbers related to stored procedures. At first I thought it would be the usual numbers like “maximum parameters allowed” and “maximum nested levels”, but if you know Michael, then you know he wouldn’t only spout out dry statistics like that. By the time you finish his list, you’ll be struggling to catch your breath from laughing.

Kent Chenery joins T-SQL Tuesday for the first time, contributing a CLR routine for calculating Easter in any given year. That’s pretty cool… I never even thought about the fact that an algorithm existed for calculating Easter. Make sure you read the Wikipedia article he references… the detail behind the algorithm is fascinating.

Steve Wales is another first-time contributor to T-SQL Tuesday. He talks about the differences in coding procedures and functions between SQL Server and that other database created by that company in my neck of the woods, Oracle. He also provides a link to an earlier post of his warning that you have to be aware of the database compatibility level when using built-in functions.

Merrill Aldrich offers his ode to composable solutions, illustrating how table-valued functions (and views) offer maximum composability and flexibility (and that stored procedures don’t offer that same flexibility). As a rabid fan of TVFs, I couldn’t agree more.

Aaron Bertrand wrote a terrific post on the shortcomings of INFORMATION_SCHEMA views earlier this month, but he waited for T-SQL Tuesday to post his pièce de résistance, plunging the knife deeper in talking about INFORMATION_SCHEMA.ROUTINES (and syscomments and sp_helptext) and how they just don’t cut it in getting the full procedure definition.

Pinal Dave talks about the new CUME_DIST() analytic function in SQL2012, which provides a really cool easy-to-implement percentile-calculating capability. Pinal’s post is the first of a series of blog posts about the new SQL2012 analytic functions… make sure you add them to your must-read list.

Rich Brown tells us about the benefits of using stored procedures for Reporting Services datasets, abstracting away the data layer from the presentation layer. He also brings up an interesting idea of putting Reporting Services procedures into their own schema.

Bob Pusateri gives an overview of the pros and cons of both procedures and functions. Speaking of procedures, though, I also have to mention that I found Bob’s post of October 18th very interesting: it uses a cool trick with sp_executesql to create a view in a database other than the current one.

Kerry Tyler discusses scenarios (and some myths) regarding transactions being left open by procedures that produce errors or experience timeouts, concluding with a discussion on XACT_ABORT.

Jes Schultz Borland expresses her frustration about the fact that a Reporting Services dataset only recognizes the first result set from a procedure that produces multiple result sets. It is confusing, since the term dataset has different meanings across platforms (in .NET, the DataSet class truly is a collection of DataTables).

Thomas Rushton makes a “Hell yes” case in answer to the question, “Should I learn to use stored procedures?” And he follows up with a second contribution talking about the best practice in releasing stored procedure scripts (or scripts for any kind of object for that matter).

Jason Brimhall talks about functions… specifically functions that he is thankful for, functions he can’t live without, and functions that he had fun with in delving into SQL Server internals. I’m going to set aside some time to read more about those.

Nancy Hidy Wilson reminds us of the myriad of information we can glean from the good old SERVERPROPERTY() function… and she reminds us to check out DATABASEPROPERTY() and DATABASEPROPERTYEX() as well.

David Howard discusses bad plans resulting from parameter sniffing and gives a list of techniques to solve the problem.

And, finally, my contribution talks about a procedure that I use to deal with a client system that contains no stored procedures at all… it finds the original text of ad-hoc queries in a Forced Parameterization database.

Thanx again for your contributions!

Wednesday, November 9, 2011

T-SQL Tuesday #024: A Procedure To Deal With No Procedures

Oh my… Here it is Wednesday, and I’m a day late for T-SQL Tuesday #024, which is hosted this month by some joker named Brad Schulz. Hopefully he’ll understand and accept my late entry… If he doesn’t, then I’ll just have to make him understand.

The theme for this month is Prox ‘n’ Funx (aka Procedures and Functions)… and I’m going to talk about a procedure that I wrote to help me deal with the lack of procedures.

You probably had to re-read that last sentence, didn’t you? Well, this whole blog post might turn your brain inside-out a bit, so you might want to read it slowly or be prepared to re-read it after you finish. My apologies in advance.

Here’s the situation… purely hypothetical, you understand… (wink, wink).

Imagine yourself faced with a SQL Server Database that is just a big dumb file cabinet. All it holds is data, and there isn’t a single procedure or function to be found… every single query that goes to the system is an ad-hoc query constructed within an enormous .NET application.

For example, a typical query in the C# code may be put together like this:

string SQL = "";
SQL += " SELECT soh.SalesOrderNumber ";
SQL += " ,soh.ShipDate ";
SQL += " ,soh.CustomerID ";
SQL += " ,s.Name ";
SQL += " ,soh.TotalDue ";
SQL += " FROM Sales.SalesOrderHeader soh ";
SQL += " JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID ";
SQL += " JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID ";
SQL += " JOIN Sales.Store s ON soh.CustomerID=s.CustomerID ";
SQL += " WHERE soh.SalesOrderID IN ";
SQL += " (SELECT SalesOrderID ";
SQL += " FROM Sales.SalesOrderDetail ";
SQL += " WHERE ProductID=" + SQLUtils.SQLValue(ProdID) + ") ";
SQL += " AND soh.ShipDate>=" + SQLUtils.SQLValue(FromDate);
SQL += " AND soh.ShipDate<" + SQLUtils.SQLValue(ToDate);
SQL += " AND a.City=" + SQLUtils.SQLValue(CityName);
SQL += " AND sp.Name=" + SQLUtils.SQLValue(StateProvinceName);
DataTable dtResult = SQLUtils.ExecSQLQuery(SQL);
Never mind the thoughts that are going through your head regarding readability. Never mind the maintenance nightmare. Never mind the potential for SQL Injection. Never mind the…

Oh never mind.

So hundreds of thousands of ad-hoc queries get sent to the system all day long.

My job? Make things go faster.

Okay, fine. However, there’s one other wrinkle. This particular database operates under Forced Parameterization. This may reduce the frequency of query compilations, but it makes my job a little more complicated.

Let me illustrate… Let’s set the AdventureWorks database to use Forced Parameterization:

alter database AdventureWorks set parameterization forced
With that in place, every ad-hoc query sent to the system is parameterized… In other words, any literal within the query is converted into a parameter, so that the plan can be reused by any subsequent ad-hoc query that has the same “code shape”.
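(As a quick aside… if you want to check which databases are running under Forced Parameterization, or flip AdventureWorks back when you’re done experimenting, sys.databases exposes a flag for it. A small sketch:)

```sql
-- Which databases are under Forced Parameterization?
select name, is_parameterization_forced
from sys.databases
where is_parameterization_forced=1

-- And to revert AdventureWorks to the default when we're done:
alter database AdventureWorks set parameterization simple
```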

Let’s execute the following query in AdventureWorks:

SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=897)
AND soh.ShipDate>='20030801'
AND soh.ShipDate<'20030901'
AND a.City='Milsons Point'
AND sp.Name='New South Wales'
(Those of you who read my blog regularly may be shocked by my use of UPPER CASE keywords… I never do that… But I’m doing it here to illustrate a point).

As you may know, there is a DMV called sys.dm_exec_cached_plans, which contains information about all the query plans that have been cached, along with a plan_handle column that you can use to acquire the actual text of the query that created the plan. There’s also a DMV called sys.dm_exec_query_stats, which contains performance statistics information for cached plans, but its plan_handle (or sql_handle) column points to a different kind of text when you’re dealing with Forced Parameterization.

Here is a simplified version of a Performance Statistics Query that I run, which uses sys.dm_exec_query_stats to acquire performance information so that I can find plans that perform a lot of reads (the ones with the most reads are at the top):

select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,qp.Query_Plan
,[#Reads]=qs.total_logical_reads
,[#Execs]=qs.execution_count
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
Here’s what the result looks like in SSMS:



In case you’re wondering about that “processing-instruction” and XML gibberish for the generation of the first column, you can read more about that here. But, in short, you can see that it creates a hyperlink to the query text. When I click on that link, I can see that the text looks like this:

(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber ,
soh . ShipDate , soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and
a . City = @3 and sp . Name = @4
I have split it into multiple lines for this blog post… In reality it’s one looooonnnng string. But you can see that this is NOT the actual text of the query that I executed. In parameterizing this query, SQL converted all my UPPER CASE keywords into lower case, and it got rid of any extraneous white space (multiple spaces, tabs, carriage returns, line feeds), and it inserted a single space between all commas and periods/full-stops and operators.
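(Incidentally, if you’re curious where that normalized text lives: with Forced Parameterization, the parameterized query is cached as a “Prepared” entry, while the original ad-hoc text is cached separately as an “Adhoc” shell. A quick sketch to see both kinds of entries side by side:)

```sql
-- Compare the Adhoc shells with their Prepared (parameterized) counterparts
select cp.objtype, cp.usecounts, qt.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) qt
where cp.objtype in (N'Adhoc',N'Prepared')
order by cp.objtype, cp.usecounts desc
```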

Now when another query comes along that is exactly like the first query (except querying for different values)…

SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=870)
AND soh.ShipDate>='20030101'
AND soh.ShipDate<'20040101'
AND a.City='London'
AND sp.Name='England'
…and we once again execute our Performance Statistics Query to look at the plan cache...



…we see that the same parameterized query has been reused (note the #Execs is equal to 2).

(Those of you with a quick eye may have also noticed that the #Reads skyrocketed… I’ll talk about that in a later post).

So my problem is this: To improve the performance of the application, I can use my Performance Statistics Query to find those plans that exhibited the most reads and see if they can be tuned, either by adding an index or by recommending that the query be rewritten a different way.

But unfortunately all I get is that parameterized version of the query text. I have no indication what the ACTUAL ad-hoc queries were that were sent to the system. Yes, I can see the text of the ad-hoc queries via the sys.dm_exec_cached_plans DMV, but how do I find the exact ones that have to do with this parameterized query? It’s like looking for a needle in a haystack.

So I wrote a procedure to do that for me. It essentially takes the parameterized query text and creates a LIKE pattern out of it so that I can find its original ad-hoc code via the plan_handle of the sys.dm_exec_cached_plans DMV.

I create this procedure in a database called DBATools (which houses other utilities and data that I use for monitoring the server). It starts like so, receiving the text of the parameterized query code and putting it into a local @Text variable:

use DBATools
go

create procedure GetAdhocQueries
  @ParameterizedQuery nvarchar(max)
as

declare @Text nvarchar(max)
set @Text=@ParameterizedQuery
Its first step is to get rid of the parameter declarations at the beginning of the query. Well, luckily, all these parameterized queries are of the pattern “(@0 … )xxxxx”, where “xxxxx” is one of the following: SELECT or UPDATE or INSERT or DELETE or MERGE or WITH (in the case of a CTE). So let’s use that knowledge to strip off the parameter list:

/*Get rid of parameter list*/
select @Text=substring(@Text,isnull(min(Pos),0)+1,len(@Text))
from (select charindex(N')select ',@Text)
union all
select charindex(N')insert ',@Text)
union all
select charindex(N')update ',@Text)
union all
select charindex(N')delete ',@Text)
union all
select charindex(N')merge ',@Text)
union all
select charindex(N')with ',@Text)) x(Pos)
where Pos>0
So that makes our @Text variable look like the following, stripped of its parameter list (again, I’m artificially word-wrapping the text here… it’s just one loonng string):

select soh . SalesOrderNumber , soh . ShipDate , 
soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and
a . City = @3 and sp . Name = @4
Next I need to find all the parameters in the text. They are all surrounded by a single space, so they are easy to find because they have the pattern “ @[0-9] ” (for a single-digit parameter) or “ @[0-9][0-9] ” (for a two-digit parameter), etc. So the following will find all 4-digit, 3-digit, 2-digit, and 1-digit parameters in the text and substitute them with a single percent sign (%):

/*Substitute all parameters (i.e. @0 or @1 or ...) with Percents (%)*/
declare @NumDigs tinyint
,@Pos int
set @NumDigs=5
while @NumDigs>1
begin
set @NumDigs=@NumDigs-1
while 1=1
begin --Continue substituting until there are no more
set @Pos=patindex(N'% @'+replicate(N'[0-9]',@NumDigs)+N' %',@Text)
if @Pos=0 break --No more... we're done with the loop
set @Text=stuff(@Text,@Pos,3+@NumDigs,N' % ')
end
end
And now our @Text variable looks like this:

select soh . SalesOrderNumber , soh . ShipDate , 
soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= % ) and soh . ShipDate > = % and soh . ShipDate < % and
a . City = % and sp . Name = %
Now I need to take care of those spaces that SQL Server inserted between every operator and keyword when it parameterized the query. So I replace all those spaces with percent signs. At the same time, I also replace any normal pattern-matching characters (like a caret or left square bracket or right square bracket) with percent signs. I don’t want them screwing up my final query into the plan cache. Finally, I place a percent sign at the beginning and the end of the text:

/*Replace all spaces and pattern characters (i.e. "[", "]", "^") with Percents (%)*/
/*Also place a Percent at the beginning and the end*/
set @Text=N'%'
+replace(replace(replace(replace(@Text
,N' '
,N'%')
,N'['
,N'%')
,N']'
,N'%')
,N'^'
,N'%')
+N'%'
At this point our @Text variable looks like this:

%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%%%)%and%soh%.%ShipDate%>%=%%%and%soh%.%ShipDate%<%%%and%
a%.%City%=%%%and%sp%.%Name%=%%%
Just to be neat and tidy, I get rid of all the multiple percent signs and replace them with a single one:

/*Transform all multiple Percents into single ones*/
set @Text=replace(@Text,N'%%%',N'%%')
set @Text=replace(@Text,N'%%',N'%')
And that results in this:

%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%)%and%soh%.%ShipDate%>%=%and%soh%.%ShipDate%<%and%
a%.%City%=%and%sp%.%Name%=%
Finally, since LIKE patterns are not allowed to be more than 4000 characters, I truncate the @Text variable to that length if needed:

/*Truncate to 4000 characters max*/
if len(@Text)>4000 set @Text=left(@Text,3999)+N'%'
So finally, at this point, we can now find all the ad-hoc queries in the cache whose text is LIKE our @Text variable:

/*Find our Adhoc queries that match the pattern*/
select Code=(select [processing-instruction(q)]=N':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,DB=db_name(QueryDatabaseID)
,[#Bytes]=cp.size_in_bytes
,cp.usecounts
from sys.dm_exec_cached_plans cp
cross apply (select QueryDatabaseID=convert(int,value)
from sys.dm_exec_plan_attributes(cp.plan_handle)
where attribute='dbid') F_DB
cross apply sys.dm_exec_sql_text(cp.plan_handle) qt
where objtype='Adhoc'
and qt.text like @Text
order by cp.usecounts desc
Here is the result:



And when we click on one of the hyperlinks, we can see the original ad-hoc query in all its glory, just as it was originally submitted.

<?q :
SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=870)
AND soh.ShipDate>='20030101'
AND soh.ShipDate<'20040101'
AND a.City='London'
AND sp.Name='England'

?>
So, to take advantage of this new procedure, I just incorporate it into my original Performance Statistics Query, like so:

select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,qp.Query_Plan
,[#Reads]=qs.total_logical_reads
,[#Execs]=qs.execution_count
,AdHocStmts
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
cross apply
(select AdHocStmts=case
when qt.text like '(@0 %'
then N'exec DBATools.dbo.GetAdhocQueries '''
+replace(qt.text,N'''',N'''''')+N' '''
else null
end) F_Adhoc
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
In short, if the query text (in sys.dm_exec_sql_text) starts with a “(@0 “, I know it’s a parameterized query, so I will have the AdHocStmts column populated with the text to call my GetAdhocQueries procedure. When I execute the above query, I get the following result with the new AdHocStmts column:



And I can copy/paste the AdHocStmts column into a query window and execute it (again, I only word-wrap it here for clarity… it’s actually a looonng single line of text).

exec DBATools.dbo.GetAdhocQueries '(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber , soh .
ShipDate , soh . CustomerID , s . Name , soh . TotalDue from Sales .
SalesOrderHeader soh join Person . Address a on soh . ShipToAddressID =
a . AddressID join Person . StateProvince sp on a . StateProvinceID =
sp . StateProvinceID join Sales . Store s on soh . CustomerID = s .
CustomerID where soh . SalesOrderID in ( select SalesOrderID from
Sales . SalesOrderDetail where ProductID = @0 ) and soh . ShipDate
> = @1 and soh . ShipDate < @2 and a . City = @3 and sp . Name = @4 '
And that will find the ad-hoc queries that are associated with that particular parameterized query.
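By the way, if you want the whole thing in one shot, here are all the fragments from this post stitched together into the complete GetAdhocQueries procedure (assembled as-is from the pieces above… consider it a sketch to adapt rather than a polished utility):

```sql
use DBATools
go
create procedure GetAdhocQueries
  @ParameterizedQuery nvarchar(max)
as
declare @Text nvarchar(max)
set @Text=@ParameterizedQuery

/*Get rid of parameter list*/
select @Text=substring(@Text,isnull(min(Pos),0)+1,len(@Text))
from (select charindex(N')select ',@Text)
      union all select charindex(N')insert ',@Text)
      union all select charindex(N')update ',@Text)
      union all select charindex(N')delete ',@Text)
      union all select charindex(N')merge ',@Text)
      union all select charindex(N')with ',@Text)) x(Pos)
where Pos>0

/*Substitute all parameters (i.e. @0 or @1 or ...) with Percents (%)*/
declare @NumDigs tinyint,@Pos int
set @NumDigs=5
while @NumDigs>1
begin
  set @NumDigs=@NumDigs-1
  while 1=1
  begin --Continue substituting until there are no more
    set @Pos=patindex(N'% @'+replicate(N'[0-9]',@NumDigs)+N' %',@Text)
    if @Pos=0 break --No more... we're done with the loop
    set @Text=stuff(@Text,@Pos,3+@NumDigs,N' % ')
  end
end

/*Replace spaces and pattern characters with Percents; wrap in Percents*/
set @Text=N'%'+replace(replace(replace(replace(@Text,N' ',N'%'),N'[',N'%'),N']',N'%'),N'^',N'%')+N'%'

/*Collapse multiple Percents and truncate to the 4000-character LIKE limit*/
set @Text=replace(@Text,N'%%%',N'%%')
set @Text=replace(@Text,N'%%',N'%')
if len(@Text)>4000 set @Text=left(@Text,3999)+N'%'

/*Find our Adhoc queries that match the pattern*/
select Code=(select [processing-instruction(q)]=N':'+nchar(13)+qt.text+nchar(13)
             for xml path(''),type)
      ,DB=db_name(QueryDatabaseID)
      ,[#Bytes]=cp.size_in_bytes
      ,cp.usecounts
from sys.dm_exec_cached_plans cp
cross apply (select QueryDatabaseID=convert(int,value)
             from sys.dm_exec_plan_attributes(cp.plan_handle)
             where attribute='dbid') F_DB
cross apply sys.dm_exec_sql_text(cp.plan_handle) qt
where objtype='Adhoc'
and qt.text like @Text
order by cp.usecounts desc
go
```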

So this GetAdhocQueries procedure allows me to find the ad-hoc query needles in the plan cache haystack. Now if I only had a way to find those $#&@ queries in the C# code as easily… oh well, that’s somebody else’s problem.