Oh my… Here it is Wednesday, and I’m a day late for T-SQL Tuesday #024, which is hosted this month by some joker named
Brad Schulz. Hopefully he’ll understand and accept my late entry… If he doesn’t, then I’ll just have to
make him understand.
The theme for this month is Prox ‘n’ Funx (aka Procedures and Functions)… and I’m going to talk about a procedure that I wrote to help me deal with the lack of procedures.
You probably had to re-read that latest sentence, didn’t you? Well, this whole blog post might turn your brain inside-out a bit, so you might want to read it slowly or be prepared to re-read it after you finish it. My apologies in advance.
Here’s the situation… purely hypothetical, you understand… (wink, wink).
Imagine yourself faced with a SQL Server Database that is just a big dumb file cabinet. All it holds is data, and there isn’t a single procedure or function to be found… every single query that goes to the system is an ad-hoc query constructed within an enormous .NET application.
For example, a typical query in the C# code may be put together like this:
string SQL = "";
SQL += " SELECT soh.SalesOrderNumber ";
SQL += " ,soh.ShipDate ";
SQL += " ,soh.CustomerID ";
SQL += " ,s.Name ";
SQL += " ,soh.TotalDue ";
SQL += " FROM Sales.SalesOrderHeader soh ";
SQL += " JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID ";
SQL += " JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID ";
SQL += " JOIN Sales.Store s ON soh.CustomerID=s.CustomerID ";
SQL += " WHERE soh.SalesOrderID IN ";
SQL += " (SELECT SalesOrderID ";
SQL += " FROM Sales.SalesOrderDetail ";
SQL += " WHERE ProductID=" + SQLUtils.SQLValue(ProdID) + ") ";
SQL += " AND soh.ShipDate>=" + SQLUtils.SQLValue(FromDate);
SQL += " AND soh.ShipDate<" + SQLUtils.SQLValue(ToDate);
SQL += " AND a.City=" + SQLUtils.SQLValue(CityName);
SQL += " AND sp.Name=" + SQLUtils.SQLValue(StateProvinceName);
DataTable dtResult = SQLUtils.ExecSQLQuery(SQL)
Never mind the thoughts that are going through your head regarding readability. Never mind the maintenance nightmare. Never mind the potential for SQL Injection. Never mind the…
Oh never mind.
So hundreds of thousands of ad-hoc queries get sent to the system all day long.
My job? Make things go faster.
Okay, fine. However, there’s one other wrinkle. This particular database operates under Forced Parameterization. This may reduce the frequency of query compilations, but it makes my job a little more complicated.
Let me illustrate… Let’s set the AdventureWorks database to use Forced Parameterization:
alter database AdventureWorks set parameterization forced
With that in place, every ad-hoc query sent to the system is parameterized… In other words, any literal within the ad-hoc query is converted into a parameter and the query as a whole is parameterized so that its plan can be reused by any subsequent ad-hoc query that has the same “code shape”.
Let’s execute the following query in AdventureWorks:
SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=897)
AND soh.ShipDate>='20030801'
AND soh.ShipDate<'20030901'
AND a.City='Milsons Point'
AND sp.Name='New South Wales'
(Those of you who read my blog regularly may be shocked by my use of UPPER CASE keywords… I never do that… But I’m doing it here to illustrate a point).
As you may know, there is a DMV called sys.dm_exec_cached_plans, which contains information about all the query plans that have been cached, along with a plan_handle column that you can use to acquire the actual text of the query that created the plan. There’s also a DMV called sys.dm_exec_query_stats, which contains performance statistics information for cached plans, but its plan_handle (or sql_handle) column points to a
different kind of text when you’re dealing with Forced Parameterization.
Here is a simplified version of a Performance Statistics Query that I run that uses sys.dm_exec_query_stats to acquire performance information so that I can find plans that perform a lot of reads (the ones with the most reads are at the top).:
select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,qp.Query_Plan
,[#Reads]=qs.total_logical_reads
,[#Execs]=qs.execution_count
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
Here’s what the result looks like in SSMS:
In case you’re wondering about that “processing-instruction” and XML gibberish for the generation of the first column, you can read more about that
here. But, in short, you can see that it creates a hyperlink to the query text. When I click on that link, I can see that the text looks like this:
(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber ,
soh . ShipDate , soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and
a . City = @3 and sp . Name = @4
I have split it into multiple lines for this blog post… In reality it’s one looooonnnng string. But you can see that this is NOT the actual text of the query that I executed. In parameterizing this query, SQL converted all my UPPER CASE keywords into lower case, and it got rid of any extraneous white space (multiple spaces, tabs, carriage returns, line feeds), and it inserted a single space between all commas and periods/full-stops and operators.
Now when another query comes along that is exactly like the first query (except querying for different values)…
SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=870)
AND soh.ShipDate>='20030101'
AND soh.ShipDate<'20040101'
AND a.City='London'
AND sp.Name='England'
…and we once again execute our Performance Statistics Query to look at the plan cache...
…we see that the same parameterized query has been reused (note the #Execs is equal to 2).
(Those of you with a quick eye may have also noticed that the #Reads skyrocketed… I’ll talk about that in a later post).
So my problem is this: In order to improve the performance of the application, I can use my Performance Statistics Query in order to find those plans that exhibited the most reads and see if they can be tuned via either the addition of an index or via a recommendation of rewriting the query a different way.
But unfortunately all I get is that parameterized version of the query text. I have no indication what the ACTUAL ad-hoc queries were that were sent to the system. Yes, I can see the text of the ad-hoc queries via the sys.dm_exec_cached_plans DMV, but how do I find the
exact ones that have to do with
this parameterized query? It’s like looking for a needle in a haystack.
So I wrote a procedure to do that for me. It essentially takes the parameterized query text and creates a LIKE pattern out of it so that I can find its original ad-hoc code via the plan_handle of the sys.dm_exec_cached_plans DMV.
I create this procedure in a database called DBATools (which houses other utilities and data that I use for monitoring the server). It starts like so, receiving the text of the parameterized query code and putting it into a local @Text variable:
use DBATools
go
create procedure GetAdhocQueries
@ParameterizedQuery nvarchar(max)
as
declare @Text nvarchar(max)
set @Text=@ParameterizedQuery
Its first step is to get rid of the parameter declarations at the beginning of the query. Well, luckily, all these parameterized queries are of the pattern “(@0 … )xxxxx”, where “xxxxx” is one of the following: SELECT or UPDATE or INSERT or DELETE or MERGE or WITH (in the case of a CTE). So let’s use that knowledge to strip off the parameter list:
/*Get rid of parameter list*/
select @Text=substring(@Text,isnull(min(Pos),0)+1,len(@Text))
from (select charindex(N')select ',@Text)
union all
select charindex(N')insert ',@Text)
union all
select charindex(N')update ',@Text)
union all
select charindex(N')delete ',@Text)
union all
select charindex(N')merge ',@Text)
union all
select charindex(N')with ',@Text)) x(Pos)
where Pos>0
So that makes our @Text variable looks like the following, stripped of its parameter list (again, I’m artificially word-wrapping the text here… it’s just one loonng string):
select soh . SalesOrderNumber , soh . ShipDate ,
soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and
a . City = @3 and sp . Name = @4
Next I need to find all the parameters in the text. They are all surrounded with a single space, so they are easy to find because they have the pattern “ @[0-9] ” (for a single digit parameter) or “ @[0-9][0-9] ” (for a two-digit parameter), etc. So the following will find all 4-digit, 3-digit, 2-digit, and 1-digit parameters in the text and substitute them with a single percent sign (%):
/*Substitute all parameters (i.e. @0 or @1 or ...) with Percents (%)*/
declare @NumDigs tinyint
,@Pos int
set @NumDigs=5
while @NumDigs>1
begin
set @NumDigs=@NumDigs-1
while 1=1
begin --Continue substituting until there are no more
set @Pos=patindex(N'% @'+replicate(N'[0-9]',@NumDigs)+N' %',@Text)
if @Pos=0 break --No more... we're done with the loop
set @Text=stuff(@Text,@Pos,3+@NumDigs,N' % ')
end
end
And now our @Text variable looks like this:
select soh . SalesOrderNumber , soh . ShipDate ,
soh . CustomerID , s . Name , soh . TotalDue
from Sales . SalesOrderHeader soh join Person . Address a on
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in
( select SalesOrderID from Sales . SalesOrderDetail where ProductID
= % ) and soh . ShipDate > = % and soh . ShipDate < % and
a . City = % and sp . Name = %
Now, I need to take care of those spaces that had been inserted by the optimizer between every operator and keyword in the query. So I replace all those spaces with percent signs. At the same time, I also replace any normal pattern-matching characters (like a carat or left square bracket or right square bracket) with percent signs. I don’t want them screwing up my final query into the plan cache. Finally, I place a percent sign at the beginning and the end of the text:
/*Replace all spaces and pattern characters (i.e. "[", "]", "^") with Percents (%)*/
/*Also place a Percent at the beginning and the end*/
set @Text=N'%'
+replace(replace(replace(replace(@Text
,N' '
,N'%')
,N'['
,N'%')
,N']'
,N'%')
,N'^'
,N'%')
+N'%'
At this point our @Text variable looks like this:
%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%%%)%and%soh%.%ShipDate%>%=%%%and%soh%.%ShipDate%<%%%and%
a%.%City%=%%%and%sp%.%Name%=%%%
Just to be neat and tidy, I get rid of all the multiple percent signs and replace them with a single one:
/*Transform all multiple Percents into single ones*/
set @Text=replace(@Text,N'%%%',N'%%')
set @Text=replace(@Text,N'%%',N'%')
And that results in this:
%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%)%and%soh%.%ShipDate%>%=%and%soh%.%ShipDate%<%and%
a%.%City%=%and%sp%.%Name%=%
Finally, since LIKE patterns are not allowed to be more than 4000 characters, I truncate the @Text variable to that length if needed:
/*Truncate to 4000 characters max*/
if len(@Text)>4000 set @Text=left(@Text,3999)+N'%'
So finally, at this point, we can now find all the ad-hoc queries in the cache whose text is LIKE our @Text variable:
/*Find our Adhoc queries that match the pattern*/
select Code=(select [processing-instruction(q)]=N':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,DB=db_name(QueryDatabaseID)
,[#Bytes]=cp.size_in_bytes
,cp.usecounts
from sys.dm_exec_cached_plans cp
cross apply (select QueryDatabaseID=convert(int,value)
from sys.dm_exec_plan_attributes(cp.plan_handle)
where attribute='dbid') F_DB
cross apply sys.dm_exec_sql_text(cp.plan_handle) qt
where objtype='Adhoc'
and qt.text like @Text
order by cp.usecounts desc
Here is the result:
And when we click on one of the hyperlinks, we can see the original ad-hoc query in all its glory, just as it was originally submitted.
<?q :
SELECT soh.SalesOrderNumber
,soh.ShipDate
,soh.CustomerID
,s.Name
,soh.TotalDue
FROM Sales.SalesOrderHeader soh
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID
WHERE soh.SalesOrderID IN (SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID=870)
AND soh.ShipDate>='20030101'
AND soh.ShipDate<'20040101'
AND a.City='London'
AND sp.Name='England'
?>
So, to take advantage of this new procedure, I just incorporate it into my original Performance Statistics Query, like so:
select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
for xml path(''),type)
,qp.Query_Plan
,[#Reads]=qs.total_logical_reads
,[#Execs]=qs.execution_count
,AdHocStmts
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
cross apply
(select AdHocStmts=case
when qt.text like '(@0 %'
then N'exec DBATools.dbo.GetAdhocQueries '''
+replace(qt.text,N'''',N'''''')+N' '''
else null
end) F_Adhoc
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
In short, if the query text (in sys.dm_exec_sql_text) starts with a “(@0 “, I know it’s a parameterized query, so I will have the AdHocStmts column populated with the text to call my GetAdhocQueries procedure. When I execute the above query, I get the following result with the new AdHocStmts column:
And I can copy/paste the AdHocStmts column into a query window and execute it (again, I only word-wrap it here for clarity… it’s actually a looonng single line of text).
exec DBATools.dbo.GetAdhocQueries '(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber , soh .
ShipDate , soh . CustomerID , s . Name , soh . TotalDue from Sales .
SalesOrderHeader soh join Person . Address a on soh . ShipToAddressID =
a . AddressID join Person . StateProvince sp on a . StateProvinceID =
sp . StateProvinceID join Sales . Store s on soh . CustomerID = s .
CustomerID where soh . SalesOrderID in ( select SalesOrderID from
Sales . SalesOrderDetail where ProductID = @0 ) and soh . ShipDate
> = @1 and soh . ShipDate < @2 and a . City = @3 and sp . Name = @4 '
And that will find the ad-hoc queries that are associated with that particular parameterized query.
So this GetAdhocQueries procedure allows me to find the ad-hoc query needles in the plan cache haystack. Now if I only had a way to find those $#&@ queries in the C# code as easily… oh well, that’s somebody else’s problem.