Tuesday, February 9, 2010

Dear FROM Clause...

This blog entry is participating in T-SQL Tuesday #003, hosted this month by Rob Farley. You are invited to visit his blog to join the party and read more blogs participating in this month’s theme: Relationships.

Now please sit back and relax, get some tissue, and read on…



Dear John...Dear FROM Clause,

I know that Valentine’s Day is less than a week away, but I felt it was important to write you this letter… I had to put my feelings down on paper. I don’t know quite how to tell you this, but… well… our relationship cannot continue.

It’s just too confusing and complex, and there have been times when it has become… dare I say it?… dangerous.

I still think back to when we were first introduced and we went to NorthWind and AdventureWorks together and started to do SELECTs.

use AdventureWorks
select FirstName,LastName
from Person.Contact
Those were fun times… we’d just hang out with your pals WHERE and ORDER BY and sometimes GROUP BY would tag along... there were always a lot of laughs when HAVING showed up… he’s a riot! Occasionally CROSS APPLY would come too… I had a little trouble understanding him at first, but now I think he’s the coolest guy around! I’ve written about him in my online diary several times.

And of course, your table friends would always JOIN the party… and sometimes it would be quite a wild party:

/* 
Suggest 3 alternate Clothing/Accessory Products
to the @CustomerID that (s)he has never bought before
when (s)he is looking at a particular @ProductID.
(Customers who bought this item also bought...)
See bradsruminations.blogspot.com/2009/07/customers-who-bought-this-item-also.html
*/
use AdventureWorks
select top (3) SuggProdID=d2.ProductID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and p.ProductSubCategoryID<>(select ProductSubCategoryID
from Production.Product
where ProductID=@ProductID)
and sc.ProductCategoryID in (3,4)
and d2.ProductID not in (select ProductID
from Sales.SalesOrderDetail sod
join Sales.SalesOrderHeader soh
on sod.SalesOrderID=soh.SalesOrderID
where CustomerID=@CustomerID)
group by d2.ProductID
having count(distinct h2.SalesOrderID)>5
order by count(distinct h2.SalesOrderID) desc
We were just friends during those times… it was relaxing and fun and there was no pressure. And then…

You asked me out on an UPDATE.

It was a little awkward at first, but after our second and third UPDATEs (and beyond), I started to feel a little more comfortable as we got to know each other better.

/* 
Howard Snyder's company changed its name.
We updated it in the Customers table, so now we need to
update the ShipName in all the company's unshipped orders.
*/
use NorthWind
update Orders
set ShipName=Customers.CompanyName
from Orders
join Customers on Orders.CustomerID=Customers.CustomerID
where Customers.ContactName='Howard Snyder'
and Orders.ShippedDate is null
Then you wanted to have more variety in our UPDATEs… it started to become a little overwhelming and I had to admit, I was starting to get a little confused with our relationship.

/*
The query plans for all following updates
are equivalent in every way
*/
update Orders
set ShipName=Customers.CompanyName
from Orders
join Customers on Orders.CustomerID=Customers.CustomerID
where Customers.ContactName='Howard Snyder'
and Orders.ShippedDate is null

update Orders
set ShipName=Customers.CompanyName
from Customers /* No Orders table in FROM clause */
where Customers.CustomerID=Orders.CustomerID /* But connected via WHERE */
and Customers.ContactName='Howard Snyder'
and Orders.ShippedDate is null

update o /* Update alias of the FROM clause */
set ShipName=c.CompanyName
from Orders o
join Customers c on o.CustomerID=c.CustomerID
where c.ContactName='Howard Snyder'
and o.ShippedDate is null

update Orders /* Note the alias in the FROM clause is ignored here */
set ShipName=c.CompanyName
from Orders as SomeObnoxiousAlias
join Customers c on SomeObnoxiousAlias.CustomerID=c.CustomerID
where c.ContactName='Howard Snyder'
and SomeObnoxiousAlias.ShippedDate is null
I started to get worried, because sometimes it seemed as if our relationship wasn’t going anywhere specific… there was no direction… no specific goals… it was kind of non-deterministic.

/*
Which of ERNSH's 30 orders will end up
updating his Country column?
We really have no idea.
*/
update Customers
set Country=Orders.ShipCountry
from Customers
join Orders on Customers.CustomerID=Orders.CustomerID
where Customers.CustomerID='ERNSH'
And then eventually our relationship moved to the next level.

We started… well, y’know (blush)… doing it… the D-Word… DELETEing.

At first it was kind of exciting but really scary at the same time. And I have to admit that I felt a little guilty about all these new-found feelings of power awakening inside me.

But then you started to come on too strong… you were smothering me… it was like you were there all the time

/* Two FROMs in a DELETE is kind of confusing */
delete from Customers
from Customers
where CustomerID='FISSA'
I needed some space… And I asked you to pull back and not let things go so fast…

/* But the first FROM is optional */
delete Customers
from Customers
where CustomerID='FISSA'

/* And FROM, of course, is not needed if there's only one table */
delete Customers
where CustomerID='FISSA'
Then things got weird. You wanted to get into all this… experimentation. We starting DELETEing in ways that were really unsettling and… unnatural. My SQL Server MVP friends Steve Kass and Aaron Alton warned me about you, but I didn’t listen. It was becoming dangerous, and we started doing things we really shouldn’t be doing… and then one thing would lead to another, and… well I still feel great pangs of guilt and sorrow when I think back on how I had to sometimes abort the query to avoid undesirable results.

/*
The following will delete all of FISSA's orders.
Potentially confusing.
*/
delete Orders
from Orders Customers
where CustomerID='FISSA'

/*
The following will delete the Customer named FISSA.
Really confusing? Yep.
*/
delete Orders
from Customers Orders
where CustomerID='FISSA'

/*
OOPS! The following will end up deleting ALL Order Detail rows.
Notice there's no relationship between the table whose rows we're
deleting and the table referenced in the FROM clause.
As long as there is one single order in the Orders table
with a CustomerID of ERNSH, the Order Details table will be
completely cleared out... *Poof*... Nothing... Nada... Zip.
*/
delete [Order Details]
from Orders
where CustomerID='ERNSH'
And so I’ve come to the decision that we can no longer have a DELETE relationship… ever again. I’m sorry it has to be this way, but I need to have more control of my life. If I have the urge to DELETE, I will just have to be more careful and use protection… I’ll always use EXISTS or IN… our friend WHERE can get them for me.

/*
Delete all Order Details rows that belong
to the Customer ERNSH.
The two queries below are equivalent in every way.
*/
delete [Order Details]
where exists (select *
from Orders
where OrderID=[Order Details].OrderID
and CustomerID='ERNSH')

delete [Order Details]
where OrderID in (select OrderID
from Orders
where CustomerID='ERNSH')
Things won’t be so confusing anymore… my DELETE life will be more straightforward and easier to understand and… well… normal.

/*
Delete all SalesOrderDetail rows that satisfy all of the following:
1) The order was placed by a guy named Richard with a SuperiorCard Credit Card
2) The line item involved a Special Offer of the Reseller Category
3) The line item involved a Product of the Accessories Category
*/
use AdventureWorks
delete Sales.SalesOrderDetail
where SalesOrderID in (select SalesOrderID
from Sales.SalesOrderHeader soh
join Person.Contact c on soh.ContactID=c.ContactID
join Sales.CreditCard cc on soh.CreditCardID=cc.CreditCardID
where c.FirstName='Richard'
and cc.CardType='SuperiorCard')
and SpecialOfferID in (select SpecialOfferID
from Sales.SpecialOffer
where Category='Reseller')
and ProductID in (select ProductID
from Production.Product p
join Production.ProductSubCategory ps
on p.ProductSubCategoryID=ps.ProductSubCategoryID
join Production.ProductCategory pc
on ps.ProductCategoryID=pc.ProductCategoryID
where pc.Name='Accessories')
I’m also considering ending our UPDATEing relationship. I’m becoming more involved in a query organization called CTE and that’s helping me move into another direction.

use NorthWind;
with TargetRows as
(
select ShipName
,NewShipName=Customers.CompanyName
from Orders
join Customers on Orders.CustomerID=Customers.CustomerID
where Customers.ContactName='Howard Snyder'
and Orders.ShippedDate is null
)
update TargetRows
set ShipName=NewShipName
I realize that you will still be peripherally involved in my UPDATEing and DELETEing, watching from the sidelines, but I’ll just have to accept that.

To be brutally honest, I’ve considered breaking off from you completely, but that’s just not possible… We still have to work together, and I’m not about to go out and get another job just to avoid you.

I hope you understand all the things I’ve written in this letter. I’m sure it’s hard reading something like this, but I think over time you will understand and accept my decision.

I’m sure we can still be friends.

Sincerely,

Every SQL Developer in the Universe (I hope)

Thursday, January 28, 2010

Random Sentence Generator

I’ve always been fascinated with language and linguistics… specifically grammar and sentence structure. I was one of those nerds in high school who actually enjoyed diagramming sentences.

Sentence Diagram

For kicks, about 15 years ago, I wrote a self-contained program in Visual FoxPro (VFP) to generate random sentences, but I lost track of it… it’s probably on an old computer gathering dust in the garage.

But a year or so ago, I Googled “random sentence generator” and came upon this website, which describes an old assignment in a Computer Science class at Stanford University.

The assignment was to generate random sentences based on a text file consisting of a grammar. This grammar is a template describing the various combinations of words to form the sentences.

For example, here is a very simple grammar file (Don’t pay any attention to the T-SQL /*…*/ comment delimiters… they’re only there so that websites that syndicate this blog render the text below in a consistent manner):

/*
{<start> <pronoun> <verb>;}
{<pronoun> I;you;he;she;it;we;they;}
{<verb> ran;played;drank <liquid>;}
{<liquid> water;whiskey;Pepsi;}
*/
This grammar file consists of 4 elements, whose definitions are between curly brackets ({}). Each element is defined by a list of possible expansions, each ending with a semicolon (;). These expansions can be individual words or phrases, or they can be (or contain) other elements, which will need to be further expanded.

Beginning with the <start> element, you generate a random sentence, continuously expanding the elements until there are no more to expand, as illustrated by the following steps:

/*
Start with <start>: <start>
Expand <start>: <pronoun> <verb>
Expand <pronoun>: they <verb>
Expand <verb>: they drank <liquid>
Expand <liquid>: they drank whiskey
*/
This grammar file above, though very very simple, is capable of generating 35 different sentences.

There are various grammar files available at this location. Many of them were written for fun by students over the years. If you wish to download any of them, just right-click and choose Save Target As… from the context menu.

For example, there’s a grammar file for generating a random Star Trek episode treatment, like so:

While ferrying a Drysilic ambassador to the nearest J.C. Penny outlet, Wesley Crusher is suddenly swallowed into a time vortex connecting to the year 325. In a flash of insight, someone decides that the best solution is by using Guinan’s hat as a weapon. And thus, the Enterprise continues undaunted in its mission.

And there’s one for generating a random James Bond film synopsis (I inserted a [sic] next to the misspellings… honestly, you’d think Stanford students would spell better!):

A demented Libyan terrorist plots to lower England’s standard of living by wrecking the global economy. Bond tracks his nemesis to a palacial [sic] estate in France and they play cards (cheating shamelessly). Within minutes, Bond meets a stunning CIA agent, whom he thinks is cute but too young. Afterwards, Bond is siezed [sic] by commandos and fed to pirannahs [sic]. Incredibly, he escapes by driving a speedboat over a waterfall. Finally, with only 007 seconds to spare, Bond causes SMERSH headquarters to self-destruct.

And there’s one for generating insults:

You cantankerous bucket of defective lizard scabs.

May a gruesome and sweaty group of South American killer bees gleefully vomit in the toxic dump you call home.

Anyway, once I discovered this website a year ago, I wrote a brand new Random Sentence Generator in VFP, using this grammar file template concept. And I had a little fun generating random insults and random Trek/Bond synopses and other things.

But I didn’t stop there. I put together a grammar file to generate sentences of my own. Over many weeks, I kept adding to it, making it very extensive, filled with dependent clauses, modal auxiliary verbs, conjunctive adverbs, and a host of other complex sentence constructions. I even amended it to handle past and future tense and plurals, which meant I had to handle irregular formations like sit/sat and man/men.

I can pretty much guarantee that you will never get the same sentence generated twice from my grammar file. I haven’t done the math (I’m not really sure how to), but it must be capable of generating literally quadrillions upon quadrillions of different sentences, from the very simple…

Michael sobbed.

…to the moderately complex…

Last week my daughter was upset to discover that all the gophers wanted to gracefully gulp down 42 gallons of armadillo soup.

…to the very complex…

This group of supermodels will become very sumptuous, moderately rich, and positively sleazy next Tuesday, during the time that my sister-in-law’s daughter will look decidedly tough and allegedly creepy, and next Monday, even though a stormtrooper and that surprisingly murderous nurse will haul peanuts across a terribly horrendous ocean, very heavyhearted kangaroos will move out of the pale blue forest.

So, now cut to present day… I recently wrote the same Random Sentence Generator in T-SQL, and it’s available here on my SkyDrive (along with my grammar file and a few of the various others mentioned above) to anyone who wants to fool around with it (in SQL2005 or later).

In order to use it, you must first create a table called, appropriately, Grammar:

use TempDB  /* Change to whatever database you wish */
go
if object_id('Grammar','U') is not null drop table Grammar
go
create table Grammar
(
Descript nvarchar(20)
,Element varchar(40)
,Expansion varchar(200)
)
go
create clustered index Ix_Grammar on Grammar (Descript,Element)
go
The table will house any number of grammar file definitions that you wish to load in. I’ve written a stored procedure called LoadGrammarFile to do this. The code below loads the various grammar files mentioned above (the Nonsense file is the one that I put together):

exec LoadGrammarFile 'StarTrek' , 'c:\grammar\trek.txt'
exec LoadGrammarFile 'Bond' , 'c:\grammar\bond.txt'
exec LoadGrammarFile 'Insult' , 'c:\grammar\insult.txt'
exec LoadGrammarFile 'Nonsense' , 'c:\grammar\nonsense.txt'
So, for example, the last line above loads a grammar with the description of Nonsense from the file c:\grammar\nonsense.txt.

So now that we have our table loaded with our various grammars, we can generate sentences to our heart’s content. The stored procedure called GenerateSentence will do just that, based on the grammar we desire:

exec GenerateSentence 'Nonsense'
/*
Sentence
---------------------------------
Ronald tried to escape from Peru.
*/
The GenerateSentence procedure also accepts a couple of optional arguments. For example, if you know your grammar file well, you can indicate that you want the sentence generation to start with a different element from the <start> element, like so:

exec GenerateSentence 'Nonsense', @Init='<confuciusoption>'
/*
Sentence
-----------------------------------------------------------
Confucius say, "Pessimism is like cabbage, only more sexy."
*/


exec GenerateSentence 'Nonsense', @Init='<bookmovieoption>'
/*
Sentence
-------------------------------------------------------------------------------------
Larry Coolidge's wife checked out a film entitled "Cancer And The Kumquats in Tokyo".
*/


exec GenerateSentence 'Nonsense', @Init='<liquidobject>'
/*
Sentence
-----------------------------
A couple of tomato milkshakes
*/

You can also see how the sentence is generated step-by-step by passing @Debug=1, and that will PRINT each step in the Text Messages window in SSMS:

exec GenerateSentence 'Nonsense', @Init='<liquidobject>', @Debug=1
/*
<liquidobject>
<quantity> <liquidcontainer>s of <liquid>
<digitmorethan0><digit> <liquidcontainer>s of <liquid>
6<digit> <liquidcontainer>s of <liquid>
62 <liquidcontainer>s of <liquid>
62 bowls of <liquid>
62 bowls of <fruit> tea
62 bowls of cantaloupe tea

Sentence
--------------------------
62 bowls of cantaloupe tea
*/
Finally, you can also specify how many sentences to generate by passing a @Quantity value (only a maximum of 100 sentences will be generated):

exec GenerateSentence 'Nonsense', @Init='<liquidobject>', @Quantity=10
/*
Sentence
---------------------------------
448 glasses of Clorox
An almond milkshake
16 buckets of NyQuil
Some blueberry-scented perfume
A few vats of Tabasco sauce
3 gallons of snot
A couple of bowls of apricot soda
Some vomit
846 tanks of antelope soup
7 vials of artichoke oil
*/
The code for the LoadGrammarFile and GenerateSentence procedures is below. Note that they make use of many of the string UDFs that I discussed in my last blog entry.

So have some fun and download it and play with it… Make it part of your daily routine… a “thought for the day” to get you going in the morning.

If you generate any wildly funny or especially profound sentences, I’d love to hear them.

/*----------------------------------------------------------------------------------*/

if object_id('LoadGrammarFile','P') is not null drop procedure LoadGrammarFile
go
create procedure LoadGrammarFile
@GrammarDescript
nvarchar(40)
,@FileName nvarchar(max)
as
begin

set nocount on

declare @SqlCommand nvarchar(max)
,@GrammarDef varchar(max)
,@ElementDef varchar(max)
,@ExpansionList varchar(max)
,@Element varchar(40)
,@Expansion varchar(max)
,@i int
,@j int
/*
Use Dynamic SQL to load the contents of the file into a string
*/
set @SqlCommand=
N'set @GrammarDef=(select *
from openrowset(bulk '''
+@FileName+N''', single_clob) x)'
exec sp_executesql
@SqlCommand
,N'@GrammarDef varchar(max) output'
,@GrammarDef output

/*
Take care of odd stuff seen in various Stanford grammar files
*/
set @GrammarDef=replace(@GrammarDef,'> s ','>s ') /* Plurals */
set @GrammarDef=replace(@GrammarDef,'> ''s ','>''s ') /* Possessives */
set @GrammarDef=replace(@GrammarDef,'> "''s" ','>''s ') /* Possessives */

/*
Clear out any previous contents for the
desired Grammar description from the table
*/
delete Grammar where Descript=@GrammarDescript

/*
Parse through the grammar definition until done
*/
set @i=0
while 1=1
begin
set @i=@i+1
/*
Find the next Element Definition between {} delimiters
and convert any CR or LF or TAB characters to spaces
*/
set @ElementDef=dbo.StrExtract(@GrammarDef,'{','}',@i,0)
set @ElementDef=ltrim(rtrim(dbo.ChrTran(@ElementDef
,char(13)+char(10)+char(9)
,' ')))

if @ElementDef='' break /* We're done! */

/*
Get the name of the Element within the Definition
and get the list of Expansions for that Element
(in other words, the rest of the Definition contents)
*/
set @Element=dbo.StrExtract(@ElementDef,'<','>',1,4)
set @ExpansionList=';'+ltrim(substring(@ElementDef
,len(@Element)+1
,len(@ElementDef)))
/*
Parse through the List of Expansions until done
*/
set @j=0
while 1=1
begin
set @j=@j+1
/*
Find the next Expansion between semicolon delimiters
and include those delimiters in the result.
Note: We do this because a blank Expansion between semicolons
is perfectly valid
*/
set @Expansion=dbo.strExtract(@ExpansionList,';',';',@j,4)
if @Expansion='' break /* We're done! */
/*
Now get rid of those semicolon delimiters
*/
set @Expansion=ltrim(rtrim(replace(@Expansion,';','')))
/*
There may be some Expansions where we actually do want a semicolon
and we represent that with a double-colon (::), so change
those to actual semicolons
*/
set @Expansion=replace(@Expansion,'::',';')
/*
Now we can finally insert the Expansion into our table
*/
insert Grammar values (@GrammarDescript,@Element,@Expansion)
end
end
end
go

/*----------------------------------------------------------------------------------*/

if object_id('GenerateSentence','P') is not null drop procedure GenerateSentence
go
create procedure GenerateSentence
@GrammarDescript
nvarchar(40)
,@Quantity int = 1
,@Init varchar(40) = '<start>'
,@Debug bit = 0
as
begin

set nocount on

declare @Counter int
,@Sentence varchar(max)
,@Element varchar(500)
,@Expansion varchar(500)
,@Fragment varchar(500)
,@FragText varchar(500)
,@FirstChar char(1)
,@i int
,@p1 int
,@p2 int
,@p3 int

declare @SentenceBucket table (Sentence varchar(max))

set @Counter=0
while @Counter<case when @Quantity>100 then 100 else @Quantity end
begin

set @Counter=@Counter+1

/*
Infinitely loop in creating sentences until a valid one comes along
*/
while 1=1
begin

/*
Initialize
*/
set @Sentence=@Init
if @Debug=1 print ltrim(@Sentence)

/*
Perform substitutions of all Elements until done
*/
while 1=1
begin
set @Element=dbo.StrExtract(@Sentence,'<','>',1,4)
if @Element='' break /* We're done! */

/*
Get random value for the Element
*/
select top 1 @Expansion=Expansion
from Grammar
where Descript=@GrammarDescript and Element=@Element
order by newid()

/*
And put it into the sentence
*/
set @Sentence=stuff(@Sentence
,charindex(@Element,@Sentence)
,len(@Element)
,@Expansion)

if @Debug=1 print ltrim(@Sentence)
end

/*
If the sentence is valid then we're done!
Note: Sentences are valid 99.9% of the time
*/
if charindex('***',@Sentence)=0 break

/*
Otherwise, loop around again
*/
if @Debug=1 print 'Invalid sentence... Restarting...'
end

/*
Temporarily surround punctuation with spaces
*/
set @Sentence=replace(@Sentence,'"',' " ')
set @Sentence=replace(@Sentence,',',' , ')
set @Sentence=replace(@Sentence,';',' ; ')
set @Sentence=replace(@Sentence,'.',' . ')
set @Sentence=replace(@Sentence,'-',' - ')

/*
Handle plurals, verb tenses, and adjectives/adverbs
Examples: [man|men] --> man
[man|men]s --> men
[sit|sat] --> sit
[sit|sat]ed --> sat
[happy|happily] --> happy
[happy|happily]ly --> happily
*/
while 1=1
begin
set @p1=charindex('[',@Sentence)
if @p1=0 break /* We're done! */
set @p2=charindex(']',@Sentence,@p1)
set @p3=charindex(' ',@Sentence+' ',@p2)
set @Fragment=substring(@Sentence,@p1,@p3-@p1)
set @FragText=case
when @p3-@p2=1
then dbo.StrExtract(@Fragment,'[','|',1,0)
else dbo.StrExtract(@Fragment,'|',']',1,0)
end
set @Sentence=stuff(@Sentence
,charindex(@Fragment,@Sentence)
,len(@Fragment)
,@FragText+' ')
if @Debug=1 print ltrim(@Sentence)
end

/*
Handle PROPERIZE(: :) directive
*/
set @Fragment=dbo.StrExtract(@Sentence,'PROPERIZE(:',':)',1,4)
if @Fragment<>''
begin
set @FragText=substring(@Fragment,12,len(@Fragment)-13)
set @Sentence=stuff(@Sentence
,charindex(@Fragment,@Sentence)
,len(@Fragment)
,dbo.Properize(@FragText,0))
if @Debug=1 print ltrim(@Sentence)
end

/*
Handle "a" and "an" (i.e. "a elephant" -> "an elephant")
Note: Add a space at the beginning of the sentence in case
there's a potential word "A" at the beginning
*/
set @Sentence=' '+@Sentence
set @i=0
while 1=1
begin
set @i=@i+1
set @p1=dbo.At(' a ',@Sentence,@i)
if @p1=0 break
set @FirstChar=left(ltrim(substring(@Sentence,@p1+2,len(@Sentence))),1)
if @FirstChar in ('a','e','i','o','u')
set @Sentence=stuff(@Sentence,@p1+1,2,'an ')
if @Debug=1 print ltrim(@Sentence)
end

/*
Get rid of surrounding space around double quotes
in preparation for next step
*/
set @Sentence=replace(@Sentence,' " ','"')

/*
Get rid of all leading/trailing spaces
And capitalize the first letter of the sentence
*/
set @Sentence=ltrim(rtrim(@Sentence))
set @Sentence=upper(left(@Sentence,1))+substring(@Sentence,2,len(@Sentence))

/*
Capitalize any word that comes after the first
of a pair of double quotes
*/
set @i=-1
while 1=1
begin
set @i=@i+2 /* Every other double quote */
set @p1=dbo.At('"',@Sentence,@i)
if @p1=0 break
set @FragText=ltrim(substring(@Sentence,@p1+1,len(@Sentence)))
set @Sentence=left(@Sentence,@p1)
+upper(left(@FragText,1))
+substring(@FragText,2,len(@FragText))
end

/*
Fix all the close quotes
Get rid of any double-spaces
And clean up any spaces before punctuation
*/
while charindex(', "',@Sentence)>0
set @Sentence=replace(@Sentence,', "',',"')
while charindex('. "',@Sentence)>0
set @Sentence=replace(@Sentence,'. "','."')
while charindex(' ',@Sentence)>0
set @Sentence=replace(@Sentence,' ',' ')
while charindex(' ,',@Sentence)>0
set @Sentence=replace(@Sentence,' ,',',')
while charindex(' ;',@Sentence)>0
set @Sentence=replace(@Sentence,' ;',';')
while charindex(' .',@Sentence)>0
set @Sentence=replace(@Sentence,' .','.')
while charindex(' - ',@Sentence)>0
set @Sentence=replace(@Sentence,' - ','-')
while charindex(' ".',@Sentence)>0
set @Sentence=replace(@Sentence,' ".','".')

/*
Just in case, again get rid of leading/trailing spaces
*/
set @Sentence=ltrim(rtrim(@Sentence))

/*
And save it in our temporary file
*/
insert @SentenceBucket values (@Sentence)

end

/*
Send back our sentence(s)
*/
select Sentence from @SentenceBucket

end
go

Tuesday, January 26, 2010

Handy String Functions

Strings!Visual FoxPro (VFP) is very good at handling strings, and there are many useful string functions built into its language. In making the move to T-SQL, I have often wished some of those handy string functions were available.

But since they are not, I went ahead and created User-Defined Functions (UDFs) to emulate those VFP string functions. Perhaps you will find them to be useful also.

In this blog entry, I’ll introduce these UDFs (providing links to the VFP Books Online documentation that they're based upon) and give some examples of how to use them. The actual code to create these UDFs will be at the end of this article.

Stay tuned for my next blog entry, in which I’ll put together a zany project that makes use of some of these UDFs.

In order for many of these functions to do their work, they need the actual true length of a string. The LEN() function in T-SQL has the unfortunate feature of excluding trailing blanks in calculating string length. So the first function that I created is what I call a TRUELEN() function, which returns the full length of a string regardless of any trailing blanks.

select TrueLength=dbo.TrueLen('abcde     ')
/*
TrueLength
-----------
10
*/
The AT() Function is just like T-SQL’s CHARINDEX() function in that it returns the position of a search string within another string expression. The difference is that its third argument indicates the occurrence rather than a start position. Here are a couple of examples:

select FirstLetterE=dbo.At('e','Here is a sentence',1)
,ThirdLetterE=dbo.At('e','Here is a sentence',3)
,FifthLetterE=dbo.At('e','Here is a sentence',5)
/*
FirstLetterE ThirdLetterE FifthLetterE
------------ ------------ ------------
2 12 18
*/

select SecondTheOccurrence=dbo.At('the','The end of the story',2)
/*
SecondTheOccurrence
-------------------
12
*/
The RAT() Function is just like AT(), except that it finds the position of the rightmost occurrence of a search string within a string expression. This can come in handy, for example, if you have a fully-qualified filename containing lots of backslashes and you want to find the position at which the true filename starts:

select LastLetterE=dbo.RAt('e','Here is a sentence',1)
,SecondToLastLetterE=dbo.RAt('e','Here is a sentence',2)
,FifthToLastLetterE=dbo.RAt('e','Here is a sentence',5)
/*
LastLetterE SecondToLastLetterE FifthToLastLetterE
----------- ------------------- ------------------
18 15 2
*/

declare @FullyQualifiedName varchar(50)
set @FullyQualifiedName='C:\Windows\System32\Config\System.Log'
select PathNameOnly=left(@FullyQualifiedName
,dbo.RAt('\',@FullyQualifiedName,1))
,FileNameOnly=substring(@FullyQualifiedName
,dbo.RAt('\',@FullyQualifiedName,1)+1
,len(@FullyQualifiedName))
/*
PathNameOnly FileNameOnly
--------------------------- ------------
C:\Windows\System32\Config\ System.Log
*/
The OCCURS() Function will tell you the number of times a string expression occurs in another string expression:

select LetterEOccurrences=dbo.Occurs('e','Here is a sentence')
,WordTheOccurrences=dbo.Occurs('the','The end of the story')
/*
LetterEOccurrences WordTheOccurrences
------------------ ------------------
5 2
*/
The CHRTRAN() Function is kind of similar to T-SQL’s REPLACE() function, except it will do multiple individual character translations. Its first argument is a string that you want to perform the translations upon. The second argument is a string of characters that you want to individually translate. And the third argument is a string of characters that are the replacements/translations of the characters in the second argument. Here are some examples to illustrate:

select Translate1=dbo.ChrTran('abcdefghi','aei','XYZ')
,Translate2=dbo.ChrTran('123456789','316','***')
/*
Translate1 Translate2
---------- ----------
XbcdYfghZ *2*45*789
*/
The length of the third argument does not have to match the length of the second argument. Note what happens when I pass an empty string as the third argument… the characters in the second argument are removed (or just replaced with a zero-length character):

select VowelsRemoved=dbo.ChrTran('abcdefghi','aei','')
/*
VowelsRemoved
-------------
bcdfgh
*/
The above illustrates a handy way to remove multiple characters from a string. But here’s a clever way to use CHRTRAN() to remove all the characters from a string except certain ones. This example removes all non-numeric characters from the string:

declare @PhoneNumber varchar(30)
set @PhoneNumber='(650) 555-1212'
select NumericDigitsOnly=dbo.ChrTran(@PhoneNumber
,dbo.ChrTran(@PhoneNumber,'0123456789','')
,'')
/*
NumericDigitsOnly
-----------------
6505551212
*/
Note that the inner CHRTRAN() removed all the numeric characters from the string, and then the outer CHRTRAN() made use of that result to remove those characters from the original string. Cute, huh?

The STREXTRACT() Function extracts data from a string between two delimiters. I believe this was originally introduced to the VFP language to aid in shredding XML. The first argument is the string to search. The second and third arguments are the beginning and ending delimiters. The fourth argument specifies at which occurrence of the beginning delimiter you want to start the extraction.

And finally a fifth argument is a flag in which you can specify additive options… a 1 indicates a case-insensitive search (I do not make use of this value in my UDF definition and will let the collation of the database determine whether the search is case-insensitive or not), a 2 indicates that the end delimiter is not required to be found in order to do the extraction, and a 4 indicates that you wish to include the delimiters in the returned expression. Note that I said that these options are additive… for example, you can pass the value 6, which is the sum of 2+4, so both of those options will be honored.

Here are some examples to illustrate:

declare @XMLString varchar(max)
set @XMLString='
<stooge><id>1</id><name>Moe</name></stooge>
<stooge><id>2</id><name>Larry</name></stooge>
<stooge><id>3</id><name>Curly</name></stooge>'
select SecondStooge=dbo.StrExtract(@XMLString,'<name>','</name>',2,0)
,SecondStoogeWithDelimiters=dbo.StrExtract(@XMLString,'<name>','</name>',2,4)
/*
SecondStooge SecondStoogeWithDelimiters
------------ --------------------------
Larry <name>Larry</name>
*/


declare @WordList varchar(max)
set @WordList='one;two;three;four;five'
select FourthWord=dbo.StrExtract(';'+@WordList,';',';',4,0)
,FifthWord=dbo.StrExtract(';'+@WordList,';',';',5,0) /* Oops: No End Delimiter */
,FifthWordEndDelimNotReqd=dbo.StrExtract(';'+@WordList,';',';',5,2)
,FifthWordEndDelimNotReqdIncludeDelims=dbo.StrExtract(';'+@WordList,';',';',5,2+4)
/*
FourthWord FifthWord FifthWordEndDelimNotReqd FifthWordEndDelimNotReqdIncludeDelims
---------- --------- ------------------------ -------------------------------------
four five ,five
*/
VFP offers a PROPER() Function, which will capitalize a string “as appropriate” for proper names. So, for example, it will take the string BRAD SCHULZ and will return Brad Schulz. That’s fine, but the unfortunate thing is that PROPER() is very limiting… it will set the whole string to lower case and will capitalize any letters that follow a space (or are at the beginning of the string)… and that’s it. It will not handle letters that come after hyphens or parentheses or quotation marks or any other special characters correctly. So, if you pass it ITZIK BEN-GAN (T-SQL “GURU”), it would end up returning Itzik Ben-gan (t-sql “guru”)… only two letters would get capitalized… the ‘I’ and the ‘B’.

So I came up with a function that I call PROPERIZE(). It will correctly capitalize any letters that follow various special characters (like hyphen, ampersand, parenthesis, etc). It also takes a second argument… if it is equal to 1, then the string will be converted to lower case first before the capitalization takes place… if it is equal to 0, then the string is capitalized “as is”. The following examples illustrate this:

select Example1=dbo.Properize('itzik ben-gan (T-SQL "guru")',1)
,Example2=dbo.Properize('itzik ben-gan (T-SQL "guru")',0)
/*
Example1 Example2
---------------------------- ----------------------------
Itzik Ben-Gan (T-Sql "Guru") Itzik Ben-Gan (T-SQL "Guru")
*/

select Example3=dbo.Properize('ITZIK BEN-GAN (T-SQL "GURU")',1)
,Example4=dbo.Properize('ITZIK BEN-GAN (T-SQL "GURU")',0)
/*
Example3 Example4
---------------------------- ----------------------------
Itzik Ben-Gan (T-Sql "Guru") ITZIK BEN-GAN (T-SQL "GURU")
*/
Finally, we have the PADL() and PADR() and PADC() Functions, which pad a string to a specified length with a specific character on the left or right sides, or both:

select PadLeft=dbo.PadL('Title',15,'*')
,PadRight=dbo.PadR('Title',15,'*')
,PadCenter=dbo.PadC('Title',15,'*')
/*
PadLeft PadRight PadCenter
--------------- --------------- ---------------
**********Title Title********** *****Title*****
*/
The code to create all these functions is below. Many of these functions are dependent upon each other, so you should run the code to create them all at once. Instead of doing a copy/paste of the code below, you can go here to download the code.

Just a reminder… Be sure to tune in again next time, when I put together a wacky project that makes use of these string functions. Until then…

use TempDB  /* Change to whatever database you wish */
go

/*----------------------------------------------------------------------------*/

if object_id('TrueLen') is not null drop function TrueLen
go
create function TrueLen
(
@Expr nvarchar(max)
)
returns int
as

begin
declare @TrueLen int
set @TrueLen=len(@Expr+'*')-1 /* or datalength(@Expr)/2 */
return @TrueLen
end
go

/*----------------------------------------------------------------------------*/

if object_id('At') is not null drop function At
go
create function At
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
,@Occurrence int = 1
)
returns int
as
begin
declare @Position int
if @Expr1 is not null and
@Expr2
is not null and
@Occurrence
is not null
begin
declare @Counter int
select @Position=0, @Counter=0
while @Counter<@Occurrence
begin
set @Counter=@Counter+1
set @Position=charindex(@Expr1,@Expr2,@Position+1)
if @Position=0 or @Position is null break
end
end
return @Position
end
go

/*----------------------------------------------------------------------------*/

if object_id('RAt') is not null drop function RAt
go
create function RAt
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
,@Occurrence int = 1
)
returns int
as
begin
declare @Position int
if @Expr1 is not null and
@Expr2
is not null and
@Occurrence
is not null
begin
declare @Expr1Len int, @Expr2Len int, @AtPos int
select @Expr1Len=dbo.TrueLen(@Expr1)
,@Expr2Len=dbo.TrueLen(@Expr2)
,@AtPos=dbo.At(reverse(@Expr1),reverse(@Expr2),@Occurrence)
set @Position=@Expr2Len-(@Expr1Len+@AtPos-1)+1
end
return @Position
end
go

/*----------------------------------------------------------------------------*/

if object_id('Occurs') is not null drop function Occurs
go
create function Occurs
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
)
returns int
as
begin
declare @Result int
if @Expr1 is not null and
@Expr2
is not null
begin
declare @Expr1Len int
,@Expr2Len int
,@NewExpr2 nvarchar(max)
,@NewExpr2Len int
select @Expr1Len=dbo.TrueLen(@Expr1)
,@Expr2Len=dbo.TrueLen(@Expr2)
,@NewExpr2=replace(@Expr2,@Expr1,'')
set @NewExpr2Len=dbo.TrueLen(@NewExpr2)
set @Result=case
when @Expr1Len=0
then 0
else (@Expr2Len-@NewExpr2Len)/@Expr1Len
end

end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('ChrTran') is not null drop function ChrTran
go
create function ChrTran
(
@Expr nvarchar(max)
,@SearchChars nvarchar(max)
,@ReplaceChars nvarchar(max)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@SearchChars
is not null and
@ReplaceChars
is not null
begin
declare @Counter int
select @Result=@Expr, @Counter=0
while @Counter<dbo.TrueLen(@SearchChars)
begin
set @Counter=@Counter+1
set @Result=replace(@Result
,substring(@SearchChars,@Counter,1)
,substring(@ReplaceChars,@Counter,1))
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('StrExtract') is not null drop function StrExtract
go
create function StrExtract
(
@Expr nvarchar(max)
,@BeginDelim nvarchar(max)
,@EndDelim nvarchar(max)
,@Occurrence int
,@Flags int /* 2=EndDelim not required, 4=Include Delims in result */
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@BeginDelim is not null and
@EndDelim
is not null and
@Occurrence
is not null and
@Flags is not null
begin
declare @Exprlen int
,@BeginDelimLen int
,@EndDelimLen int
,@BeginDelimPos int
,@EndDelimPos int
select @Result=''
,@Exprlen=dbo.TrueLen(@Expr)
,@BeginDelimLen=dbo.TrueLen(@BeginDelim)
,@EndDelimLen=dbo.TrueLen(@EndDelim)
,@BeginDelimPos=dbo.At(@BeginDelim
,@Expr
,@Occurrence)
if @BeginDelimPos>0
begin
set @EndDelimPos=charindex(@EndDelim
,@Expr
,@BeginDelimPos+@BeginDelimLen)
if @EndDelimPos=0 and @Flags&2=2
set @EndDelimPos=@Exprlen+1
if @EndDelimPos>0
set @Result=case
when @Flags&4=4 /* Include Delimiters in Result */
then substring(@Expr
,@BeginDelimPos
,@EndDelimPos-@BeginDelimPos+@EndDelimLen)
else substring(@Expr
,@BeginDelimPos+@BeginDelimLen
,@EndDelimPos-@BeginDelimPos-@BeginDelimLen)
end
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('Properize') is not null drop function Properize
go
create function Properize
(
@Expr nvarchar(max)
,@SetToLowerCaseFirst bit = 0
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null
begin
declare @Position int
,@Capitalize bit
,@Char nchar(1)
select @Result=case
when @SetToLowerCaseFirst=1
then lower(@Expr)
else @Expr
end
,@Position=0
,@Capitalize=1
while @Position<len(@Result)
begin
select @Position=@Position+1
,@Char=substring(@Result,@Position,1)
if @Capitalize=1
select @Capitalize=0
,@Result=stuff(@Result
,@Position
,1
,upper(@Char))
if charindex(@Char,' #%&*()-_=+[]{}":./')>0 set @Capitalize=1
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadL') is not null drop function PadL
go
create function PadL
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
set @Result=right(replicate(@PadChar,@Size)+@Expr,@Size)
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadR') is not null drop function PadR
go
create function PadR
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
set @Result=left(@Expr+replicate(@PadChar,@Size),@Size)
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadC') is not null drop function PadC
go
create function PadC
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
declare @Exprlen int
,@LeftSize int
set @Exprlen=dbo.TrueLen(@Expr)
set @LeftSize=case when @Size<@Exprlen then 0 else (@Size-@Exprlen)/2 end
set @Result=replicate(@PadChar,@LeftSize)+dbo.PadR(@Expr,@Size-@LeftSize,@PadChar)
end
return @Result
end
go

Tuesday, January 12, 2010

The Troll's Puzzle: A SQL Fable

This blog entry is participating in the second T-SQL Tuesday, hosted once again by Adam Machanic. You are invited to visit his blog to join the party and read more blogs participating in this month’s theme: Puzzling Situations.

Now please sit back and enjoy our story…



The SQL Troll's CastleOnce upon a time, in a faraway land, there lived three T-SQL programmers.

Frederick Function just loved to call T-SQL functions in his queries. He thought it made his code look more complicated and that made him feel good. He especially liked working with dates because his code would be sprinkled with lots and lots of DATEADDs and DATEDIFFs.

Osric Ordinal liked to take shortcuts in his queries any time he had the opportunity.

And Colin Columnname carefully named his columns in his SELECT list and used those column names throughout the query where possible.

These three young lads were on a quest to save a fair maiden who was held prisoner in a castle in a deep dark forest. This castle was guarded by an evil ugly troll. Legend had it that the troll had three seemingly simple yet deceptive T-SQL puzzles… If any brave soul was able to answer all three T-SQL puzzles successfully, he could enter the castle and save the fair maiden. However, if unsuccessful in solving any of the T-SQL puzzles, he would be eaten by the troll.

Our three young heroes were brave and confident men, and so they approached the troll.

“Whaddya want?” asked the troll of the three men, spitting green slime as he spoke.

“We have come to solve your T-SQL challenges and save the fair maiden,” answered each of the men.

“You’re all fools!” snarled the troll. “You are delving into territory where hundreds have failed.”

None of the men said anything… they were steadfast in their intentions.

“Hah! We shall proceed then. You all arrived just in time… I was getting a bit hungry. Heh-heh-heh.” The troll's stomach growled and his eyes narrowed and earwax dripped out of his right ear. “Take out your laptops, and prepare for the first challenge!” he bellowed.

The men were ready at their keyboards.

“Challenge Number One: Write a query that pulls out the first and last initial of all the names in the Contacts table in AdventureWorks. Name those columns FirstInitial and LastInitial. Finally, sort the result in order of the first and last initial. You have 60 seconds. Go!”

For a few seconds, the three brave men were in shock, because they couldn’t believe how easy this first challenge was, but after a short while, they started typing furiously at their keyboards. They all looked up long before the 60 seconds were over.

“Let’s see what you all have,” said the troll.

Frederick Function demonstrated his query. He used the same LEFT() functions in the ORDER BY clause that he had used in the SELECT list… he thought it looked really cool to have the functions repeated like that:

select FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by left(FirstName,1)
,left(LastName,1)
/*
FirstInitial LastInitial
------------ -----------
A A
A A
A A
A A
...
Z W
Z W
Z Y
Z Z
(19972 rows)
*/
Osric Ordinal’s query took advantage of using ordinal numbers in the ORDER BY to represent column positions in the SELECT list. He loved taking this kind of shortcut:

select FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by 1,2
/*
FirstInitial LastInitial
------------ -----------
A A
A A
A A
A A
...
Z W
Z W
Z Y
Z Z
(19972 rows)
*/
Colin Columnname’s query used the names of the columns that were established in the SELECT list in his ORDER BY clause:

select FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by FirstInitial
,LastInitial
/*
FirstInitial LastInitial
------------ -----------
A A
A A
A A
A A
...
Z W
Z W
Z Y
Z Z
(19972 rows)
*/
“You have all successfully passed the first challenge,” growled the troll. “But that’s no great accomplishment… Most people do.”

The troll’s stomach rumbled again, this time so loud that the ground shook. A pimple on his left cheek popped spontaneously and violently, its putrid contents spewing forth and missing Colin by mere centimeters.

“Challenge Number Two: Wrap the query of your first challenge inside a stored procedure called #GetContactInitials which accepts a single string parameter called @SortOrder. If @SortOrder is equal to ‘First’ then sort by the first initial and last initial. If @SortOrder is equal to ‘Last’ then sort by last initial and first initial. No IF statements are allowed… you must have only a single query in the stored procedure. You have 60 seconds. Go!”

The men got to work.

Frederick Function finished very quickly, and came up with the following stored procedure, which compiled without error. Again, he was even more proud of his procedure because it had even more function calls in it:

if object_id('tempdb..[#GetContactInitials]','P') is not null 
drop procedure #GetContactInitials
go
create procedure #GetContactInitials
@SortOrder
varchar(20)
as
select
FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by case @SortOrder
when 'First' then left(FirstName,1)
when 'Last' then left(LastName,1)
end
,case @SortOrder
when 'First' then left(LastName,1)
when 'Last' then left(FirstName,1)
end
Osric Ordinal also produced his stored procedure very quickly,using his beloved ordinal number approach in the ORDER BY. His procedure also compiled without error:

if object_id('tempdb..[#GetContactInitials]','P') is not null 
drop procedure #GetContactInitials
go
create procedure #GetContactInitials
@SortOrder
varchar(20)
as
select
FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by case @SortOrder
when 'First' then 1
when 'Last' then 2
end
,case @SortOrder
when 'First' then 2
when 'Last' then 1
end
Colin Columnname at first had trouble. He had put his procedure together like so, but it produced compile errors:

if object_id('tempdb..[#GetContactInitials]','P') is not null 
drop procedure #GetContactInitials
go
create procedure #GetContactInitials
@SortOrder
varchar(20)
as
select
FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by case @SortOrder
when 'First' then FirstInitial
when 'Last' then LastInitial
end
,case @SortOrder
when 'First' then LastInitial
when 'Last' then FirstInitial
end
/*
Msg 207, Level 16, State 1, Procedure #GetContactInitials, Line 9
Invalid column name 'FirstInitial'.
Msg 207, Level 16, State 1, Procedure #GetContactInitials, Line 10
Invalid column name 'LastInitial'.
Msg 207, Level 16, State 1, Procedure #GetContactInitials, Line 13
Invalid column name 'LastInitial'.
Msg 207, Level 16, State 1, Procedure #GetContactInitials, Line 14
Invalid column name 'FirstInitial'.
*/
At first this seemed confusing. Why was he getting error messages saying that ‘FirstInitial’ and ‘LastInitial’ were invalid column names? Clearly they were valid in his first challenge. And the ORDER BY is one of the last operations taking place in a query, so the column names would have already been established by the SELECT list. It was puzzling indeed.

Then, in a flash of insight, Colin understood. Because the ORDER BY clause consisted of CASE expressions, T-SQL had to evaluate those expressions for each and every row in the FROM table. In other words, during the Clustered Index Scan of the Contacts table, T-SQL would have to compute the CASE expression values. And then, only after all rows were evaluated in this manner, finally the Sort required by the ORDER BY would take place on those already-evaluated CASE expressions. So, while it was true that the actual ORDER BY Sort logically took place at the end of a query operation, the expressions within the ORDER BY list had to be evaluated at the beginning of the query operation.

So Colin quickly re-wrote his query to instead use a derived table in his FROM clause. This way he could keep his column names in the ORDER BY. He managed to type the very last character of code just as the 60 seconds were up:

if object_id('tempdb..[#GetContactInitials]','P') is not null 
drop procedure #GetContactInitials
go
create procedure #GetContactInitials
@SortOrder
varchar(20)
as
select
FirstInitial
,LastInitial
from (select FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact) InitialValues
order by case @SortOrder
when 'First' then FirstInitial
when 'Last' then LastInitial
end
,case @SortOrder
when 'First' then LastInitial
when 'Last' then FirstInitial
end
“Now we will test out your stored procedures,” snarled the troll.

Frederick Function demonstrated his stored procedure, and it produced the correct output:

exec #GetContactInitials 'First'
/*
FirstInitial LastInitial
------------ -----------
A A
A A
A A
A A
...
Z W
Z W
Z Y
Z Z
(19972 rows)
*/

exec #GetContactInitials 'Last'
/*
FirstInitial LastInitial
------------ -----------
A A
A A
A A
A A
...
W Z
W Z
W Z
Z Z
(19972 rows)
*/
Colin Columnname demonstrated his stored procedure, and it, too, produced the same correct output.

Finally, Osric Ordinal demonstrated his stored procedure. But it produced output that was unexpected:

exec #GetContactInitials 'First'
/*
FirstInitial LastInitial
------------ -----------
G A
C A
K A
H A
P A
...
C G
I R
C H
C Z
C H
(19972 rows)
*/

exec #GetContactInitials 'Last'
/*
FirstInitial LastInitial
------------ -----------
G A
C A
K A
H A
P A
...
C G
I R
C H
C Z
C H
(19972 rows)
*/
“I don’t understand,” said Osric. “I wrapped my original query in a stored procedure, and I used a CASE exp---“

But Osric didn’t get a chance to finish his sentence, because his head had suddenly disappeared. It was being pulverized by the troll’s sharp teeth. Before the rest of Osric’s headless body could crumple to the ground, the troll scooped it up and swallowed it too.

The troll let out a loud belch. Frederick and Colin tried not to faint from the foul stench of his breath.

Colin glanced over at Osric’s laptop and saw that Osric’s query had the same CASE expression evaluation problem that Colin’s original query had. Osric had (incorrectly) figured that his use of ordinal column positions would still be honored within the CASE expressions, but instead, they were just evaluated as simple constant integers of 1 and 2, not referring to column positions anymore at all. For example, when his procedure was called with the parameter ‘First’, each row had the same CASE expression evaluations… the first CASE would evaluate to the constant integer 1 and the second would evaluate to the constant integer 2. Since every single row had the same constant integer expressions to ORDER BY, the final result set simply came out in the order it was scanned… in clustered index order.

“Challenge Number Three: This is similar to Challenge Number Two. The only difference is that the stored procedure is to be called #GetDistinctContactInitials and the result set should be the DISTINCT set of first and last initials. The same rules apply regarding the @SortOrder parameter and having only a single query in the stored procedure. You have 60 seconds. Go!”

Colin took his procedure from Challenge Number Two and renamed the procedure as instructed and simply added a DISTINCT to his derived table (and also renamed the derived table’s alias name for clarity and completeness). It compiled without error:

if object_id('tempdb..[#GetDistinctContactInitials]','P') is not null 
drop procedure #GetDistinctContactInitials
go
create procedure #GetDistinctContactInitials
@SortOrder
varchar(20)
as
select
FirstInitial
,LastInitial
from (select
DISTINCT FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact) DistinctValues
order by case @SortOrder
when 'First'
then FirstInitial
else LastInitial
end
,case @SortOrder
when 'First'
then LastInitial
else FirstInitial
end
Frederick did the same thing… he renamed the procedure and added a DISTINCT to his query, but he got the following error when trying to compile it:

if object_id('tempdb..[#GetDistinctContactInitials]','P') is not null 
drop procedure #GetDistinctContactInitials
go
create procedure #GetDistinctContactInitials
@SortOrder
varchar(20)
as
select

DISTINCT FirstInitial=left(FirstName,1)
,LastInitial=left(LastName,1)
from AdventureWorks.Person.Contact
order by case @SortOrder
when 'First' then left(FirstName,1)
when 'Last' then left(LastName,1)
end
,case @SortOrder
when 'First' then left(LastName,1)
when 'Last' then left(FirstName,1)
end
/*
Msg 145, Level 15, State 1, Procedure #GetDistinctContactInitials, Line 5
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
*/
What the heck? All Frederick really did was add a DISTINCT to his query. Why was it giving him problems? And what did the error message mean? The ORDER BY items were clearly in the SELECT list, he thought.

Frederick wasn’t as clever as Colin in figuring out the fact that having CASE expressions in his ORDER BY clause forced T-SQL to evaluate those expressions during the Clustered Index Scan phase of the query. It had worked fine for his procedure in Challenge Number Two. But by adding the DISTINCT in Challenge Number Three, everything changed. Since adding the DISTINCT keyword shrinks the result set, one must ORDER BY only the columns in that resulting shrunken set… One cannot ORDER BY any expressions based on columns in the original FROM table any more.

Frederick started to get nervous. As the precious seconds ticked by, he could see no way to correct his query.

“Time’s up!” growled the troll, “Let’s see what you have.” He looked at Frederick’s procedure and saw that it had compile errors.

Rivulets of sweat were gushing down Frederick’s face and body like waterfalls.

“Ahhhh…” said the troll, “Thank you for adding the fine salty marinade.” And without another word, he gobbled up Frederick, swallowing him whole.

The troll turned to Colin.

“Demonstrate your stored procedure!” commanded the troll.

Colin did just that, coming up with the correct result:

exec #GetDistinctContactInitials 'First'
/*
FirstInitial LastInitial
------------ -----------
A A
A B
A C
A D
A E
...
Z S
Z T
Z W
Z Y
Z Z
(544 Rows)
*/

exec #GetDistinctContactInitials 'Last'
/*
FirstInitial LastInitial
------------ -----------
A A
B A
C A
D A
E A
...
S Z
T Z
V Z
W Z
Z Z
(544 Rows)
*/
“Finally! It’s about time!” said the troll. “I was getting sick and tired of this whole SQL puzzle gig. Congratulations… The maiden is yours. I’ll see ya later… I’m off to terrorize a village.” And he left without another word, trudging over a nearby hill.

Colin raced into the castle and took the steps up to the tower three at a time.

And there was the fair maiden. She turned to look at him, the twin silks from her wimple rippling in the soft breeze from the window.

“My word!” he exclaimed, “You’re absolutely beautiful! In fact, in that white dress and wimple, you look just Yeoman Tonia Barrows in---”

The fair maiden interrupted, completing his thought: “…in Shore Leave, the 17th episode of the original series of Star Trek.”

Colin, a certified Star Trek geek, was speechless.

“Thank you,” continued the maiden, “That’s kind of you to say. And that was a fairly good episode, don’t you think? However, I have to say that it doesn’t hold up to other first-season episodes like Devil In The Dark or The City On The Edge of Forever, does it? Still, Theodore Sturgeon, who wrote the teleplay of Shore Leave, did go on to write Amok Time, which was a classic second-season episode.”

Colin couldn’t help himself. “Wow! What a woman!” he exclaimed. He took her into his arms.

And they lived long and prospered happily ever after.

The End.

Wednesday, January 6, 2010

Delimited String Tennis (Again)… The Final Volley?

Delimited String Tennis Anyone?This is a continuation of my previous blog post, Delimited String Tennis Anyone?, which playfully talked about a virtual T-SQL Tennis match… a friendly rally back and forth between SQL Server MVP Adam Haines and myself in discussing methods of shredding comma-delimited strings in our blogs.

In my previous post, I expressed my disappointment with the following bit of code, which shreds elements from a comma-delimited string consisting of 5000 elements:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)
This code, which I will call the “Derived Table Method”, took over 6 painful minutes to execute!

It had to do with how the XML column in the derived table was actually streamed and processed in small chunks as opposed to just immediately instantiating a normal, vanilla XML column.

My solution to this problem was to just create an intermediate XML variable and process it directly:

declare @XMLList xml
set
@XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)

select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)
This code, which I will call the “XML Variable Method”, was more like it… it took less than a second to execute… an improvement of over 2000%!

Since I wrote that blog entry, I received a few comments and e-mails that suggested some other approaches.

A person named “Mister Magoo” (no relation to the cartoon character voiced by Jim Backus in the 1950’s and 1960’s I’m sure) commented that the “Derived Table Method” could be improved dramatically just by adding a .query() to the XML column created in the derived table:

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml).query('.')) a
cross apply XMLList.nodes('i') x(i)
He was correct! This also took less than a second! I suppose the theory here is that adding the .query() method forces the XML column to be instantiated immediately instead of bringing about the whole streaming nonsense.

This is excellent… it eliminates the step of having to create an intermediate XML variable… everything can be done in one single statement. It’s not quite as fast as the “XML Variable Method”, but we’re only talking about a small difference in milliseconds… it certainly performs like lightning.

The results of the above 3 methods come out as follows:

/*
Method CPU Duration(ms)
----------------------------------------------------
Derived Table 369,187 398,791
Derived Table with .query() 156 168
XML Variable 99 117
*/
I also got an e-mail from my friend SQL Server MVP Alejandro Mesa (some of you may know him as Hunchback on various SQL Forums), who mentioned that things could be sped up even more by changing the XQuery expression passed as the first parameter to the .value() method.

Instead of the simple ‘.’ that I passed…

select x.i.value('.','varchar(3)')
…Alejandro suggested using the .text() accessor and an ordinal [1] predicate, like so:

select x.i.value('(./text())[1]','varchar(3)')
True enough, this did improve the performance of the query. With this XQuery expression, we are indicating that we want the first instance ([1]) of the text-only node values (i.e. not any XML sub-elements) in our x.i node. This is more specific than our original ‘.’ expression, which indicated that we wanted to process the entire x.i node.

You can get a clearer idea of how these two XQuery expressions work in the following examples:

declare @XMLVar xml
set
@XMLVar='<a>abc<b>xyz</b>123</a>'

select x.i.value('.','varchar(10)')
from @XMLVar.nodes('a') x(i)
/* Returns 'abcxyz123' */

select x.i.value('(./text())[1]','varchar(10)')
from @XMLVar.nodes('a') x(i)
/* Returns 'abc' */

select x.i.value('(./text())[2]','varchar(10)')
from @XMLVar.nodes('a') x(i)
/* Returns '123' */
The @XMLVar variable above had two text nodes within the <a></a> element. In our original queries at the beginning of this article, each of the <i></i> elements contained only a single text node (and that’s why the expressions of ‘.’ and ‘(./text())[1]’ both produced the same result), and yet the expression ‘(./text())[1]’ was much more efficient and therefore faster because it was so much more specific in indicating what we wanted to process.

When I applied Alejandro’s suggestion to the queries, you can see how the CPU and Duration decreased even more:

/*
Method CPU Duration(ms)
----------------------------------------------------------------
Derived Table 369,187 398,791
Derived Table with .query() 156 168
Derived Table with .query() and text()[1] 130 159
XML Variable 99 117
XML Variable with text()[1] 76 77
*/
Alejandro also pointed me towards a very detailed technical article at Microsoft TechNet entitled Performance Optimizations for the XML Data Type in SQL Server 2005. If you do a lot of XML manipulation, this article looks like a must-read.

Thank you again to Mister Magoo and Alejandro for their contributions! With all of this knowledge, we can all now be the speed demons of delimited string shredding.

Wednesday, December 23, 2009

It's a Small World After All

Small WorldI don't have any heavy in-depth T-SQL code or analysis today…

Instead, let’s just sit back and take a quick break from the hectic holiday season…

Sip some hot cocoa or cider…

And have a little chuckle over a couple “Small World” incidents…

The late 1990’s were kind of a busy whirlwind blur for me… I worked at home with 3 young children running around; I was doing a lot of preventative Y2K work; I was a Visual FoxPro MVP frequenting the old CompuServe FoxForum and writing articles for a couple of VFP publications; and I was doing lots of short-term trading in the stock market. (Ahhh… those were the days… check out the 1100% advance in InfoSpace in less than 5 months during 1999/2000).

INSP Oct1999-Mar2000

Yes, it was a busy time, and that’s probably why I can’t remember writing a book on SQL Server 7 back in 1999 with none other than Buck Woody (known then, more formally, as Greg Woody):

SQL Server 7.0 Book

I stumbled upon this book the other day at Amazon quite by mistake.

Hmmm… I guess I’d better update my résumé.

It’s amazing the kind of wild stuff you can find out about yourself on the web. For example, about a year ago, SQL Server MVP Kalen Delaney was shocked to discover that she had somehow written a book on Oracle… Kalen may be interested to discover that this book is now available at Amazon in Spanish!

One other funny Google story…

Several years ago, I decided (for whatever reason) to Google my young son’s name. What I found came as quite a surprise. I found a guy named Qxbr Schulz (name disguised) who was a self-employed software developer who had a son named Brad. Well, I was a guy named Brad Schulz who was a self-employed software developer who had a son named Qxbr. And the really weird thing is that we lived only 3 miles from each other!

Yes, as the song goes: ”It’s a small world after all!”

Anyway, he and I met soon thereafter at a Fourth-of-July block party. And no, we’re not related in any way. But he and I have a little chuckle every time we run into each other in town.

Best wishes for the holidays, everybody…

Saturday, December 12, 2009

Delimited String Tennis Anyone?

Tennis Anyone?SQL Server MVP Adam Haines and I have been going back and forth for months in our blogs on the subject of creating and splitting a delimited string… it’s been kind of a virtual T-SQL Tennis match… not a competition, but rather a friendly rally, knocking the subject back and forth, practicing our skills. Between the two of us, we’ve just about beaten the subject to death; however, I like to think that our collective research has been a win-win situation for the SQL Server community.

(If you’d like to look at these past articles, read Adam’s posts here, here, here, and here and read my posts and comments here, here, here, and here.)

In Adam’s latest volley a few days ago (which you can read here), he did some in-depth time tests on various methods of splitting a delimited string…. the XML approach using XPath and the .nodes() and .value() methods, the permanent Numbers table approach, and the approach using a TVF with a virtual table of Numbers.

His verdict was that the XML approach was a biiiiig disappointment when handling a comma-delimited string of many items, in one case performing over 200 times slower (!!) than the Numbers table approaches. Like Adam, I was a bit crestfallen. The XML method is the usual method suggested on the T-SQL forums for doing this sort of thing.

It is literally shocking how badly it performs.

Take a look at the following example, in which I create a variable that contains a comma-delimited list of 5000 items. Then it uses the XML approach to shred the list into a result set of 5000 rows:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)
The task that we’re asking SQL Server to do is not that complicated, but do you want to guess how long this takes to fully execute on my system? Are you sitting down?

It takes over 6 minutes!

Really disappointing…

I had a lot of trouble accepting this. After doing several experiments, I finally figured out what was going on.

The problem lies in the derived table in the FROM clause, where we create a derived XML column via the CAST function.

It seems we are not really creating a complete column or value at all. Apparently we are inadvertently creating an XML stream, which sends small chunks of data to the .nodes() method and then the chunks get handed over to the .value() method to come up with our result… one chunk at a time.

On my system, when I execute the above code, I see nothing at all happen for about almost 50 seconds. Then the first 676 rows get spit out into the results window. After another 50 seconds or so, another similar-sized chunk of rows get spit out. This happens like clockwork until it spits out the final 5000th row over 6 minutes after I started the query.

You may recall in a previous post of mine where I said that an XML stream is processed in 2033-byte chunks. Well, 676 rows of 3 bytes (‘ABC’) each equals 2028 bytes. (I’m assuming that there’s a little tiny bit of overhead involved and that’s why we can’t squeeze in the 677th row). My comma-delimited string consists of 5000 items at 3 bytes each, coming to 15000 bytes. Divide that by 2033 bytes and you get 7.38, which is the number of chunks of data that get processed. Multiply that by 50 seconds per chunk, and you get… voila… 6.15 minutes.

Well, this must be all really exciting for you math nerds out there, but how does it help us?

It does help to explain what’s going on, and it does show us that streaming the XML is reeeeeaaalllly sloooooow.

So is there hope for the XML method?

YES! YES! YES!

This slow streaming only occurs when we are creating the XML to be processed via a derived table or CTE. What If we instead transfer our comma-delimited string into an actual XML variable and then process that variable, like so?:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

declare @XMLList xml
set
@XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)

select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)
Believe it or not, this takes only 0.174 seconds!

In comparing the two pieces of code outlined in this article, here are the results:

/*            Derived     XML
Table Variable Percent
Measurement Method Method Improvement
--------------------------------------------
CPU 386,172 125 2989%
Duration(ms) 368,747 174 2019%
*/
So I’m happy to report that all is not lost for the XML method. It’s still a contender in the list-splitting competition, giving the Numbers table methods a run for their money… as long as it is implemented in a certain way.

Back to you, Adam…

UPDATE ON Jan06,2010: Please read the follow-up to this article in which I talk about some additional methods to speed the process up even further.