Tuesday, September 29, 2009

Know Your Data!

Back in July, I wrote a blog entry that highlighted the CROSS APPLY operator. A problem was presented that required us to pull word pairs out of a column called Message in a table variable @T and find out how many times the word pairs occurred in the table.

Two different approaches were used to solve the problem.

The first approach involved creating a multi-statement table-valued function that accepted a string and returned a table of word pairs. The final query that made use of this function looked like this (please refer back to the July blog entry for the definition of the ufn_GetWordPairs function):

select WordPair,Occurrences=count(*)
from @t
cross apply dbo.ufn_GetWordPairs(Message) f
group by WordPair
order by count(*) desc
It was a nice, simple, easy-to-read query, but it took over 25 seconds to process a table with 100,000 rows.

The second approach to the problem was using a Numbers table. The final query looked like this:

select WordPair,Occurrences=count(*)
from @t
cross apply (select TrimMessage=ltrim(rtrim(Message))) f1
join Numbers on substring(' '+TrimMessage,N,1)=' '
and substring(' '+TrimMessage,N+1,1)<>' '
and N<len(TrimMessage)+1
cross apply (select WorkString=substring(TrimMessage+' ',N,len(TrimMessage)+1)) f2
cross apply (select p1=charindex(' ',WorkString)) f3
cross apply (select Word1=left(WorkString,p1-1)) f4
cross apply (select RestOfString=ltrim(substring(WorkString,p1,len(WorkString)))) f5
cross apply (select p2=charindex(' ',RestOfString)) f6
cross apply (select Word2=case when p2>0 then left(RestOfString,p2-1) end) f7
cross apply (select WordPair=Word1+' '+Word2) f8
where WordPair is not null
group by WordPair
order by count(*) desc
It looks very intimidating, but the blog entry went through the development of the query step-by-step. This query took about 12 seconds to process the same 100,000 rows… an improvement of over 50%.

So it seemed like the Numbers table approach was the clear winner.

Well… yes and no.

In the tests that I had done in July, I had 100,000 rows, but each Message column value in the @T table had only about 5-6 words in it.

Recently, I decided to do a series of tests to see if the number of words made a difference, and it certainly did!

First I changed the ufn_GetWordPairs function so that it accepted and acted upon a VARCHAR(MAX). Then I created a test table of 10,000 rows; that row count remained constant across all the tests. What I changed was the number of words in the Message column, and you can see my results below:

/*
#Words    Function   Numbers    Function   Function   Numbers
 Per      Approach    Table     /Numbers   Approach    Table
 Row        (ms)     Approach      (%)     (ms/Word)  (ms/Word)
                       (ms)
------------------------------------------------------------------
   25        9241       6495       142%      369.6      259.8
   50       17529      14744       119%      350.6      294.9
   75       26590      27447        97%      354.5      366.0
  100       35225      39434        89%      352.3      394.3
  150       53139      69016        77%      354.3      460.1
  200       71435     101267        71%      357.2      506.3
  250       87203     141602        62%      348.8      566.4
  300      101267     184704        55%      337.6      615.7
  400      142282     291286        49%      355.7      728.2
  500      173923     415866        42%      347.8      831.7
*/
Here is a graph that shows the differences in times (i.e. the first 3 columns in the data table above):

[Graph: Word Pair Query Durations]

The Numbers Table approach performs better until the number of words per row reaches the neighborhood of 75. After that, it’s no contest… the Function approach wins by a wider and wider margin as the number of words increases.

If you look at the raw data, you can see that the Function approach takes about 350 milliseconds per word, and that remains constant. However, the Numbers Table approach takes longer and longer per word as the word count increases… from 260 ms/word when processing 25 words/row up to 830 ms/word when processing 500 words/row. This accelerating increase in total time as the number of words grows is quite evident in the graph.

The lesson here is that you shouldn’t just blindly take an approach in solving a problem without intimately knowing the data that you’re eventually going to act upon.
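
If you would like to reproduce this kind of test, here is a minimal sketch of the timing harness (illustrative only… it assumes the ufn_GetWordPairs function from the July entry and a populated test table #WordTest with a Message column, which is a stand-in name):

declare @Start datetime
set @Start=getdate()

select WordPair,Occurrences=count(*)
from #WordTest
cross apply dbo.ufn_GetWordPairs(Message) f
group by WordPair
order by count(*) desc

select ElapsedMs=datediff(ms,@Start,getdate())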

Friday, September 25, 2009

Down For The COUNT(*)

I was playing around with the COUNT() aggregate the other day and found some very interesting behaviors.

As you may know, COUNT(*) will count all rows; whereas, COUNT(Expression) will count all rows in which Expression is not NULL.
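
For example, the Region column in Northwind’s Customers table is NULL for many rows, so counting that column comes up short of COUNT(*):

select count(*),count(Region) from northwind.dbo.customers
/* count(*) returns 91; count(Region) returns only the number of
   customers whose Region is not NULL */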

Consider the following 3 queries:

select count(*) from northwind.dbo.customers
/* Returns 91 */

select count(1) from northwind.dbo.customers
/* Returns 91 */

select count('junk') from northwind.dbo.customers
/* Returns 91 */
Not only do they all return the same result, but they also generate the same query plan. In fact, if you look under the hood at the (XML) query plan, they all actually just do a COUNT(*):

[XML query plan fragment: AggType="countstar"]

In the case of COUNT(1) and COUNT(‘junk’), the Optimizer is “smart enough” to recognize that a constant is between the parentheses, and so it translates those into COUNT(*).

If you use deterministic functions that involve only constants (and no variables or columns from the source tables), then they will also get translated into a COUNT(*). All of the examples below generate the exact same query plan as above; in other words, one with AggType=”countstar”:

select count(left('abc',1)) from northwind.dbo.customers
/* Returns 91 */

select count(char(32)) from northwind.dbo.customers
/* Returns 91 */

select count(case when 1=1 then 1 else 0 end) from northwind.dbo.customers
/* Returns 91 */

select count(pi()*pi()) from northwind.dbo.customers
/* Returns 91 */

select count(isnumeric('xxx')) from northwind.dbo.customers
/* Returns 91 */
But I thought the following was interesting. In the first query below, I put an illegal division-by-zero expression, and in the second, I put in a number that will obviously overflow, yet they run perfectly:

select count(1/0) from northwind.dbo.customers
/* Returns 91 */

select count(1.79e308*1.79e308) from northwind.dbo.customers
/* Returns 91 */
Again, the Optimizer recognizes that some kind of constant expression is within COUNT’s parentheses, and so it happily translates it to a COUNT(*). It doesn’t bother evaluating the expressions at all. It essentially asks, “Is it an expression involving constants?”, and if the answer is “Yes”, it just throws the expression out the window and changes it to a COUNT(*).
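
You can verify this yourself by capturing the XML plan without executing the query… a quick sketch using SHOWPLAN_XML:

set showplan_xml on
go
select count(1/0) from northwind.dbo.customers
go
set showplan_xml off
go
/* Search the returned XML and you'll find AggType="countstar"
   with no trace of the 1/0 expression */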

Now what if NULL is passed to COUNT()?:

select count(null) from northwind.dbo.customers
/*
Msg 8117, Level 16, State 1, Line 1
Operand data type void type is invalid for count operator.
*/

select count(cast(null as int)) from northwind.dbo.customers
/* Returns 0 */
In the first example, you can see that NULL all by itself is not allowed, which is interesting. COUNT() must accept a parameter that belongs to some kind of datatype.

Once we establish a datatype, as in the second example, the query produces the expected result of 0, because COUNT() does not count NULLs. If you look under the hood at the (XML) query plan, you’ll see this:

[XML query plan fragment: COUNT of a constant NULL]

So it essentially translated it into a COUNT() that acts on a constant value of NULL… in other words, a COUNT(NULL).

Look at the following examples. They all get the exact same query plan as the one above:

select count(1/null) from northwind.dbo.customers
/* Returns 0 */

select count(nullif(1,1)) from northwind.dbo.customers
/* Returns 0 */

select count(char(-1)) from northwind.dbo.customers
/* Returns 0 (because CHAR(-1) returns NULL) */
In all of the above, the Optimizer actually did evaluate the expressions within COUNT()’s parentheses, and when it determined that the result was NULL, it put together a plan that did a COUNT(NULL).

It doesn’t necessarily evaluate the expression all the way, though. It just sees a NULL somewhere in the expression, and so it knows it has to do more work. It essentially asks, “Will this constant expression evaluate to NULL?”. If the answer is “No”, then it creates the first COUNT(*) query plan we saw at the beginning of this article. If the answer is “Yes”, then it creates the COUNT(NULL) query plan.

Consider this:

select count(isnull(null,1/0)) from northwind.dbo.customers
/* Returns 91 */
The Optimizer didn’t come up with an actual value for the expression. It just knows that ISNULL(NULL,1/0) is the exact same thing as 1/0, so it translated it into COUNT(1/0), which, in turn, translated to a COUNT(*).

Now what about COUNT(DISTINCT)?

Consider the following queries:

select count(distinct 1) from northwind.dbo.customers
/* Returns 1 */

select count(distinct 'junk') from northwind.dbo.customers
/* Returns 1 */

select count(distinct sqrt(power(1234,2)*sin(pi()))) from northwind.dbo.customers
/* Returns 1 */
All of these produce the exact same query plan:

[Query plan: SELECT COUNT(DISTINCT 1)]

Working from right to left, it essentially does a TOP 1 (with no sort order), then a COUNT(*) of that to get 1 (obviously), then a SUM() of that, which again produces 1. Finally, the Compute Scalar converts that to an integer value, and that’s the final result.
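
If it helps, here is a rough T-SQL rendering of what that plan computes (just a sketch of the operator pipeline… not anything the Optimizer literally generates):

select FinalResult=convert(int,sum(Cnt))
from (select Cnt=count(*)
      from (select top (1) Dummy=1
            from northwind.dbo.customers) t1) t2
/* Returns 1 */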

In other words, it seems that again, it doesn’t care what the expression is. It just sees that some kind of constant expression is involved and so it just spits out a vanilla TOP 1, COUNT(*), SUM(), CONVERT(int) plan.

Or does it?

Try our good old division-by-zero and arithmetic-overflow queries with COUNT(DISTINCT):

select count(distinct 1/0) from northwind.dbo.customers
/*
Msg 8134, Level 16, State 1, Line 1
Divide by zero error encountered.
*/

select count(distinct 1.79e308*1.79e308) from northwind.dbo.customers
/*
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type float.
*/
Hmmm… So these DO produce a run-time error this time, now that we’ve added the DISTINCT keyword. If you take a look at the query plan for those 2 queries…

[Query plan: SELECT COUNT(DISTINCT 1/0)]

… you can see that instead of the TOP operator, there is a Compute Scalar operator, which actually calculates the expression: 1/0 in the first query and 1.79e308*1.79e308 in the second.

So the Optimizer must evaluate the expression itself, see that it is invalid in some way, and then go out of its way to create a plan that produces a run-time error by forcing that invalid expression to be calculated.

Verrrry Interrrresting

Wednesday, September 23, 2009

A Little Mischief

Every now and then, I like to stir up a little mischief.

Sometimes I’ll go into Starbucks and order an Iced Personal Half-Caff Quad Venti Extra-Vanilla Nonfat One-Sugar Two-Honey With-Whip No-Foam Latte, just to see what kind of reaction I get. (Seriously… I really do).

The vast majority of baristas will stare back at me with a vacant expression and say, “Er… Whu-ut?” Some will just smile or laugh out loud, knowing I’m just kidding around. But on rare occasions, a barista will come right back at me without batting an eyelash and ask with all seriousness, “Do you want a Petite Reduced-Fat Cinnamon Old-Fashioned Zucchini-Walnut Muffin with that?”

They aren’t as much fun.

SQL Server is kind of like that type of barista. It’s not fun trying to get a reaction out of SQL Server. It’s hard to trip it up. The other day, for kicks, I wrote up an obnoxious, impossible-to-understand query (see below), and SQL processed it without question. Occasionally, I think it’d be kind of cool for SQL to be in on the joke and come back with a message saying, “Oh, come ON… Get real!”

/* Initialize */ 
if object_id('tempdb..#from') is not null drop table #from;
go


/* The 'mischief' query: */

with [top] as (select [top]=100),[where](

[select],[top100],[percent],[into],[order],
[from],[employees],
[over],[partition],[by],[column],
[go]
)as(

select EmployeeID,null,FirstName
,LastName,ReportsTo,City
,HireDate,1,1,5,null,Title
from Northwind.dbo.Employees)

select top (select [top] from [top])
percent [percent] as [as], [sum]=sum ([over]) over(
partition by [partition],[by]),
[into] into #from from [where] where
[order]
=[by] order by [order],[by],
[go];
go

Monday, September 21, 2009

T-SQL Fu: The Story Continues

[Image: Master Po and Kwai Chang Caine from the 1970's TV series Kung Fu]

This is a continuation of my last blog post, in which I mentioned that the online T-SQL Forums are like temples where young novices learn the mysteries of T-SQL under the patient guidance of a Master, reminiscent of the 1970’s TV series Kung Fu.

I wanted to provide a way for novices to get up to speed a little more quickly by revealing many of the mysteries of T-SQL all at once so they don't have to stumble at the beginning.

And so, continuing our story…

We left off with a fateful morning, when our young novice asked about his training in the mysterious art of T-SQL Fu…



Master, is my training almost over? When will I reach my full potential in T-SQL Fu?

Grasshopper, you still have much to learn. Quickly as you can, snatch this pebble from my hand. (The young novice tries and fails.) When you can take the pebble from my hand, it will be time for you to leave.



Time continues its irreversible march forward, and the training and questions continue…



Master, the ISNULL() function is cutting off my string. What’s going on?

declare @c char(1) 
set @c=null
select isnull(@c,'12345')
/* Returns '1' */
Grasshopper, the result of the ISNULL() function is always the exact same datatype as the first argument, which most people discover by accident because they don’t read the documentation. I would suggest using COALESCE() instead since it is an ANSI standard function, whereas ISNULL() is not. Also, COALESCE() will take multiple parameters:

declare @c char(1) 
set @c=null
select coalesce(@c,null,reverse(@c),null-null,'12345')
/* Returns '12345' */



Master, my data is getting cut off again. And why is it getting cut off at different points?

declare @s varchar 
set
@s='ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
select @s
/* Returns 'A' */

select cast('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' as varchar)
/* Returns 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234' */
Grasshopper, you did not stipulate the number of characters in your VARCHAR declarations. For whatever reason, the CONVERT() and CAST() functions assume 30 characters, but DECLARE assumes only 1 character. The following XML example also implies only 1 character:

declare @x xml 
set
@x='<data>ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890</data>'
select @x.value('data[1]','varchar')
/* Returns 'A' */
The lesson here is that you should always explicitly specify the number of characters in a VARCHAR or NVARCHAR declaration.
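
For example, with an explicit length, nothing gets cut off:

declare @s varchar(36)
set @s='ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
select @s
/* Returns 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' */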



Master, this is clearly a bug. Everyone knows that all years divisible by 4 are leap years!

declare @LeapDay2000 datetime 
set
@LeapDay2000='2000-02-29'
select LeapDay1900=dateadd(year,-100,@LeapDay2000)
,LeapDay2100=dateadd(year,+100,@LeapDay2000)
/*
LeapDay1900 LeapDay2100
----------------------- -----------------------
1900-02-28 00:00:00.000 2100-02-28 00:00:00.000
*/
Grasshopper, I am afraid you are mistaken. All years divisible by 4 are leap years EXCEPT for years divisible by 100; HOWEVER, years that are divisible by 400 (like 2000) ARE leap years. I am thankful that I will not be alive in 2100 to see the confusion that arises when people with “every 4 years” drilled into their heads insist that there should be a 29th of February.
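
You can even get T-SQL to confirm these rules with the ISDATE() function… a quick illustrative check using unambiguous YYYYMMDD strings:

select Year1900=isdate('19000229')
,Year2000=isdate('20000229')
,Year2100=isdate('21000229')
/* Returns 0, 1, 0... only 2000 is a leap year */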

So be assured that T-SQL knows what it is doing. However, you may be amused to know that Excel does NOT know what it’s doing. It thinks that 1900 was a leap year, as you can see below, where I subtract 1 from March 1, 1900:

[Image: Excel mistakenly treating 1900 as a leap year]

Again, there was no such date as February 29, 1900. Interestingly, though, Excel does seem to be aware that 2100 will not be a leap year. As you young people say, “Go figure.”



Master, my system must be corrupt. It says that it cannot convert the date, but everyone knows that a date string in YYYY-MM-DD format is universally accepted.

select convert(datetime,'2009-09-15') 
/*
Msg 242, Level 16, State 3, Line 2
The conversion of a char data type to a datetime data type
resulted in an out-of-range datetime value.
*/
Grasshopper, I’m afraid you are incorrect. When you stepped away from your computer to go out for a Jamba Juice, I secretly typed SET LANGUAGE BRITISH in your SSMS session. The ANSI SQL format of YYYY-MM-DD is NOT language independent. It will not work in language settings that put day before month. The only sure-fire language-independent ways of entering date strings are the 8-character YYYYMMDD format with no separators, or the clumsier ODBC and ISO 8601 formats:

select convert(datetime,'20090915') 
select convert(datetime,{d '2009-09-15'})
select convert(datetime,'2009-09-15T00:00:00')
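
To see the language dependence for yourself, try this little demonstration (SET LANGUAGE affects only your current session, so no harm is done):

set language british
select convert(datetime,'03-04-2009')
/* Returns April 3, 2009 (british uses dmy order) */

set language us_english
select convert(datetime,'03-04-2009')
/* Returns March 4, 2009 (us_english uses mdy order) */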



Many weeks later, our young novice is pulling the soggy earbuds of his iPod out of his morning cup of Green Tea as his Master approaches…



Master, this query keeps bombing. It doesn’t make sense, because shouldn’t T-SQL evaluate the left-most ISNUMERIC() function first and, if false, not even bother to do the CONVERT()? All programming languages do that!

select CustomerID,CompanyName,PostalCode 
from Northwind.dbo.Customers
where isnumeric(PostalCode)=1 and convert(int,PostalCode)>99000
/*
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value 'WA1 1DP' to data type int.
*/
Grasshopper, T-SQL works differently from other languages. It does not always evaluate WHERE predicates from left to right. It is free to evaluate them in any order that it sees as the most cost-effective. In this case, it considers the CONVERT() function to be more cost-effective than the ISNUMERIC() function and so it does the CONVERT() predicate first. But note the following example:

select CustomerID,CompanyName,PostalCode 
from Northwind.dbo.Customers
where Country='USA' and convert(int,PostalCode)>99000
/*
CustomerID CompanyName PostalCode
---------- ----------------------- ----------
LAZYK Lazy K Kountry Store 99362
OLDWO Old World Delicatessen 99508
*/
This query completed successfully. In this case it evaluated the Country=’USA’ predicate first, because it is more cost-effective than the CONVERT() predicate. Note that even though it scanned all the other customers, it ONLY evaluated the CONVERT() predicate AFTER it determined that the Country was ‘USA’.

If one of your predicates is dependent on the other, as was your case with ISNUMERIC() and CONVERT(), then the safest way to construct the query is with a CASE expression:

select CustomerID,CompanyName,PostalCode 
from Northwind.dbo.Customers
where case
when isnumeric(PostalCode)=1
then convert(int,PostalCode)
else -1
end>99000
/*
CustomerID CompanyName PostalCode
---------- ----------------------- ----------
LAZYK Lazy K Kountry Store 99362
OLDWO Old World Delicatessen 99508
*/



Master, I don’t understand why this query is behaving this way. I used a LEFT JOIN because I wanted ALL employees to come out regardless of whether they have Orders with Freight over $800 or not. But it’s not producing ALL employees. Doesn’t a LEFT JOIN guarantee that?

select e.EmployeeID,LastName,OrderID,Freight 
from Northwind.dbo.Employees e
left join Northwind.dbo.Orders o on e.EmployeeID=o.EmployeeID
where o.Freight>800
order by e.EmployeeID
/*
EmployeeID LastName OrderID Freight
----------- ---------- -------- --------
2 Fuller 10691 810.05
3 Leverling 10540 1007.64
5 Buchanan 10372 890.78
7 King 11030 830.75
*/
Grasshopper, the query IS doing a LEFT JOIN in the manner you expect, and it is creating an intermediate set of ALL employees with their orders (if any). However, it is your WHERE clause that is effectively filtering out your employees with no orders at all. Remember this important rule: If you’re doing a LEFT JOIN, do not refer to any of the right-hand-side’s fields in the WHERE clause. Instead, put the condition in the ON clause of the JOIN itself:

select e.EmployeeID,LastName,OrderID,Freight 
from Northwind.dbo.Employees e
left join Northwind.dbo.Orders o on e.EmployeeID=o.EmployeeID
and o.Freight>800
order by e.EmployeeID
/*
EmployeeID LastName OrderID Freight
----------- ---------- -------- --------
1 Davolio NULL NULL
2 Fuller 10691 810.05
3 Leverling 10540 1007.64
4 Peacock NULL NULL
5 Buchanan 10372 890.78
6 Suyama NULL NULL
7 King 11030 830.75
8 Callahan NULL NULL
9 Dodsworth NULL NULL
*/
The only exception to this rule is if you’re specifically checking for NULLs in the right-hand side, as in this example, which finds customers with no orders at all:

select c.CustomerID 
from Northwind.dbo.Customers c
left join Northwind.dbo.Orders o on c.CustomerID=o.CustomerID
where o.CustomerID is null
/*
CustomerID
----------
FISSA
PARIS
*/



The days melt into weeks, and the passing of many moons begets years… until one day…



[Image: The young novice comments on a strange man who wanders the temples and monasteries]

Master, there is a man who wanders about this temple and other monasteries, talking in a rude manner. He tells me that I have many design flaws in my data and that I’m performing operations that should be done in the application layer instead. He tells me I’m using “hillbilly” naming conventions instead of ISO 11179-5 metadata standards, and that IDENTITY keys are not valid because they are meaningless “Kabbalah” numbers with none of the relational properties of a proper key. He tells me that I shouldn’t do attribute splitting and that Date and Time are an atomic component representing a temporal dimension as an interval. He also says the ROW_NUMBER() function is a disastrous sequential operation that… Master? MASTER?

Grasshopper, I am sorry. I didn’t mean to nod off. (He straightens his robes). I know of this man. What is your question?

Master, I don’t have one. I just wanted to talk about him. I think that he just goes out of his way to taunt me and others. If you filter out the inappropriate references to “hillbilly” and “Kabbalah”, perhaps small parts of some of the things he says have merit in a dry textbook-like theoretical sense, but he doesn’t seem to be living in the real world. I just ignore him and pretend he’s not there.

Grasshopper, I think you may have reached enlightenment.

Master, I am honored. Please look at this T-SQL Script:

select Pebble 
into Grasshoppers.Hand
from Masters.Hand
option (fast 1)
Grasshopper, I am most proud. You are ready to leave. Remember always that a wise man walks with his head bowed, humble like the dust.



And so, our young hero, once a novice, now a T-SQL Fu Master, walks off into the sunset with head bowed. He has mastered the art of T-SQL Fu, but his training is not yet finished. There is much work ahead in studying SSRS Fu and SSIS Fu and SSAS Fu and BI Fu and MDX Fu and…

[Image: A new T-SQL Fu Master wanders on to his next adventure]

Friday, September 18, 2009

T-SQL Fu: The Path To Understanding

Do you remember the old 1970’s TV series Kung Fu? It’s the story of Kwai Chang Caine, a half-American half-Chinese Shaolin priest on the run in 1870’s America, sprinkled with flashbacks to his training with Master Po and Master Kan in the Shaolin temple back in China.

The various T-SQL forums available on the web kind of remind me of those flashbacks… they are like temples or monasteries where young novices can ask questions and learn the quirky idiosyncrasies of T-SQL under the patient guidance of a Master.

These forums are full of threads with simple common questions that get asked over and over again. It would be most helpful if novices could get up to speed quickly by having some of the mysteries of T-SQL opened up to them all at once so that they don’t have to stumble at the beginning… A way to begin to unlock the mysteries of the art of T-SQL Fu…



It is the dawn of a new day, and as the morning sunlight filters in through the high temple window…



Master, these queries are giving the wrong results! It’s quite obvious what I’m trying to accomplish in these queries. Is the server corrupt or something?

select Result = 3+4 * 5+6 
/*
Result
-------
29
*/

select EmployeeID,TitleOfCourtesy,LastName,City
from Northwind.dbo.Employees
where City='Seattle' or City='Tacoma'
and TitleOfCourtesy='Dr.'
/*
EmployeeID TitleOfCourtesy LastName City
----------- ---------------- --------- --------
1 Ms. Davolio Seattle
2 Dr. Fuller Tacoma
8 Ms. Callahan Seattle
*/

select Item = 'A' union select 'B' union select 'C'
intersect
select
'C'
/*
Item
----
A
B
C
*/
Grasshopper, there is no corruption. All of these examples are giving the correct results. You formatted your queries according to your intentions, but T-SQL doesn’t care about formatting. It cares about an order of operations that it must follow. Multiplication and Division take precedence over Addition and Subtraction and therefore they are evaluated first. AND takes precedence over OR. And INTERSECT takes precedence over UNION and EXCEPT. Use parentheses to arrive at the outcome you’re intending:

select Result = (3+4) * (5+6) 
/*
Result
-------
77
*/


select EmployeeID,TitleOfCourtesy,LastName,City
from Northwind.dbo.Employees
where (City='Seattle' or City='Tacoma')
and TitleOfCourtesy='Dr.'
/*
EmployeeID TitleOfCourtesy LastName City
----------- ---------------- --------- --------
2 Dr. Fuller Tacoma
*/


(
select Item = 'A' union select 'B' union select 'C')
intersect
select
'C'
/*
Item
----
C
*/


Master, the following query couldn’t be simpler. Why is it not returning 0.5?

select Result = 1/2  
/* Returns 0 */
Grasshopper, in this query, T-SQL sees a 1 and a 2, which are both integers, so it performs integer division, which discards any fractional part. Therefore 1/2 evaluates to 0. Add a decimal point to force it to use decimal arithmetic instead:

select Result = 1./2  
/* Returns 0.500000 */


Master, you just got through telling me that T-SQL will divide two integers using integer arithmetic. It’s not doing it with this query:

select 2147483649/2  
/* Returns 1073741824.500000 */
Grasshopper, that first number is not an integer. Books Online states that integers range from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647). Your number exceeds the range, so T-SQL interprets it as a decimal instead.
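
You can ask T-SQL how it typed a literal with the SQL_VARIANT_PROPERTY() function… an illustrative check:

select InRange=sql_variant_property(2147483647,'BaseType')
,OutOfRange=sql_variant_property(2147483649,'BaseType')
/* Returns 'int' and 'numeric' */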



Master, I confirmed that Books Online states that the highest integer is 2^31-1. But when I try that number in a query, it gives me a strange answer:

select 2^31-1  
/* Returns 28 */
Grasshopper, I’m afraid that Books Online uses an unfortunate syntax. In many programming languages, the caret (^) is used for exponentiation. But in T-SQL, the caret indicates a Bitwise Exclusive Or (XOR) operation. The bits in the result are set to 1 if either bit (but not both bits) in the operands is equal to 1. To illustrate:

select Result = 2^31 
/*
Bit representation of 2: 00010
Bit representation of 31: 11111
--------------------------------
Exclusive Or (XOR) Result: 11101 = 29
*/
In fact, there are other bitwise operators in T-SQL like an ampersand (&) for AND, a vertical line (|) for (non-exclusive) OR, and a tilde (~) for NOT. Consider, for example, the following query:

select ~5&6^7|8 
/*
Bit representation of 5: 0101
-------------------------------
Result of ~5 1010
Bit representation of 6: 0110
-------------------------------
Result of ~5&6 0010
Bit representation of 7 0111
-------------------------------
Result of ~5&6^7 0101
Bit representation of 8 1000
-------------------------------
Result of ~5&6^7|8 1101 = 13
*/
But back to your original question… If you want to use exponentiation in T-SQL, use the POWER() function.
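
For example… note the CAST to FLOAT, because POWER() returns the same datatype as its first argument, and an INT first argument would overflow at 2^31:

select Result=power(cast(2 as float),31)-1
/* Returns 2147483647 */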



Master, something strange is going on here. I queried for rows with UnitPrice over $2.00, but I’m getting unexpected results.

select OrderID,ProductID,UnitPrice 
from Northwind..[Order Details]
where UnitPrice! > 2
/*
OrderID ProductID UnitPrice
-------- ---------- ----------
10252 33 2.00
10269 33 2.00
10271 33 2.00
10273 33 2.00
10341 33 2.00
10382 33 2.00
10410 33 2.00
10414 33 2.00
10415 33 2.00
10454 33 2.00
10473 33 2.00
*/
Grasshopper, you are not very observant this morning, despite the 2 Venti Mocha Frappuccinos that you guzzled. Note the exclamation mark, which you evidently typed in error under the influence of too much caffeine. This implies NOT when used with a comparison operator. So you are effectively asking for rows with UnitPrice NOT greater than $2.00. The comparison operators of !< and !> are not used very often… most people prefer using the >= and <= operators to accomplish the same thing.
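
In other words, your accidental query was equivalent to this one:

select OrderID,ProductID,UnitPrice
from Northwind..[Order Details]
where UnitPrice<=2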



The days of training continue… And there are still many questions… One bitterly-cold winter morning, our young novice is fruitlessly trying to scratch candle wax off his laptop keyboard as his Master approaches…



Master, I found a bug in T-SQL. The comparisons I’m doing in the examples below are clearly equal, but T-SQL thinks they are not.

declare @f float 
set
@f=0.1
if @f*@f = 0.1*0.1
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */

declare @a float, @b float
select
@a=14.58, @b=0.54
if @a+@b = 14.58+0.54
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */
Grasshopper, the datatypes of FLOAT and REAL are known as “Approximate Numerics”. Not all values (like 0.1 for example) can be represented exactly in those formats. If you want to learn more about the inaccuracies of floating point math, you can do a Google search.

In your first example, you are multiplying two floating-point numbers (@f*@f) together and you are multiplying two decimals (0.1*0.1 = 0.01) together. T-SQL will take that decimal result of 0.01 and convert it to floating-point in order to do the comparison. But because of floating-point inaccuracies, a floating-point 0.1 multiplied by a floating-point 0.1 does not equal a floating-point 0.01. It is a similar situation with your second example.

Please observe:

select cast(0.1 as float)*cast(0.1 as float) - cast(0.01 as float) 
/* Returns 1.73472347597681E-18 */

select cast(14.58 as float)+cast(0.54 as float) - cast(14.58+0.54 as float)
/* Returns 1.77635683940025E-15 */
Use the NUMERIC or DECIMAL datatypes for mathematical calculations that must produce exact results.
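
For example, here is your second test recast with DECIMAL… a sketch of the fix:

declare @a decimal(10,2), @b decimal(10,2)
select @a=14.58, @b=0.54
if @a+@b = 14.58+0.54
print 'Ok'
else print 'Huh?'
/* Prints 'Ok' */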



Master, I don’t understand why the following are giving me different results. Is floating point inaccuracy coming into play again somehow?

if 3 * 1./3 = 1 
print 'Ok'
else print 'Huh?'
/* Prints 'Ok' */

if 1./3 * 3 = 1
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */
Grasshopper, each of these two queries evaluates its expression from left to right. The first query multiplied 3 by 1., coming up with 3.000000, and then divided that by 3, coming up with 1.000000. The second query divided 1. by 3, coming up with 0.333333, and then multiplied that by 3, coming up with 0.999999. That intermediate result of 0.333333 is only an approximation of a repeating decimal in which the digit 3 goes on forever. I would suggest using the ROUND() function.
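
For example, rounding the result to a tolerance that absorbs the truncated repeating decimal:

if round(1./3 * 3, 2) = 1
print 'Ok'
else print 'Huh?'
/* Prints 'Ok' */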



Master, why do the following 2 queries have different outcomes? I thought that T-SQL implicitly converted characters and strings to integers when there are expressions that mix the two.

select Result = 2 + '3' + '4' 
/* Returns 9 */

select Result = '3' + '4' + 2
/* Returns 36 */
Grasshopper, again, the expressions are being evaluated left to right. The first query sees the integer 2 as its first operand, so each plus sign (+) is treated as addition, with ‘3’ and ‘4’ implicitly converted to integers along the way (2+3+4=9). The second query sees two character strings as its first two operands (‘3’ and ‘4’) and so it interprets the plus sign (+) as concatenation, coming up with an intermediate string result (‘34’). Finally, that string is converted to an integer before adding 2, to come up with the result of 36.



Master, this is really weird. Why does T-SQL think that 1 is equal to NULL?

if 1<>null 
print 'Good... 1 is not equal to NULL'
else print 'HUH? T-SQL THINKS 1 IS EQUAL TO NULL!'
/* Prints 'HUH? T-SQL THINKS 1 IS EQUAL TO NULL!' */
Grasshopper, any expression involving NULL, whether it be comparative or boolean or arithmetic, evaluates to NULL. And so, in your IF condition, the comparison of 1<>NULL evaluates to NULL, but an IF statement needs a TRUE or FALSE, so T-SQL treats the NULL as FALSE, and that is why the ELSE statement executed. When comparing to NULL, use IS NULL or IS NOT NULL. Don’t ever use =NULL or <>NULL.
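
So your test should look like this:

if 1 is not null
print 'Good... 1 is not NULL'
else print 'Huh?'
/* Prints 'Good... 1 is not NULL' */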



Master, the length of the following string is clearly 10. What’s going on?

select len('12345'+space(5))   
/* Returns 5 */
Grasshopper, the LEN() function ignores trailing blanks. The DATALENGTH() function will give you the true number of bytes. But note that I said bytes rather than characters. You must realize that VARCHAR characters are 1 byte each while NVARCHAR characters are 2 bytes each. Observe…

declare @vchar varchar(20) 
,@nvchar nvarchar(20)
set @vchar='12345'+space(5)
set @nvchar='12345'+space(5)
select datalength(@vchar),datalength(@nvchar)
/* Returns 10 and 20 */
So you’ll have to remember to divide by 2 to get the number of characters in an NVARCHAR string. Either that, or you can use a more generalized method that works for both VARCHAR and NVARCHAR:

declare @vchar varchar(20) 
,@nvchar nvarchar(20)
set @vchar='12345'+space(5)
set @nvchar='12345'+space(5)
select len(@vchar+'*')-1, len(@nvchar+'*')-1
/* Returns 10 and 10 */




The first monsoon season passes, and then, one fateful day…



Master, is my training almost over? When will I reach my full potential in T-SQL Fu?

Grasshopper, you still have much to learn. Quickly as you can, snatch this pebble from my hand. (The young novice tries and fails.) When you can take the pebble from my hand, it will be time for you to leave.



To be continued… Click here to read the next installment.