Friday, September 18, 2009

T-SQL Fu: The Path To Understanding

Do you remember the old 1970’s TV series Kung Fu? It’s the story of Kwai Chang Caine, a half-American half-Chinese Shaolin priest on the run in 1870’s America, sprinkled with flashbacks to his training with Master Po and Master Kan in the Shaolin temple back in China.

The various T-SQL forums available on the web kind of remind me of those flashbacks… they are like temples or monasteries where young novices can ask questions and learn the quirky idiosyncrasies of T-SQL under the patient guidance of a Master.

These forums are full of threads with simple common questions that get asked over and over again. It would be most helpful if novices could get up to speed quickly by having some of the mysteries of T-SQL opened up to them all at once so that they don’t have to stumble at the beginning… A way to begin to unlock the mysteries of the art of T-SQL Fu…



It is the dawn of a new day, and as the morning sunlight filters in through the high temple window…



Master, these queries are giving the wrong results! It’s quite obvious what I’m trying to accomplish in these queries. Is the server corrupt or something?

select Result = 3+4 * 5+6 
/*
Result
-------
29
*/

select EmployeeID,TitleOfCourtesy,LastName,City
from Northwind.dbo.Employees
where City='Seattle' or City='Tacoma'
and TitleOfCourtesy='Dr.'
/*
EmployeeID TitleOfCourtesy LastName City
----------- ---------------- --------- --------
1 Ms. Davolio Seattle
2 Dr. Fuller Tacoma
8 Ms. Callahan Seattle
*/

select Item = 'A' union select 'B' union select 'C'
intersect
select
'C'
/*
Item
----
A
B
C
*/
Grasshopper, there is no corruption. All of these examples are giving the correct results. You formatted your queries according to your intentions, but T-SQL doesn’t care about formatting. It cares about an order of operations that it must follow. Multiplication and Division take precedence over Addition and Subtraction and therefore they are evaluated first. AND takes precedence over OR. And INTERSECT takes precedence over UNION and EXCEPT. Use parentheses to arrive at the outcome you’re intending:

select Result = (3+4) * (5+6) 
/*
Result
-------
77
*/


select EmployeeID,TitleOfCourtesy,LastName,City
from Northwind.dbo.Employees
where (City='Seattle' or City='Tacoma')
and TitleOfCourtesy='Dr.'
/*
EmployeeID TitleOfCourtesy LastName City
----------- ---------------- --------- --------
2 Dr. Fuller Tacoma
*/


(
select Item = 'A' union select 'B' union select 'C')
intersect
select
'C'
/*
Item
----
C
*/


Master, the following query couldn’t be simpler. Why is it not returning 0.5?

select Result = 1/2  
/* Returns 0 */
Grasshopper, in this query, T-SQL sees a 1 and a 2, which are both integers, so it performs integer arithmetic, producing an integer result with no remainder and no decimals. Therefore 1/2 evaluates to 0. Add a decimal point to force it to use decimal arithmetic instead:

select Result = 1./2  
/* Returns 0.500000 */


Master, you just got through telling me that T-SQL will divide two integers using integer arithmetic. It’s not doing it with this query:

select 2147483649/2  
/* Returns 1073741824.500000 */
Grasshopper, that first number is not an integer. Books Online states that integers range from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647). Your number exceeds the range, so T-SQL interprets it as a decimal instead.



Master, I confirmed that Books Online states that the highest integer is 2^31-1. But when I try that number in a query, it gives me a strange answer:

select 2^31-1  
/* Returns 28 */
Grasshopper, I’m afraid that Books Online uses an unfortunate syntax. In many programming languages, the carat (^) is used for exponentiation. But in T-SQL, the carat indicates a Bitwise Exclusive Or (XOR) operation. The bits in the result are set to 1 if either bit (but not both bits) in the operands are equal to 1. To illustrate:

select Result = 2^31 
/*
Bit representation of 2: 00010
Bit representation of 31: 11111
--------------------------------
Exclusive Or (XOR) Result: 11101 = 29
*/
In fact, there are other bitwise operators in T-SQL like an ampersand (&) for AND, a vertical line (|) for (non-exclusive) OR, and a tilde (~) for NOT. Consider, for example, the following query:

select ~5&6^7|8 
/*
Bit representation of 5: 0101
-------------------------------
Result of ~5 1010
Bit representation of 6: 0110
-------------------------------
Result of ~5&6 0010
Bit representation of 7 0111
-------------------------------
Result of ~5&6^7 0101
Bit representation of 8 1000
-------------------------------
Result of ~5&6^7|8 1101 = 13
*/
But back to your original question… If you want to use exponentiation in T-SQL, use the POWER() function.



Master, something strange is going on here. I queried for rows with UnitPrice over $2.00, but I’m getting unexpected results.

select OrderID,ProductID,UnitPrice 
from Northwind..[Order Details]
where UnitPrice! > 2
/*
OrderID ProductID UnitPrice
-------- ---------- ----------
10252 33 2.00
10269 33 2.00
10271 33 2.00
10273 33 2.00
10341 33 2.00
10382 33 2.00
10410 33 2.00
10414 33 2.00
10415 33 2.00
10454 33 2.00
10473 33 2.00
*/
Grasshopper, you are not very observant this morning, despite the 2 Venti Mocha Frappuccinos that you guzzled. Note the exclamation mark, which you evidently typed in error under the influence of too much caffeine. This implies NOT when used with a comparison operator. So you are effectively asking for rows with UnitPrice NOT greater than $2.00. The comparison operators of !< and !> are not used very often… most people prefer using the >= and <= operators to accomplish the same thing.



The days of training continue… And there are still many questions… One bitterly-cold winter morning, our young novice is fruitlessly trying to scratch candle wax off his laptop keyboard as his Master approaches…



Master, I found a bug in T-SQL. The comparisons I’m doing in the examples below are clearly equal, but T-SQL thinks they are not.

declare @f float 
set
@f=0.1
if @f*@f = 0.1*0.1
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */

declare @a float, @b float
select
@a=14.58, @b=0.54
if @a+@b = 14.58+0.54
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */
Grasshopper, the datatypes of FLOAT and REAL are known as “Approximate Numerics”. Not all values (like 0.1 for example) can be represented exactly in those formats. If you want to learn more about the inaccuracies of floating point math, you can do a Google search.

In your first example, you are multiplying two floating-point numbers (@f*@f) together and you are multiplying two decimals (0.1*0.1 = 0.01) together. T-SQL will take that decimal result of 0.01 and convert it to floating-point in order to do the comparison. But because of floating-point inaccuracies, a floating-point 0.1 multiplied by a floating-point 0.1 does not equal a floating-point 0.01. It is a similar situation with your second example.

Please observe:

select cast(0.1 as float)*cast(0.1 as float) - cast(0.01 as float) 
/* Returns 1.73472347597681E-18 */

select cast(14.58 as float)+cast(0.54 as float) - cast(14.58+0.54 as float)
/* Returns 1.77635683940025E-15 */
Use the datatypes of NUMERIC or DECIMAL for mathematical calculations that produce exact results.



Master, I don’t understand why the following are giving me different results. Is floating point inaccuracy coming into play again somehow?

if 3 * 1./3 = 1 
print 'Ok'
else print 'Huh?'
/* Prints 'Ok' */

if 1./3 * 3 = 1
print 'Ok'
else print 'Huh?'
/* Prints 'Huh?' */
Grasshopper, each of these two queries evaluate their expressions from left to right. The first query multiplied 3 times 1., coming up with 3.000000, and then it divided that by 3, coming up with 1.000000. The second query divided 1. by 3, coming up with 0.333333, and then multiplied that by 3, coming up with 0.999999. That intermediate result of 0.333333 is only an approximation of a repeating decimal where the digit of 3 goes on forever. I would suggest using the ROUND() function.



Master, why do the following 2 queries have different outcomes? I thought that T-SQL implicitly converted characters and strings to integers when there are expressions that mix the two.

select Result = 2 + '3' + '4' 
/* Returns 9 */

select Result = '3' + '4' + 2
/* Returns 36 */
Grasshopper, again, the queries are being evaluated left to right. The second query sees two characters as the first two operands (‘3’ and ‘4’) and so it interprets the plus sign (+) as a concatenation and therefore comes up with an intermediate result of a string (‘34’). Finally, that string is converted to an integer before adding 2 to come up with the result of 36.



Master, this is really weird. Why does T-SQL think that 1 is equal to NULL?

if 1<>null 
print 'Good... 1 is not equal to NULL'
else print 'HUH? T-SQL THINKS 1 IS EQUAL TO NULL!'
/* Prints 'HUH? T-SQL THINKS 1 IS EQUAL TO NULL!' */
Grasshopper, any expression involving NULL, whether it be comparative or boolean or arithmetic, evaluates to NULL. And so, in your IF condition, the comparison of 1<>NULL evaluates to NULL, but an IF statements needs a TRUE/FALSE, so T-SQL evaluates the NULL to FALSE, and that is why the ELSE statement executed. When comparing to NULL, use IS NULL or IS NOT NULL. Don’t ever use =NULL or <>NULL.



Master, the length of the following string is clearly 10. What’s going on?

select len('12345'+space(5))   
/* Returns 5 */
Grasshopper, the LEN() function ignores trailing blanks. The DATALENGTH() function will give you the true number of bytes. But note that I said bytes rather than characters. You must realize that VARCHAR characters are 1 byte each while NVARCHAR characters are 2 bytes each. Observe…

declare @vchar varchar(20) 
,@nvchar nvarchar(20)
set @vchar='12345'+space(5)
set @nvchar='12345'+space(5)
select datalength(@vchar),datalength(@nvchar)
/* Returns 10 and 20 */
So you’ll have to remember to divide by 2 to get the number of characters in an NVARCHAR string. Either that, or you can use a more generalized method that works for both VARCHAR and NVARCHAR:

declare @vchar varchar(20) 
,@nvchar nvarchar(20)
set @vchar='12345'+space(5)
set @nvchar='12345'+space(5)
select len(@vchar+'*')-1, len(@nvchar+'*')-1
/* Returns 10 and 10 */




The first monsoon season passes, and then, one fateful day…



Master, is my training almost over? When will I reach my full potential in T-SQL Fu?

Grasshopper, you still have much to learn. Quickly as you can, snatch this pebble from my hand. (The young novice tries and fails.) When you can take the pebble from my hand, it will be time for you to leave.



To be continued… Click here to read the next installment.

7 comments:

  1. Brad,

    Enjoyed the narrative. You have a unique style of presenting.

    ReplyDelete
  2. This is just awesome!!!! I had a smile on my face the entire time and LOL a few times. Very entertaining. Thanks for the post Brad.

    ReplyDelete
  3. Sankar and Adam...

    Thanks for the positive feedback. Sometimes I'm not sure if I go too far with my thirst for creativity.

    I guarantee that you guys (and anybody else from the MSDN T-SQL forum) will really enjoy the second part (and conclusion) of this... it'll post on Monday or Tuesday.

    Thanks again!

    --Brad

    ReplyDelete
  4. Have to echo the above comments, you have a very clear and amusing presenting format. Also many annoyances which I have suffered with before are covered with concise, clear explanations.

    A great blog entry!

    ReplyDelete
  5. Thank you so much for the feedback... I'm so glad you enjoyed it!

    --Brad

    ReplyDelete
  6. wonderful information, I had come to know about your blog from my friend nandu , hyderabad,i have read atleast 7 posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a ton once again, Regards, Difference sql and tsql

    ReplyDelete