SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server(): November 2009

Friday, November 20, 2009

A Loan At Last!

Got your eye on a flashy new car?

Or perhaps you’re considering a new home?

If you don’t have the cash to pay for either of these, you’ll need to get a loan.

And before you jump in head-first, you’ll want to know how much you’ll be paying each month, and perhaps you’ll want to know ~~the obscene amount of~~ how much interest you’ll end up paying over the life of the loan.

Sure, there are truckloads of loan calculators out there on the web… They’re a dime a dozen.

But who needs those when you have the raw power of SQL Server at your fingertips?

This blog entry will demonstrate how to create a loan payment schedule based on a fixed interest rate with compounded interest, and, as always, we’ll go a little bit beyond that to explore some other things.

The monthly payment amount, c, is expressed by the following formula:

where P is the loan amount, M is the number of months of the loan, and r is the monthly interest rate expressed as a decimal.

The balance of the loan after payment number N is expressed by the following formula:

With all this in mind, here is a stored procedure to produce a payment schedule:

use tempdb;
go
  
if object_id('usp_LoanSchedule') is not null drop procedure usp_LoanSchedule;
go
  
create procedure usp_LoanSchedule
   @LoanAmount     numeric(10,2)
  ,@AnnualRate     numeric(10,8)  /* Expressed as percent */
  ,@NumberOfMonths int
  ,@StartDate      datetime
as
with InputVariables as
(
  select P,M,R
        ,C=round((P*R)/(1-power(1+R,-M)),2)
  from (select P=@LoanAmount
              ,M=@NumberOfMonths
              ,R=@AnnualRate/12/100) InputData
)
,MonthlyPayments(PmtNo
                ,PmtDate
                ,Balance
                ,Principle
                ,Interest
                ,CumulPrinciple
                ,CumulInterest) as
(
  select N
        ,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
        ,cast(NewBalance as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
  from InputVariables
  cross apply (select N=1) CalcPmtNo
  cross apply (select NewBalance=round(P*power(1+R,N)
                                      -(power(1+R,N)-1)*C/R,2)) CalcNewBalance
  union all
  select N
        ,dateadd(month,1,mp.PmtDate)
        ,cast(NewBalance as numeric(10,2))
        ,cast(mp.Balance-NewBalance as numeric(10,2))
        ,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
        ,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
        ,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
  from MonthlyPayments mp
  cross join InputVariables
  cross apply (select N=mp.PmtNo+1) CalcPmtNo
  cross apply (select NewBalance=case
                                   when N=M
                                   then 0.00  /* Last Payment */
                                   else round(P*power(1+R,N)
                                             -(power(1+R,N)-1)*C/R,2)
                                 end) CalcNewBalance
  where N<=M
)
select PmtNo
      ,PmtMonth=datename(month,PmtDate)+str(year(PmtDate),5)
      ,PmtAmount=Principle+Interest
      ,Principle
      ,Interest
      ,CumulPrinciple
      ,CumulInterest
      ,Balance
from MonthlyPayments  
order by PmtNo
option (maxrecursion 1000)
;

If we want to buy a $20,000 car at 6.8% (annual) interest with a 5-year term (60 months) with our first payment in December 2009, then we can call it like so:

exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
/*
PmtNo PmtMonth       PmtAmount Principle Interest CumulPrinciple CumulInterest  Balance
----- -------------- --------- --------- -------- -------------- ------------- --------
    1 December 2009     394.14    280.81   113.33         280.81        113.33 19719.19
    2 January 2010      394.14    282.39   111.75         563.20        225.08 19436.80
    3 February 2010     394.14    284.00   110.14         847.20        335.22 19152.80
    4 March 2010        394.14    285.61   108.53        1132.81        443.75 18867.19
  ... ...                 ...       ...      ...            ...           ...      ...
   57 August 2014       394.14    385.33     8.81       18830.89       3635.09  1169.11
   58 September 2014    394.14    387.52     6.62       19218.41       3641.71   781.59
   59 October 2014      394.14    389.71     4.43       19608.12       3646.14   391.88
   60 November 2014     394.14    391.88     2.26       20000.00       3648.40     0.00
*/

Let’s go over how the logic works.

The procedure essentially consists of a single SELECT statement, which is made up of two CTE’s (Common Table Expressions).

The first CTE is called InputVariables, and it just produces a small one-row table of the variables that we need for the calculations… in other words, our friends P, M, r, and c. Note how r is calculated by dividing the @AnnualRate parameter by 12 (to get a monthly rate) and then by 100 (to represent the rate as a decimal). Also note how c (the monthly payment amount) was calculated based on the other 3 variables.

I could have just created local variables (@P, @M, @R, and @C) instead of setting them up in a CTE, but I had two reasons for not doing that. First of all, I didn’t want all those pesky at-signs (@) to be all throughout the code and second, I wanted to encapsulate the entire creation of the loan schedule table in a single SELECT statement (I’ll explain why later).

The second CTE, called MonthlyPayments, is a recursive one. The first part of the recursive CTE (before the UNION ALL) is the anchor. It produces the data for the very first payment (note how I also adjust the first payment date to the first of the month). The second part (the recursive part) of the CTE builds upon that, creating the second payment, then the third, and so on, up until the last payment. It will continue tacking on payments as long as its WHERE clause (WHERE N<=M) is true… in other words, as long as the Payment Number (N) is less than or equal to the number of months of the loan (M). Once the WHERE clause evaluates to false (when N>M), then it will stop.

Both the anchor and the recursive part of the CTE use a couple of CROSS APPLYs. The first CROSS APPLY establishes a new column N, the Payment Number. The only reason I do this is because I want to use N in the calculation of the Balance. You’ll recall that the formula for the Balance uses N as one of its variables.

The second CROSS APPLY is the calculation of that NewBalance, making use of P, r, N, and c. Note the CASE statement that checks if we are processing the final payment (CASE WHEN N=M) and, if so, it forces the final balance to be zero.

The Principle of the payment is calculated by subtracting the NewBalance from the Balance of the previous payment (or in the case of the very first payment, by subtracting NewBalance from the original loan amount P). The Interest part of the payment is calculated by simply subtracting the Principle from the Payment Amount (c).

Since the recursive CTE is building the schedule one payment at a time, we can take advantage of that and calculate cumulative totals for the Principle and the Interest (CumulPrinciple and CumulInterest) as we go along.

Finally, the main SELECT statement pulls all the rows from the MonthlyPayments recursive CTE, dressing up the date by spelling out the month’s name and ORDERing BY the PmtNo.

Note the OPTION clause. By default, the maximum number of recursion levels is 100. The number of levels of recursion we will reach depends on the value of @NumberOfMonths that is passed to the procedure. If we’re doing a mortgage payment that spans 30 years, then @NumberOfMonths will be 360 and we will need a level of recursion of at least 360 or else we will get this message:

/*
Msg 530: The statement terminated. 
The maximum recursion 100 has been exhausted before statement completion.
*/

So our general MAXRECURSION value of 1000 should certainly be reasonable enough, assuming that no loan is going to last more than 83 years.

Great! We have a stored procedure that produces a loan schedule! Terrific!

Now what?

Not much, that’s what. This stored procedure works just fine, but we can’t really do anything with it. All we can do is EXECUTE it. Unless we INSERT its result set into a temporary table, we can’t query any important information out of it.

However, we could query information out of this loan schedule data if we created an inline table-valued function (TVF) instead of the stored procedure, like so:

use tempdb;
go
  
if object_id('ufn_LoanSchedule') is not null drop function ufn_LoanSchedule;
go
  
create function ufn_LoanSchedule
(
   @LoanAmount     numeric(10,2)
  ,@AnnualRate     numeric(10,8)  /* Expressed as percent */
  ,@NumberOfMonths int
  ,@StartDate      datetime
)
returns table
as
return
with InputVariables as
(
  select P,M,R
        ,C=round((P*R)/(1-power(1+R,-M)),2)
  from (select P=@LoanAmount
              ,M=@NumberOfMonths
              ,R=@AnnualRate/12/100) InputData
)
,MonthlyPayments(PmtNo
                ,PmtDate
                ,Balance
                ,Principle
                ,Interest
                ,CumulPrinciple
                ,CumulInterest) as
(
  select N
        ,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
        ,cast(NewBalance as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
  from InputVariables
  cross apply (select N=1) CalcPmtNo
  cross apply (select NewBalance=round(P*power(1+R,N)
                                      -(power(1+R,N)-1)*C/R,2)) CalcNewBalance
  union all
  select N
        ,dateadd(month,1,mp.PmtDate)
        ,cast(NewBalance as numeric(10,2))
        ,cast(mp.Balance-NewBalance as numeric(10,2))
        ,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
        ,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
        ,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
  from MonthlyPayments mp
  cross join InputVariables
  cross apply (select N=mp.PmtNo+1) CalcPmtNo
  cross apply (select NewBalance=case
                                   when N=M
                                   then 0.00  /* Last Payment */
                                   else round(P*power(1+R,N)
                                             -(power(1+R,N)-1)*C/R,2)
                                 end) CalcNewBalance
  where N<=M
)
select PmtNo
      ,PmtDate
      ,PmtAmount=Principle+Interest
      ,Principle
      ,Interest
      ,CumulPrinciple
      ,CumulInterest
      ,Balance
from MonthlyPayments  
;

(This is why I originally created a single one-stop-shopping query in our original stored procedure without the use of local variables… so that I could easily translate it into an in-line TVF).

Now we can call the TVF just as if it were a table, pulling out any columns we wish:

select PmtNo
      ,PmtDate
      ,Principle
      ,Interest
      ,Balance
from dbo.ufn_LoanSchedule(20000,6.8,60,'20091201')
order by PmtNo
/*
PmtNo PmtDate                 Principle Interest  Balance
----- ----------------------- --------- -------- --------
    1 2009-12-01 00:00:00.000    280.81   113.33 19719.19
    2 2010-01-01 00:00:00.000    282.39   111.75 19436.80
    3 2010-02-01 00:00:00.000    284.00   110.14 19152.80
    4 2010-03-01 00:00:00.000    285.61   108.53 18867.19
  ...   ...                        ...      ...      ...
   57 2014-08-01 00:00:00.000    385.33     8.81  1169.11
   58 2014-09-01 00:00:00.000    387.52     6.62   781.59
   59 2014-10-01 00:00:00.000    389.71     4.43   391.88
   60 2014-11-01 00:00:00.000    391.88     2.26     0.00
*/

Look back at the code where we created the TVF. Did you notice something missing in the definition? Two things, actually. I deliberately left out the ORDER BY clause because we may want to order the data in a different way when we call the TVF. But the OPTION clause is also missing that specifies the MAXRECURSION value. T-SQL will not allow it in a query that’s part of a TVF definition. If we were to tack it on to the end of the query and try to CREATE the FUNCTION, we’d get the following error message:

/*
Msg 156, Level 15, State 1, Procedure ufn_LoanSchedule, Line 67
Incorrect syntax near the keyword 'option'.
*/

Since we cannot specify any MAXRECURSION in the TVF definition, it is (unfortunately) up to us to include the OPTION clause in any query that calls the TVF if we know that we are going to breach the minimum level of 100.

So, for example, let’s say we have just bought a quaint little cottage for $500,000. (Don’t scoff… that price is dirt cheap for a humble little abode here in the San Francisco Bay Area). We get a 30-year (360-month) fixed loan with a 5% interest rate, with our first payment due in January 2010. We need to add the OPTION clause to the following query to prevent an error:

select PmtNo
      ,PmtDate
      ,Principle
      ,Interest
      ,Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
option (maxrecursion 360)
/*
PmtNo PmtDate                 Principle Interest   Balance
----- ----------------------- --------- -------- ---------
    1 2010-01-01 00:00:00.000    600.78  2083.33 499399.22
    2 2010-02-01 00:00:00.000    603.28  2080.83 498795.94
    3 2010-03-01 00:00:00.000    605.79  2078.32 498190.15
    4 2010-04-01 00:00:00.000    608.32  2075.79 497581.83
  ...  ...                         ...      ...       ...
  357 2039-09-01 00:00:00.000   2639.84    44.27   7984.14
  358 2039-10-01 00:00:00.000   2650.84    33.27   5333.30
  359 2039-11-01 00:00:00.000   2661.89    22.22   2671.41
  360 2039-12-01 00:00:00.000   2671.41    12.70      0.00
*/

Anyway, now that we have a TVF, we can get some interesting information from the loan schedule data.

For example, using our mortgage scenario, how much can we write off in mortgage interest on our income taxes in the year 2012?:

select TotalInterest=sum(Interest)
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where year(PmtDate)=2012
option (maxrecursion 360)
/*
TotalInterest
-------------
     24058.33
*/

What will our loan balance be at the end of 2015?:

select Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where PmtDate='20151201'
order by PmtNo
option (maxrecursion 360)
/*
  Balance
---------
449676.39
*/

How long will it take for us to pay off half of the loan?:

select top 1 PmtNo,PmtDate
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where CumulPrinciple>=Balance
order by PmtNo
option (maxrecursion 360)
/*
PmtNo PmtDate
----- -----------------------
  242 2030-02-01 00:00:00.000
*/

(Yikes… That’s the problem with how these loans work. It’ll take over 20 years of the 30-year loan term to pay off half of the principle.)

Finally, we can test out different rates and see how the monthly payment changes and how much total cumulative interest we’ll end up paying for each rate:

select Rate
      ,PmtAmount
      ,CumulInterest
from (select 5.1 union all
      select 5.3 union all
      select 5.5 union all
      select 5.7 union all
      select 5.9) Rates(Rate)
cross apply dbo.ufn_LoanSchedule(500000,Rate,360,'20100101')
where PmtNo=360
option (maxrecursion 360)
/*
Rate PmtAmount CumulInterest
---- --------- -------------
 5.1   2714.75     477310.00
 5.3   2776.52     499547.20
 5.5   2838.95     522022.00
 5.7   2902.00     544720.00
 5.9   2965.68     567644.80
*/

(Don’t you find it ~~disgusting~~ interesting how you pay more in interest over the life of the loan than you do in principle if your rate is a little over 5.3%?).

I hope this article has been helpful not just in showing how to calculate loan schedule data, but also in demonstrating recursive CTE’s and TVF’s and other techniques. I certainly learned something I didn’t expect (the inability to specify a MAXRECURSION value in a TVF) in putting the article together.

I’ll bet that your borrowing future will be brighter now that you have the tools to analyze potential loans. From now on, you’ll be able to make those payments with a smile on your face.

Yeah, right.

Wednesday, November 11, 2009

Throw Your MONEY Away

This might not be a big revelation to many of you, but do you see any glaring similarities in the following datatypes?:

/*
DataType              Range (Smallest to Largest)                   #Bytes
--------------------------------------------------------------------------
integer                -2,147,483,648 to 2,147,483,647                   4
smallmoney              -214,748.3648 to 214,748.3647                    4
--------------------------------------------------------------------------
bigint     -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807       8
money       -922,337,203,685,477.5808 to 922,337,203,685,477.5807        8
--------------------------------------------------------------------------
*/

That’s right… the money datatypes are really just integers in disguise.

And, unfortunately, these money datatypes carry the baggage of integer division with them.

Consider the following:

declare @m money
set @m=500
select DivBy1600=@m/1600
      ,DivBy16000=@m/16000
      ,DivBy160000=@m/160000
      ,DivBy1600000=@m/1600000
      ,DivBy16000000=@m/16000000
/*
    DivBy1600    DivBy16000   DivBy160000  DivBy1600000 DivBy16000000
------------- ------------- ------------- ------------- -------------
       0.3125        0.0312        0.0031        0.0003          0.00
*/

You can see that when 500 is divided by 16,000, the result is truncated (not rounded) to 4 decimals… the result is 0.0312 and not 0.0313.

And that truncation of 4 decimal places can produce some unexpected results:

declare @m money
set @m=500
select By1600=@m/1600*1600
      ,By16000=@m/16000*16000
      ,By160000=@m/160000*160000
      ,By1600000=@m/1600000*1600000
      ,By16000000=@m/16000000*16000000
/*
      By1600      By16000     By160000    By1600000   By16000000
------------ ------------ ------------ ------------ ------------
      500.00       499.20       496.00       480.00         0.00
*/

One would expect to come up with an answer of 500 for each of those… But nooooo.

Notice that the same phenomenon does not happen when doing the same operations on a decimal (or numeric) datatype:

declare @d decimal(5,2)
set @d=500
select DivBy1600=@d/1600
      ,DivBy16000=@d/16000
      ,DivBy160000=@d/160000
      ,DivBy1600000=@d/1600000
      ,DivBy16000000=@d/16000000
/*
    DivBy1600    DivBy16000   DivBy160000  DivBy1600000 DivBy16000000
------------- ------------- ------------- ------------- -------------
    0.3125000    0.03125000   0.003125000  0.0003125000 0.00003125000
*/
select By1600=@d/1600*1600
      ,By16000=@d/16000*16000
      ,By160000=@d/160000*160000
      ,By1600000=@d/1600000*1600000
      ,By16000000=@d/16000000*16000000
/*
           By1600          By16000        By160000       By1600000      By16000000
----------------- ---------------- --------------- --------------- ---------------
      500.0000000     500.00000000   500.000000000  500.0000000000 500.00000000000
*/

Here’s another money tidbit that may cause confusion among new users of T-SQL…

According to Books Online, you represent a money constant as a string of numbers with an optional currency symbol as a prefix. So you can represent a money value of 100 as $100 or £100 or ¥100 or €100, for example. But of course this doesn’t take into account any conversion ratios, as the following demonstrates:

select Revelation=
         case 
           when $100=£100 and $100=¥100 and $100=€100 
           then 'Really?  Dollars and Pounds and Yen and Euros are all equal??'
           else 'Of course they''re different... Who are you kidding?'
         end
/*
Revelation
-------------------------------------------------------------
Really?  Dollars and Pounds and Yen and Euros are all equal??
*/

There is one interesting thing (though laughably trivial) about using the money datatype. T-SQL will allow you to convert strings with commas into money, but it will not allow you to convert them into decimal or numeric:

declare @m money, @d decimal(10,2)
set @m='1,234,567.89'  /* This works fine */
set @d='1,234,567.89'  /* Msg 8114: Error converting data type varchar to numeric */

Whoop-de-doo.

In reality, the commas are simply removed before the conversion… the actual quantity or placement of the commas is not validated in any way. For example, the following executes without error:

declare @m money
set @m=',1,,2,3,,,4,5,6,7,,,.8,9,,'
select @m  /* Returns 1234567.89 */

I’ve personally never used the money datatype, and I certainly never will. As the biblical saying goes: “For the love of money is the root of all inaccurate calculations and currency confusion… oh yeah, and evil too.”

Friday, November 6, 2009

XML PATHs Of Glory

In a past blog post, I illustrated how you can use the FOR XML PATH clause to create comma-separated lists of items. In this entry, I’ll go into detail as to how FOR XML PATH can be used for what it was designed for: to shape actual XML output. Finally, I’ll use FOR XML PATH to create some HTML output as well.

The PATH option was introduced in SQL2005 to provide a flexible and easier approach to constructing XML output. I thank my lucky stars that I started with T-SQL at the SQL2005 level, because the SQL2000 method of using the EXPLICIT option looks like a complete nightmare. (If you’re into torture, take a look at Books Online for documentation on how to use the EXPLICIT option. When you're done screaming, then come back and read on).

Let's take a quick look at the output that results with the FOR XML PATH clause. If you pass no specific path name, then it assumes a path of ‘row’:

select ID=ContactID
      ,FirstName
      ,LastName
      ,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path
/*
<row>
  <ID>90</ID>
  <FirstName>Andreas</FirstName>
  <LastName>Berglund</LastName>
  <Phone>795-555-0116</Phone>
</row>
<row>
  <ID>91</ID>
  <FirstName>Robert</FirstName>
  <LastName>Bernacchi</LastName>
  <Phone>449-555-0176</Phone>
</row>
<row>
  <ID>92</ID>
  <FirstName>Matthias</FirstName>
  <LastName>Berndt</LastName>
  <Phone>384-555-0169</Phone>
</row>
<row>
  <ID>93</ID>
  <FirstName>John</FirstName>
  <LastName>Berry</LastName>
  <Phone>471-555-0181</Phone>
</row>
<row>
  <ID>94</ID>
  <FirstName>Steven</FirstName>
  <LastName>Brown</LastName>
  <Phone>280-555-0124</Phone>
</row>
*/

For the query below, let's supply a specific path name of ‘Contact’. And it’s usually good practice to create XML with a root tag, and we can do that by adding the ROOT directive like so:

select ID=ContactID
      ,FirstName
      ,LastName
      ,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact>
    <ID>90</ID>
    <FirstName>Andreas</FirstName>
    <LastName>Berglund</LastName>
    <Phone>795-555-0116</Phone>
  </Contact>
  <Contact>
    <ID>91</ID>
    <FirstName>Robert</FirstName>
    <LastName>Bernacchi</LastName>
    <Phone>449-555-0176</Phone>
  </Contact>
  <Contact>
    <ID>92</ID>
    <FirstName>Matthias</FirstName>
    <LastName>Berndt</LastName>
    <Phone>384-555-0169</Phone>
  </Contact>
  <Contact>
    <ID>93</ID>
    <FirstName>John</FirstName>
    <LastName>Berry</LastName>
    <Phone>471-555-0181</Phone>
  </Contact>
  <Contact>
    <ID>94</ID>
    <FirstName>Steven</FirstName>
    <LastName>Brown</LastName>
    <Phone>280-555-0124</Phone>
  </Contact>
</Contacts>
*/

You’ll note that the column names were used as the tags for each element in the XML. For example, I renamed the first column to be ID rather than ContactID and therefore the element tag <ID></ID> was created.

You have the ability to shape the XML in whatever ways you wish based on what names you give to your columns. For example, any column that starts with an at-sign (@) will create attributes rather than elements, as illustrated below:

select "@ID"=ContactID
      ,"@FirstName"=FirstName
      ,"@LastName"=LastName
      ,"@Phone"=Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact ID="90" FirstName="Andreas" LastName="Berglund" Phone="795-555-0116" />
  <Contact ID="91" FirstName="Robert" LastName="Bernacchi" Phone="449-555-0176" />
  <Contact ID="92" FirstName="Matthias" LastName="Berndt" Phone="384-555-0169" />
  <Contact ID="93" FirstName="John" LastName="Berry" Phone="471-555-0181" />
  <Contact ID="94" FirstName="Steven" LastName="Brown" Phone="280-555-0124" />
</Contacts>
*/

You can mix attributes and elements together like so:

select "@ID"=ContactID
      ,FirstName
      ,LastName
      ,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact ID="90">
    <FirstName>Andreas</FirstName>
    <LastName>Berglund</LastName>
    <Phone>795-555-0116</Phone>
  </Contact>
  <Contact ID="91">
    <FirstName>Robert</FirstName>
    <LastName>Bernacchi</LastName>
    <Phone>449-555-0176</Phone>
  </Contact>
  <Contact ID="92">
    <FirstName>Matthias</FirstName>
    <LastName>Berndt</LastName>
    <Phone>384-555-0169</Phone>
  </Contact>
  <Contact ID="93">
    <FirstName>John</FirstName>
    <LastName>Berry</LastName>
    <Phone>471-555-0181</Phone>
  </Contact>
  <Contact ID="94">
    <FirstName>Steven</FirstName>
    <LastName>Brown</LastName>
    <Phone>280-555-0124</Phone>
  </Contact>
</Contacts>
*/

And you can create nested attributes and elements, as illustrated below:

select "@ID"=ContactID
      ,"Name/@Title"=Title
      ,"Name/@Suffix"=Suffix
      ,"Name/First"=FirstName
      ,"Name/Last"=LastName
from Person.Contact
where ContactID between 92 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact ID="92">
    <Name Title="Mr.">
      <First>Matthias</First>
      <Last>Berndt</Last>
    </Name>
    <Phone>384-555-0169</Phone>
  </Contact>
  <Contact ID="93">
    <Name>
      <First>John</First>
      <Last>Berry</Last>
    </Name>
    <Phone>471-555-0181</Phone>
  </Contact>
  <Contact ID="94">
    <Name Title="Mr." Suffix="IV">
      <First>Steven</First>
      <Last>Brown</Last>
    </Name>
    <Phone>280-555-0124</Phone>
  </Contact>
</Contacts>
*/

In the above query, I introduced a Name element with two attributes (Title and Suffix) and two sub-elements (First and Last). You can also see that some of the contacts had NULL for the Title and Suffix and therefore those attributes were not created for those contacts.

Note that attributes must be introduced first, before the elements. For example, if I tried to do the following, I would get an error:

select "@ID"=ContactID
      ,"Name/First"=FirstName
      ,"Name/Last"=LastName
      ,"Name/@Title"=Title
      ,"Name/@Suffix"=Suffix
      ,Phone
from Person.Contact
where ContactID between 92 and 94
for xml path('Contact'),root('Contacts')
/*
Msg 6852, Level 16, State 1, Line 1
Attribute-centric column 'Name/@Title' must not come after a 
non-attribute-centric sibling in XML hierarchy in FOR XML PATH.
*/

When you have two adjacent columns with the same name, then their data will be concatenated together in one element, like so:

select Name=Title
      ,Name=FirstName
      ,Name=MiddleName
      ,Name=LastName
      ,Name=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact><Name>AndreasBerglund</Name></Contact>
  <Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
  <Contact><Name>Mr.MatthiasBerndt</Name></Contact>
  <Contact><Name>JohnBerry</Name></Contact>
  <Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/

Note again that NULL column values are ignored in the concatenation.

If you wanted to construct a nice readable single element consisting of the contact’s full name (Title, FirstName, MiddleName, LastName, and Suffix), you could approach it like this:

select Name=coalesce(Title+' ','')
           +FirstName+' '
           +coalesce(MiddleName+' ','')
           +LastName
           +coalesce(' '+Suffix,'')
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact><Name>Andreas Berglund</Name></Contact>
  <Contact><Name>Mr. Robert M. Bernacchi</Name></Contact>
  <Contact><Name>Mr. Matthias Berndt</Name></Contact>
  <Contact><Name>John Berry</Name></Contact>
  <Contact><Name>Mr. Steven B. Brown IV</Name></Contact>
</Contacts>
*/

But look all the logic required to handle possible NULL values in the Title and MiddleName and Suffix columns. Well, good news! You can use the following trick. Incorporate data() into the column name as illustrated below, and it will take care of concatenating it all together with spaces between and eliminating all the NULL values automatically:

select "Name/data()"=Title
      ,"Name/data()"=FirstName
      ,"Name/data()"=MiddleName
      ,"Name/data()"=LastName
      ,"Name/data()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact><Name>Andreas Berglund</Name></Contact>
  <Contact><Name>Mr. Robert M. Bernacchi</Name></Contact>
  <Contact><Name>Mr. Matthias Berndt</Name></Contact>
  <Contact><Name>John Berry</Name></Contact>
  <Contact><Name>Mr. Steven B. Brown IV</Name></Contact>
</Contacts>
*/

However, this approach will not work if you were trying to construct a Name attribute as opposed to a Name element:

select "@Name/data()"=Title
      ,"@Name/data()"=FirstName
      ,"@Name/data()"=MiddleName
      ,"@Name/data()"=LastName
      ,"@Name/data()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
Msg 6850, Level 16, State 1, Line 1
Column name '@Name/data()' contains an invalid XML identifier as required by FOR XML; 
'@'(0x0040) is the first character at fault.
*/

But you can handle that through a sub-query like so:

select "@Name"=(select "data()"=Title
                      ,"data()"=FirstName
                      ,"data()"=MiddleName
                      ,"data()"=LastName
                      ,"data()"=Suffix
                from Person.Contact c2
                where c2.ContactID=Contact.ContactID
                for xml path(''))
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact Name="Andreas Berglund" />
  <Contact Name="Mr. Robert M. Bernacchi" />
  <Contact Name="Mr. Matthias Berndt" />
  <Contact Name="John Berry" />
  <Contact Name="Mr. Steven B. Brown IV" />
</Contacts>
*/

Besides data(), you can also incorporate text() or node() into a column name or give a column a wildcard name (*) and the data will be inserted directly as text. They are all interchangeable, as you can see in the following example:

select "text()"=Title
      ,"node()"=FirstName
      ,"*"=MiddleName
      ,"node()"=LastName
      ,"text()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact>AndreasBerglund</Contact>
  <Contact>Mr.RobertM.Bernacchi</Contact>
  <Contact>Mr.MatthiasBerndt</Contact>
  <Contact>JohnBerry</Contact>
  <Contact>Mr.StevenB.BrownIV</Contact>
</Contacts>
*/

Remember, two adjacent columns with names that incorporate data() will be separated by a space, but, as you see above, those named with text() or node() or a wildcard are just concatenated directly with no intervening space.

You only really need to specify text() or node() or wildcard names if you want to insert a text element directly subordinate to the main path element, as we saw in the previous query. If, on the other hand, you are inserting text in a sub-element like so…:

select "Name/text()"=Title
      ,"Name/node()"=FirstName
      ,"Name/*"=MiddleName
      ,"Name/node()"=LastName
      ,"Name/text()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact><Name>AndreasBerglund</Name></Contact>
  <Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
  <Contact><Name>Mr.MatthiasBerndt</Name></Contact>
  <Contact><Name>JohnBerry</Name></Contact>
  <Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/

…then you’ll see that they are really unnecessary, since the following query (which we looked at earlier) does the exact same thing:

select Name=Title
      ,Name=FirstName
      ,Name=MiddleName
      ,Name=LastName
      ,Name=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact><Name>AndreasBerglund</Name></Contact>
  <Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
  <Contact><Name>Mr.MatthiasBerndt</Name></Contact>
  <Contact><Name>JohnBerry</Name></Contact>
  <Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/

You can also incorporate comment() or processing-instruction() into the column names to create those kinds of elements, as illustrated below:

select "@ID"=ContactID
      ,"comment()"='Modified on '+convert(varchar(30),ModifiedDate,126)
      ,"comment()"=case when ContactID=92 then 'Here is Contact#92' end
      ,"processing-instruction(EmailPromo)"=EmailPromotion
      ,"Name/@First"=FirstName
      ,"Name/@Last"=LastName
      ,"Name"='This is inserted directly as text'
      ,"Name"='...And so is this'
      ,"*"='This is inserted as text in the main Contact path'
from Person.Contact      
where ContactID between 90 and 92
for xml path('Contact'),root('Contacts')
/*
<Contacts>
  <Contact ID="90">
    <!--Modified on 2001-08-01T00:00:00-->
    <?EmailPromo 0?>
    <Name First="Andreas" Last="Berglund">
      This is inserted directly as text...And so is this
    </Name>
    This is inserted as text in the main Contact path
  </Contact>
  <Contact ID="91">
    <!--Modified on 2002-09-01T00:00:00-->
    <?EmailPromo 1?>
    <Name First="Robert" Last="Bernacchi">
      This is inserted directly as text...And so is this
    </Name>
    This is inserted as text in the main Contact path
  </Contact>
  <Contact ID="92">
    <!--Modified on 2002-08-01T00:00:00-->
    <!--Here is Contact#92-->
    <?EmailPromo 1?>
    <Name First="Matthias" Last="Berndt">
      This is inserted directly as text...And so is this
    </Name>
    This is inserted as text in the main Contact path
  </Contact>
</Contacts>
*/

You’ll note above that the two adjacent columns named comment() do NOT concatenate together like other adjacent columns with the same name. They are always separate elements. The same is true for processing-instruction() columns.

You can also concatenate whole individual XML documents together, as illustrated below, where we use two scalar subqueries to construct XML data from the Sales.SalesPerson and Sales.SalesReason tables. Since we did not give actual column names to the two subqueries, they are inserted directly as is. (Note that we could have named each of them node() or a wildcard and it would have worked the same. However, it’s important to note that you may NOT use text() or data() in naming true XML datatype columns):

select (select "@ID"=SalesPersonID
              ,"@Quota"=SalesQuota
        from Sales.SalesPerson
        for xml path('Person'),root('SalesPeople'),type)
      ,(select "@ID"=SalesReasonID
              ,"@Name"=Name
              ,"@Type"=ReasonType
        from Sales.SalesReason
        for xml path('Reason'),root('SalesReasons'),type)
for xml path('MyData')
/*
<MyData>
  <SalesPeople>
    <Person ID="268" />
    <Person ID="275" Quota="300000.0000" />
    <Person ID="276" Quota="250000.0000" />
    <Person ID="277" Quota="250000.0000" />
    <Person ID="278" Quota="250000.0000" />
    <Person ID="279" Quota="300000.0000" />
    <Person ID="280" Quota="250000.0000" />
    <Person ID="281" Quota="250000.0000" />
    <Person ID="282" Quota="250000.0000" />
    <Person ID="283" Quota="250000.0000" />
    <Person ID="284" />
    <Person ID="285" Quota="250000.0000" />
    <Person ID="286" Quota="250000.0000" />
    <Person ID="287" Quota="300000.0000" />
    <Person ID="288" />
    <Person ID="289" Quota="250000.0000" />
    <Person ID="290" Quota="250000.0000" />
  </SalesPeople>
  <SalesReasons>
    <Reason ID="1" Name="Price" Type="Other" />
    <Reason ID="2" Name="On Promotion" Type="Promotion" />
    <Reason ID="3" Name="Magazine Advertisement" Type="Marketing" />
    <Reason ID="4" Name="Television  Advertisement" Type="Marketing" />
    <Reason ID="5" Name="Manufacturer" Type="Other" />
    <Reason ID="6" Name="Review" Type="Other" />
    <Reason ID="7" Name="Demo Event" Type="Marketing" />
    <Reason ID="8" Name="Sponsorship" Type="Marketing" />
    <Reason ID="9" Name="Quality" Type="Other" />
    <Reason ID="10" Name="Other" Type="Other" />
  </SalesReasons>
</MyData>
*/

Note that the ,TYPE directive was used to make sure that the XML subqueries came through as true XML datatypes. This is very important. If we had left off the ,TYPE directive, they would be processed as strings and then when they were incorporated into the main query, the main FOR XML PATH(‘MyData’) would encode all of the less-than and greater-than signs into this ugly mess:

select (select "@ID"=SalesPersonID
              ,"@Quota"=SalesQuota
        from Sales.SalesPerson
        for xml path('Person'),root('SalesPeople'))
      ,(select "@ID"=SalesReasonID
              ,"@Name"=Name
              ,"@Type"=ReasonType
        from Sales.SalesReason
        for xml path('Reason'),root('SalesReasons'))
for xml path('MyData')
/*
<MyData>
  &lt;SalesPeople&gt;
    &lt;Person ID="268" /&gt;
    &lt;Person ID="275" Quota="300000.0000" /&gt;
    &lt;Person ID="276" Quota="250000.0000" /&gt;
    &lt;Person ID="277" Quota="250000.0000" /&gt;
    &lt;Person ID="278" Quota="250000.0000" /&gt;
    &lt;Person ID="279" Quota="300000.0000" /&gt;
    &lt;Person ID="280" Quota="250000.0000" /&gt;
    &lt;Person ID="281" Quota="250000.0000" /&gt;
    &lt;Person ID="282" Quota="250000.0000" /&gt;
    &lt;Person ID="283" Quota="250000.0000" /&gt;
    &lt;Person ID="284" /&gt;
    &lt;Person ID="285" Quota="250000.0000" /&gt;
    &lt;Person ID="286" Quota="250000.0000" /&gt;
    &lt;Person ID="287" Quota="300000.0000" /&gt;
    &lt;Person ID="288" /&gt;
    &lt;Person ID="289" Quota="250000.0000" /&gt;
    &lt;Person ID="290" Quota="250000.0000" /&gt;
  &lt;/SalesPeople&gt;
  &lt;SalesReasons&gt;
    &lt;Reason ID="1" Name="Price" Type="Other" /&gt;
    &lt;Reason ID="2" Name="On Promotion" Type="Promotion" /&gt;
    &lt;Reason ID="3" Name="Magazine Advertisement" Type="Marketing" /&gt;
    &lt;Reason ID="4" Name="Television  Advertisement" Type="Marketing" /&gt;
    &lt;Reason ID="5" Name="Manufacturer" Type="Other" /&gt;
    &lt;Reason ID="6" Name="Review" Type="Other" /&gt;
    &lt;Reason ID="7" Name="Demo Event" Type="Marketing" /&gt;
    &lt;Reason ID="8" Name="Sponsorship" Type="Marketing" /&gt;
    &lt;Reason ID="9" Name="Quality" Type="Other" /&gt;
    &lt;Reason ID="10" Name="Other" Type="Other" /&gt;
  &lt;/SalesReasons&gt;
</MyData>
*/

Now that we’ve learned so much about FOR XML PATH, let’s put our knowledge to use. Let’s say that you want to construct a webpage or an e-mail that incorporates a table in HTML format. Using our knowledge of FOR XML PATH, we will construct all the HTML between the <table></table> tags. That can then be incorporated into the correct spot in the webpage or e-mail.

Note that this query below uses most of what we learned in this article. You’ll see the following:

We create ALIGN and VALIGN attributes to align the table headers correctly.
We put a <br /> tag into the Phone Number header to split it into two lines.
We use data() to construct the Full Name of the contact.
We create a hyperlink for the E-Mail Address
We subtly color the E-Mail Address in a pale yellow color if EmailPromotion is equal to 1.
We use the ,TYPE directive in our XML CTEs so that we can concatenate them in subsequent CTEs.

Here’s the query, which creates a single NVARCHAR(MAX) variable called @TableHTML:

declare @TableHTML nvarchar(max);
  
with HTMLTableHeader(HTMLContent) as
(
  select "th/@align"='right'
        ,"th/@valign"='bottom'
        ,"th"='ContactID'
        ,"*"=''
        ,"th/@valign"='bottom'
        ,"th"='Full Name'
        ,"*"=''
        ,"th"='Phone'
        ,"th/br"=''
        ,"th"='Number'
        ,"*"=''
        ,"th/@valign"='bottom'
        ,"th"='Email Address'
  for xml path('tr'),type
)
,HTMLTableDetail(HTMLContent) as
(
  select "td/@align"='right'
        ,"td"=ContactID
        ,"*"=''
        ,"td/data()"=Title
        ,"td/data()"=FirstName
        ,"td/data()"=MiddleName
        ,"td/data()"=LastName
        ,"td/data()"=Suffix
        ,"*"=''
        ,"td"=Phone
        ,"*"=''
        ,"td/@bgcolor"=case when EmailPromotion=1 then '#FFFF88' end
        ,"td/a/@href"='mailto:'+EmailAddress
        ,"td/a"=EmailAddress
  from Person.Contact
  where ContactID between 90 and 94
  for xml path('tr'),type
)
,HTMLTable(HTMLContent) as
(
  select "@border"=1
        ,(select HTMLContent from HTMLTableHeader)
        ,(select HTMLContent from HTMLTableDetail)
  for xml path('table')  /*No TYPE because we want a string */
)
select @TableHTML=(select HTMLContent from HTMLTable);

Remember the rule that if two adjacent columns have the same name, their data will concatenated? I had to prevent that from happening with adjacent columns that I named th and td by inserting a blank column with a wildcard name between them to force them to come out as discrete elements.

And here are the contents of that variable as a result of that query:

/*
<table border="1">
  <tr>
    <th align="right" valign="bottom">ContactID</th>
    <th valign="bottom">Full Name</th>
    <th>Phone<br />Number</th>
    <th valign="bottom">Email Address</th>
  </tr>
  <tr>
    <td align="right">90</td>
    <td>Andreas Berglund</td>
    <td>795-555-0116</td>
    <td>
      <a href="mailto:andreas1@adventure-works.com">andreas1@adventure-works.com</a>
    </td>
  </tr>
  <tr>
    <td align="right">91</td>
    <td>Mr. Robert M. Bernacchi</td>
    <td>449-555-0176</td>
    <td bgcolor="#FFFF88">
      <a href="mailto:robert4@adventure-works.com">robert4@adventure-works.com</a>
    </td>
  </tr>
  <tr>
    <td align="right">92</td>
    <td>Mr. Matthias Berndt</td>
    <td>384-555-0169</td>
    <td bgcolor="#FFFF88">
      <a href="mailto:matthias1@adventure-works.com">matthias1@adventure-works.com</a>
    </td>
  </tr>
  <tr>
    <td align="right">93</td>
    <td>John Berry</td>
    <td>471-555-0181</td>
    <td>
      <a href="mailto:john11@adventure-works.com">john11@adventure-works.com</a>
    </td>
  </tr>
  <tr>
    <td align="right">94</td>
    <td>Mr. Steven B. Brown IV</td>
    <td>280-555-0124</td>
    <td>
      <a href="mailto:steven1@adventure-works.com">steven1@adventure-works.com</a>
    </td>
  </tr>
</table>
*/

Our webpage template looks like this, with a placeholder where we want to insert our table:

/*
<html>
  <head>
    <title>HTML Table constructed via FOR XML PATH</title>
  </head>
  <body style="font-family:Arial; font-size:small">
    <span style="font-size:x-large">
      <b>Selected Contacts:</b>
    </span>
    <!-- Insert Table Here -->
  </body>
</html>
*/

And here is the final result, with the table data inserted in the placeholder position, when we look at the web page in Internet Explorer:

I hope this article gave you a tantalizing look at the possibilities of things you can accomplish with the FOR XML PATH clause. In future blog entries, I’ll explore some other aspects of XML.

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Friday, November 20, 2009

A Loan At Last!

Wednesday, November 11, 2009

Throw Your MONEY Away

Friday, November 6, 2009

XML PATHs Of Glory

Search This Blog

Subscribe

About Me

Keywords

Blog Archive

Followers

# # #

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Friday, November 20, 2009

A Loan At Last!

Wednesday, November 11, 2009

Throw Your MONEY Away

Friday, November 6, 2009

XML PATHs Of Glory

Search This Blog

Subscribe

Subscribe To...

About Me

Keywords

Blog Archive

Followers

# # #