Tuesday, May 10, 2011

CTE: Coolest T-SQL Enhancement

T-SQL TuesdayThis blog entry is participating in T-SQL Tuesday #018, hosted this month by Bob Pusateri.

You are invited to visit his blog and join the party and read more blogs participating in this month’s theme: CTEs.

CTE stands for Common Table Expression, but it should stand for Coolest T-SQL Enhancement. In fact, this T-SQL Tuesday topic is a perfect followup to last month’s topic of APPLY, because APPLY and CTEs (and Window Functions) were all added in SQL2005 as new indispensable enhancements to the language.

The APPLY Operator is an incredibly versatile tool that helps you to create “functions on-the-fly” and do column manipulations. Similarly, a CTE is a tool that helps you create “views on-the-fly” and do row manipulations.

There are many things that you can do with CTEs that you could do with APPLY as well; however, what sets CTEs apart are that they can be recursive and that they can be re-used (in the same query). You can’t re-use an APPLY operation.

I’m sure there will be many posts about recursion this month, but I did a big treatise on that subject last year, so I’m going to focus on a couple of other cool features of CTEs.

Column Aliasing

One aspect of CTEs that are not used that often (and that surprisingly not many people even know about) is the fact that you can provide column aliases in its definition.

For example, let’s say you have a simple CTE query like the following using the NorthWind database (I’m taking a break from AdventureWorks):

with MadridCusts as
(
select ID=CustomerID
,Company=CompanyName
,Contact=ContactName
,Phone
from Customers
where City='Madrid'
)
select ID,Company,Contact,Phone
from MadridCusts
/*
ID Company Contact Phone
----- ------------------------------------ ---------------- --------------
BOLID Bolido Comidas preparadas Martin Sommer (91) 555 22 82
FISSA FISSA Fabrica Inter. Salchichas S.A. Diego Roel (91) 555 94 44
ROMEY Romero y tomillo Alejandra Camino (91) 745 6200
*/
As you can see, I’m defining new column aliases for each of the columns (except Phone) within the CTE. But in the WITH clause, I can override those columns aliases with different aliases if I want to. (In fact, you didn’t hear it here, but this is a hilarious trick you can play on your colleagues when they move away from their desk for a few moments):

with MadridCusts(Phone,ID,Company,Contact) as
(
select ID=CustomerID
,Company=CompanyName
,Contact=ContactName
,Phone
from Customers
where City='Madrid'
)
select ID,Company,Contact,Phone
from MadridCusts
/*
ID Company Contact Phone
------------------------------------ ---------------- --------------- -----
Bolido Comidas preparadas Martin Sommer (91) 555 22 82 BOLID
FISSA Fabrica Inter. Salchichas S.A. Diego Roel (91) 555 94 44 FISSA
Romero y tomillo Alejandra Camino (91) 745 6200 ROMEY
*/
Diabolical, isn’t it?

Multiple CTEs Building Upon Each Other

You can define several CTEs that build upon each other. In fact, last year, I wrote a blog post about a query that plays Poker. Via about a dozen CTEs, it creates the deck of cards, deals them out to 10 people, evaluates the hands, and shows the winner… all in one single query.

The following is a ridiculously simple (and useless) example of CTEs building upon each other:

with USACustomers as
(
select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode
from Customers
where Country='USA'
)
,
USACustomersInOregon as
(
select *
from USACustomers
where Region='OR'
)
,
OregonCustomersInPortland as
(
select *
from USACustomersInOregon
where City='Portland'
)
select * from OregonCustomersInPortland
Each CTE uses the result of the previous to continue to narrow down the result set. But since each CTE is treated as a view, the query optimizer is able to “push” the predicates into a single one. It is exactly the same in every aspect as the following query:

select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode
from Customers
where Country='USA'
and Region='OR'
and City='Portland'

Updating and Deleting

Another aspect of CTEs that many people don’t realize is that they are updatable.

Let’s make a temporary copy of the Customers table and Orders table add a new column to them called UpdateColumn:

if object_id('tempdb..#Custs','U') is not null drop table #Custs
select *, cast(null as char(1)) as UpdateColumn
into #Custs
from Customers
if object_id('tempdb..#Orders','U') is not null drop table #Orders
select *, cast(null as char(1)) as UpdateColumn
into #Orders
from Orders
Now we can do our same OregonCustomersInPortland CTE query, but this time we will update that new UpdateColumn with an ‘X’ for each of those Portland Customers. Note that we have to introduce the UpdateColumn to the SELECT list of the initial query:

with USACustomers as
(
select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode,UpdateColumn
from #Custs
where Country='USA'
)
,
USACustomersInOregon as
(
select *
from USACustomers
where Region='OR'
)
,
OregonCustomersInPortland as
(
select *
from USACustomersInOregon
where City='Portland'
)
update OregonCustomersInPortland
set UpdateColumn='X'
/*
(2 row(s) affected)
*/
As long as the column(s) you want to update are defined in the CTE, you can update it. In fact, you can introduce the UpdateColumn multiple times within the CTEs and you can update any one of them and it will work fine. (Note that you can’t update more than one of them, because SQL will not allow you to update a column more than once:

with USACustomers as
(
select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode,UpdateColumn
from #Custs
where Country='USA'
)
,
USACustomersInOregon as
(
select *,UpdateColumnAgain=UpdateColumn
from USACustomers
where Region='OR'
)
,
OregonCustomersInPortland as
(
select *,UpdateColumnStillAgain=UpdateColumn
from USACustomersInOregon
where City='Portland'
)
update OregonCustomersInPortland
set UpdateColumnStillAgain='X'
/*
(2 row(s) affected)
*/
Using CTEs to UPDATE or DELETE can perhaps make things a little clearer when you’re JOINing in other tables or involving more complicated query methods. For example, let’s say Howard Snyder, a contact for some Customer in Northwind, calls us and wants to change his last open order so that it ships via Speedy Express rather than the usual Federal Shipping.

So we can use a CTE to find open orders belonging to a customer whose contact is named Howard Snyder that are set up for Federal Shipping, get the most recent one (based on Order Date) and update its Shipping Method to Speedy Express instead:

with HowardSnyderOpenOrdersViaFedShipping as
(
select o.ShipVia
,RowNum=row_number() over (order by o.OrderDate desc)
from #Orders o
join Customers c on o.CustomerID=c.CustomerID
join Shippers s on o.ShipVia=s.ShipperID
where o.ShippedDate is null --Open Orders
and c.ContactName='Howard Snyder'
and s.CompanyName='Federal Shipping'
)
update HowardSnyderOpenOrdersViaFedShipping
set ShipVia=(select ShipperID
from Shippers
where CompanyName='Speedy Express')
where RowNum=1 --Only most recent order
/*
(1 row(s) affected)
*/
Coo-ul, huh?

Generating Numbers

This has been seen in hundreds of blogs and books, but I still marvel at the following method of generating a table of numbers (from 1 to 1,000,000). It’s elegant and brief and fast:

with 
L0
(c) as (select 0 from (values (0),(0),(0)) f(c)) --3 Rows
,L1(c) as (select 0 from L0 a,L0 b,L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a,L1 b,L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a,L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
select n into #Nums from NN where n<=1000000
That’s just too cool. Note that because of the VALUES row constructor syntax, the above will only work in SQL2008. To make it work in SQL2005 or earlier, just change the first CTE to use UNION ALLs instead:

with 
L0
(c) as (select 0 union all select 0 union all select 0) --3 Rows

Step-By-Step Clarity

Like the APPLY operator, CTEs are terrific for step-by-step self-documentation. They make your code easier to follow and understand what’s going on.

Here’s an example of seeing whether the current date/time falls within the Daylight Savings Time (as defined by the United States, unless you live in Hawaii or Arizona). Again, this will only work in SQL2008… Change the VALUES row constructor syntax to UNION ALL syntax to make it work in SQL2005:

with 
ZeroThruSix
as
(
select N from (values (0),(1),(2),(3),(4),(5),(6)) F(N)
)
,FirstDayInMarch(FirstDayInMarch) as
(
select convert(datetime,str(year(getdate()),4)+'0301 02:00')
)
,FirstSevenDaysInMarch(MarchDate) as
(
select dateadd(day,N,FirstDayInMarch)
from FirstDayInMarch
cross join ZeroThruSix
)
,SecondSundayInMarch(SecondSundayInMarch) as
(
select dateadd(day,7,MarchDate)
from FirstSevenDaysInMarch
where datename(weekday,MarchDate)='Sunday'
)
,FirstDayInNovember(FirstDayInNovember) as
(
select convert(datetime,str(year(getdate()),4)+'1101 02:00')
)
,FirstSevenDaysInNovember(NovemberDate) as
(
select dateadd(day,N,FirstDayInNovember)
from FirstDayInNovember
cross join ZeroThruSix
)
,FirstSundayInNovember(FirstSundayInNovember) as
(
select NovemberDate
from FirstSevenDaysInNovember
where datename(weekday,NovemberDate)='Sunday'
)
select IsDST=convert(bit,case
when getdate()>=SecondSundayInMarch
and getdate()<FirstSundayInNovember
then 1
else 0
end)
from SecondSundayInMarch
cross join FirstSundayInNovember
Cute, huh? Note how the ZeroThruSix CTE was used more than once? That’s the reusability I was mentioning earlier. You can also see how I incorporated column aliasing in most of the CTEs as well.

You can follow what’s going on step-by-step in the above query. Before SQL2005 and CTEs you would have to write the query like so:

select IsDST=convert(bit,case
when getdate()>=SecondSundayInMarch
and getdate()<FirstSundayInNovember
then 1
else 0
end)
from
(
select dateadd(day,N+7,convert(datetime,str(year(getdate()),4)+'0301 02:00'))
from (select 0 union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6) F(N)
where datename(weekday
,dateadd(day,N,convert(datetime,str(year(getdate()),4)+'0301 02:00')))
='Sunday'
) SecondSundayInMarch(SecondSundayInMarch)
cross
join
(
select dateadd(day,N,convert(datetime,str(year(getdate()),4)+'1101 02:00'))
from (select 0 union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6) F(N)
where datename(weekday
,dateadd(day,N,convert(datetime,str(year(getdate()),4)+'1101 02:00')))
='Sunday'
) FirstSundayInNovember(FirstSundayInNovember)
Not quite as clear, is it? But both queries produce the exact same query plan.

I hope you’ve enjoyed these CTE examples and appreciate their power. I couldn’t live without them.

10 comments:

  1. Very interesting, thanks! Not as funny as usual, though :)

    ReplyDelete
  2. @Naomi:

    Thanks!

    Funny's too hard... I only try to do that if I'm really ambitious or have the time. For this post, I just quickly scraped something together this morning.

    --Brad

    ReplyDelete
  3. Those are some great examples, Brad! Thank you so much for contributing to T-SQL Tuesday!

    ReplyDelete
  4. Way to cover the gambit again. I also like the tie-in to last months TSQL2sday.

    ReplyDelete
  5. Don't reference the same cte multiple times in one query. This kind of query tends to be very handy in some situations, for example in a union select.

    ;with cte_stuff as (select foo from bar)
    select *
    from table
    join cte_stuff as c1 on c1.column1 = table.somecolumn
    union all select *
    from table
    join cte_stuff as c2 on c2.column2 = table.someothercolumn

    Try to avoid this. The cte query gets parsed and executed twice. As mentioned, Mssql currently cannot spool cte results and use them multiple times. If you use big tables, performance may be an issue.

    ReplyDelete
  6. @Sven:

    Thanks for mentioning that... I know it all too well... I talked about that in my recursion post a year ago:

    http://bradsruminations.blogspot.com/2010/03/this-article-on-recurson-is-entitled.html

    --Brad

    ReplyDelete
  7. Your DST example seems rather complex and hard to follow, even the CTE version. As an example of what you can do with CTE's it may be fine but for actually finding the values it seems rather out of the way.

    I wrote my own method using CROSS APPLY that may be useful to some people:

    SELECT
    S.Now,
    DST.*,
    CONVERT(BIT, CASE
    WHEN S.[Now] >= DST.StartDate AND S.[Now] < DST.EndDate THEN 1
    ELSE 0
    END) AS IsDST
    FROM
    (SELECT GETDATE() AS [Now]) S
    CROSS APPLY
    (
    SELECT
    CONVERT(VARCHAR(4), DATEPART(Year, [Now])) AS DSTYear
    ) AS DSTSource
    CROSS APPLY
    (
    SELECT
    CONVERT(DATETIME, DSTYear + '-03-08 02:00:00.000') AS MarchStartDate,
    CONVERT(DATETIME, DSTYear + '-11-01 02:00:00.000') AS NovStartDate
    ) AS DSTPrep
    CROSS APPLY
    (
    SELECT
    MarchStartDate + CASE DATENAME(WEEKDAY, MarchStartDate)
    WHEN 'Monday' THEN 6
    WHEN 'Tuesday' THEN 5
    WHEN 'Wednesday' THEN 4
    WHEN 'Thursday' THEN 3
    WHEN 'Friday' THEN 2
    WHEN 'Saturday' THEN 1
    WHEN 'Sunday' THEN 0
    END AS StartDate,

    NovStartDate + CASE DATENAME(WEEKDAY, NovStartDate)
    WHEN 'Monday' THEN 6
    WHEN 'Tuesday' THEN 5
    WHEN 'Wednesday' THEN 4
    WHEN 'Thursday' THEN 3
    WHEN 'Friday' THEN 2
    WHEN 'Saturday' THEN 1
    WHEN 'Sunday' THEN 0
    END AS EndDate
    ) AS DST

    ReplyDelete
  8. This comment has been removed by a blog administrator.

    ReplyDelete
  9. This comment has been removed by a blog administrator.

    ReplyDelete
  10. This comment has been removed by a blog administrator.

    ReplyDelete