Thursday, July 9, 2009

Cool CROSS APPLY Tricks, Part 1

In most SQL Server books or tutorials, the APPLY operator would get no more than a 1 or 2 page treatment, if at all. (A notable exception is Itzik Ben-Gan’s book Inside Microsoft SQL Server 2005: T-SQL Querying, which goes into some depth). But in my opinion it’s a really cool and extremely useful operator.

The example below is representative of about 97% of the examples you see in books and tutorials regarding APPLY in that it uses a table-valued function. Using the AdventureWorks database, it finds all Vista credit cards that expire in June of 2008, and gives information about the contact associated with the credit card.

select f.FirstName
,f.LastName
,f.JobTitle
,f.ContactType
,cc.CardNumber
from Sales.CreditCard cc
join Sales.ContactCreditCard ccc on cc.CreditCardID=ccc.CreditCardID
cross apply dbo.ufnGetContactInformation(ccc.ContactID) f
where cc.ExpYear=2008
and cc.ExpMonth=6
and cc.CardType='Vista'
/*
FirstName LastName JobTitle ContactType CardNumber
--------- -------- ---------------- ------------- --------------
Peggy Justice Owner Store Contact 11119759315644
John McClane Purchasing Agent Store Contact 11119490672347
Laura Cai NULL Consumer 11112813884091
Natalie Gonzales NULL Consumer 11114369564985
Jarrod Sara NULL Consumer 11116045498593
Katherine Bailey NULL Consumer 11119100149656
Stephanie Gray NULL Consumer 11112324154556
Shawna Sharma NULL Consumer 11116413893110
Mindy Rai NULL Consumer 11115163407997
Jackson Jai NULL Consumer 11112011871602
And so on... (74 rows total)
*/
The SQL Engine first must evaluate everthing on the left-hand side of the APPLY, and then, for each row, it is able to process (or apply) the right-hand side to produce some kind of a (table-valued) result.

So in the example above, it first evaluates the left-hand side (i.e. JOINs the CreditCard and ContactCreditCard tables and filters through only the Vista credit cards that expire in June2008), and then for each row, it will apply the right-hand side (i.e. call the function,passing it the ContactID) to produce a result (the function returns a 1-row table consisting of FirstName, LastName, JobTitle, and ContactType).

That’s kind of cool, and it was something that you were not able to do prior to SQL2005, when the APPLY operator was introduced.

But the right-hand side doesn’t have to be a function.

Let’s take a look at all the Stores in AdventureWorks whose Main Office is in Wisconsin (we have to trudge through a lot of tables to get that information):

select c.CustomerID
,s.Name
from
Sales.Customer c
join Sales.Store s on c.CustomerID=s.CustomerID
join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
join Person.Address a on ca.AddressID=a.AddressID
join Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceID
where ca.AddressTypeID=3 --MainOffice
and sp.StateProvinceCode='WI'
/*
CustomerID Name
----------- ---------------------------
418 Good Bike Shop
453 Unique Bikes
543 Friendly Neighborhood Bikes
606 Little Bicycle Supply Shop
*/
And for each of those customers, let’s find out the top 3 products that they bought in terms of dollars (and while we’re at it, let’s show the dollars they spent on the product):

select c.CustomerID
,s.Name
,f1.ProductID
,f1.PurchaseAmt
from Sales.Customer c
join Sales.Store s on c.CustomerID=s.CustomerID
join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
join Person.Address a on ca.AddressID=a.AddressID
join Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceID
cross apply (select top (3)
ProductID
,PurchaseAmt=sum(LineTotal)
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=c.CustomerID
group by ProductID
order by sum(LineTotal) desc) f1
where ca.AddressTypeID=3 --MainOffice
and sp.StateProvinceCode='WI'
/*
CustomerID Name ProductID PurchaseAmt
----------- --------------------------- --------- -----------
418 Good Bike Shop 795 30367.3500
418 Good Bike Shop 794 24136.8075
418 Good Bike Shop 792 23508.5175
453 Unique Bikes 773 16319.9520
453 Unique Bikes 771 12239.9640
453 Unique Bikes 772 12239.9640
543 Friendly Neighborhood Bikes 782 9638.9580
543 Friendly Neighborhood Bikes 868 671.9040
543 Friendly Neighborhood Bikes 869 335.9520
606 Little Bicycle Supply Shop 717 1717.8000
606 Little Bicycle Supply Shop 838 858.9000
606 Little Bicycle Supply Shop 738 809.3280
*/
Notice that the right-hand side of the CROSS APPLY is a correlated query where we refer to c.CustomerID, a column from the left-hand side. So each of the 4 customers from the left-hand side were applied to our correlated query on the right-hand side to produce 3 rows each.

But let’s not stop there… We can keep chaining additional information. For each of those products we found, let’s find out the top customer for that product (i.e. who placed the most orders on the product?):

select c.CustomerID
,s.Name
,f1.ProductID
,f1.PurchaseAmt
,f2.TopCustomerID
,f2.NumOrds
from Sales.Customer c
join Sales.Store s on c.CustomerID=s.CustomerID
join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
join Person.Address a on ca.AddressID=a.AddressID
join Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceID
cross apply (select top (3)
ProductID
,PurchaseAmt=sum(LineTotal)
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=c.CustomerID
group by ProductID
order by sum(LineTotal) desc) f1
cross apply (select top (1)
TopCustomerID=CustomerID
,NumOrds=count(*)
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where ProductID=f1.ProductID
group by CustomerID
order by count(*) desc) f2
where ca.AddressTypeID=3 --MainOffice
and sp.StateProvinceCode='WI'
/*
CustomerID Name ProductID PurchaseAmt TopCustomerID NumOrds
----------- --------------------------- --------- ----------- ------------- -------
418 Good Bike Shop 795 30367.3500 4 8
418 Good Bike Shop 794 24136.8075 4 8
418 Good Bike Shop 792 23508.5175 16 8
453 Unique Bikes 773 16319.9520 75 4
453 Unique Bikes 771 12239.9640 20 4
453 Unique Bikes 772 12239.9640 118 4
543 Friendly Neighborhood Bikes 782 9638.9580 10 8
543 Friendly Neighborhood Bikes 868 671.9040 23 4
543 Friendly Neighborhood Bikes 869 335.9520 10 4
606 Little Bicycle Supply Shop 717 1717.8000 166 8
606 Little Bicycle Supply Shop 838 858.9000 24 6
606 Little Bicycle Supply Shop 738 809.3280 166 12
*/
This second CROSS APPLY (f2) took the f1.ProductID that resulted from the first CROSS APPLY (f1), and it applied it to the correlated query on its right-hand side to produce a single row of information, namely the TopCustomerID and NumOrds columns.

Note that if we wanted to add product descriptions to this query, which we would get from the Production.Product table, we must perform the JOIN to that table AFTER the CROSS APPLY, because everything with APPLY is processed left-to-right.

select c.CustomerID
,s.Name
,f1.ProductID
,ProductName=p.Name
,f1.PurchaseAmt
,f2.TopCustomerID
,f2.NumOrds
from Sales.Customer c
join Sales.Store s on c.CustomerID=s.CustomerID
join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
join Person.Address a on ca.AddressID=a.AddressID
join Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceID
cross apply (select top (3)
ProductID
,PurchaseAmt=sum(LineTotal)
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=c.CustomerID
group by ProductID
order by sum(LineTotal) desc) f1
cross apply (select top (1)
TopCustomerID=CustomerID
,NumOrds=count(*)
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where ProductID=f1.ProductID
group by CustomerID
order by count(*) desc) f2
join Production.Product p on f1.ProductID=p.ProductID
where ca.AddressTypeID=3 --MainOffice
and sp.StateProvinceCode='WI'
/*
CustomerID Name ProductID ProductName PurchaseAmt TopCustomerID NumOrds
----------- --------------------------- --------- -------------------------- ----------- ------------- -------
606 Little Bicycle Supply Shop 717 HL Road Frame - Red, 62 1717.8000 166 8
606 Little Bicycle Supply Shop 738 LL Road Frame - Black, 52 809.3280 166 12
453 Unique Bikes 771 Mountain-100 Silver, 38 12239.9640 20 4
453 Unique Bikes 772 Mountain-100 Silver, 42 12239.9640 118 4
453 Unique Bikes 773 Mountain-100 Silver, 44 16319.9520 75 4
543 Friendly Neighborhood Bikes 782 Mountain-200 Black, 38 9638.9580 10 8
418 Good Bike Shop 792 Road-250 Red, 58 23508.5175 16 8
418 Good Bike Shop 794 Road-250 Black, 48 24136.8075 4 8
418 Good Bike Shop 795 Road-250 Black, 52 30367.3500 4 8
606 Little Bicycle Supply Shop 838 HL Road Frame - Black, 44 858.9000 24 6
543 Friendly Neighborhood Bikes 868 Women's Mountain Shorts, M 671.9040 23 4
543 Friendly Neighborhood Bikes 869 Women's Mountain Shorts, L 335.9520 10 4
*/
If we had tried to place that JOIN with all the other JOINs (i.e. BEFORE the first CROSS APPLY), we would get an error: The multi-part identifier “f1.ProductID” could not be bound.

This is all interesting, but the real magic of CROSS APPLY is yet to come in Part 2.

No comments:

Post a Comment