SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server(): Cool CROSS APPLY Tricks, Part 2

In Part 1, we looked at some fairly typical examples of using CROSS APPLY.

But the best is yet to come.

Consider this problem that Adam Haines brought up in his blog entry entitled Converting a Delimited String of Values into Columns. He presents the following table.

declare @t table
(
  ProductId int
 ,ProductName varchar(25)
 ,SupplierId int
 ,Descr varchar(50)
) 
insert @t select 1,'Product1',1,'A1,10in,30in,2lbs'
union all select 2,'Product2',2,'T6,15in,30in'
union all select 3,'Product3',1,'A2,1in,,0.5lbs'
union all select 4,'Product4',1,'X5,3in'

Note the column called Descr. That column contains 4 attributes of a product (Type, Length, Height, and Width) in a comma-delimited format. Your job: extract those values. Adam uses a very nice XML method to extract the values from the Descr field. Unfortunately the problem with using XML is that one of the attributes might contain a character that would break it, like an ampersand (&) or a less-than sign (<) for example.

So CROSS APPLY to the rescue!

The Descr field may not have all the commas in place (see row 4 in the table), so we will artifically add 4 of them to the end, creating a string to work with:

select ProductID
      ,SupplierID
      ,string
from @t
cross apply (select string=Descr+',,,,') f1
/*
  ProductID  SupplierID string
----------- ----------- ---------------------
          1           1 A1,10in,30in,2lbs,,,,
          2           2 T6,15in,30in,,,,
          3           1 A2,1in,,0.5lbs,,,,
          4           1 X5,3in,,,,
*/

You see how I used CROSS APPLY to introduce a new column called string? Now that we have that string to work with, we need to find where the commas are. The first comma is easy. We just use CHARINDEX. But the second and third and fourth commas all depend on the location of the previous comma. So we must use CROSS APPLY 4 times:

select ProductID
      ,SupplierID
      ,string
      ,p1,p2,p3,p4
from @t
cross apply (select string=Descr+',,,,') f1
cross apply (select p1=charindex(',',string)) f2
cross apply (select p2=charindex(',',string,p1+1)) f3
cross apply (select p3=charindex(',',string,p2+1)) f4
cross apply (select p4=charindex(',',string,p3+1)) f5
/*
  ProductID  SupplierID string                p1 p2 p3 p4
----------- ----------- --------------------- -- -- -- --
          1           1 A1,10in,30in,2lbs,,,,  3  8 13 18
          2           2 T6,15in,30in,,,,       3  8 13 14
          3           1 A2,1in,,0.5lbs,,,,     3  7  8 15
          4           1 X5,3in,,,,             3  7  8  9
*/

Now that we have all the locations of the commas, it’s easy to pull out the attributes that we want:

select ProductID
      ,SupplierID
      ,[Type]
      ,Length
      ,Height
      ,Weight
from @t
cross apply (select string=Descr+',,,,') f1
cross apply (select p1=charindex(',',string)) f2
cross apply (select p2=charindex(',',string,p1+1)) f3
cross apply (select p3=charindex(',',string,p2+1)) f4
cross apply (select p4=charindex(',',string,p3+1)) f5
cross apply (select [Type]=substring(string,1,p1-1)
                   ,Length=substring(string,p1+1,p2-p1-1)
                   ,Height=substring(string,p2+1,p3-p2-1)
                   ,Weight=substring(string,p3+1,p4-p3-1)) f6
/*
  ProductID  SupplierID Type Length Height Weight
----------- ----------- ---- ------ ------ ------
          1           1 A1   10in   30in   2lbs
          2           2 T6   15in   30in   
          3           1 A2   1in           0.5lbs
          4           1 X5   3in     
*/

The terrific thing about this is that it doesn’t involve anything in terms of cost. If you look at a query plan, all it consists of is a Table Scan followed by a Compute Scalar, the latter of which has an estimated operator cost of 0.0000001… in other words, pretty much no cost at all! The Query Optimizer combined all the CROSS APPLY stuff I introduced and essentially does the following query, (which you can see if you look at the properties of the Compute Scalar icon in the plan):

select ProductID
      ,SupplierID
      ,[Type]=substring([Descr]+',,,,',1,charindex(',',[Descr]+',,,,')-1)
      ,Length=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,')+1,
               (charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)-
               charindex(',',[Descr]+',,,,'))-1)
      ,Height=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,',
               charindex(',',[Descr]+',,,,')+1)+1,(charindex(',',[Descr]+',,,,',
               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)-
               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1))-1)
      ,Weight=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,',
               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)+1,
               (charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,',
               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)+1)-
               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,',
               charindex(',',[Descr]+',,,,')+1)+1))-1)
from @t

How would you like to parse through and troubleshoot that? What a nightmare! But you don’t have to… our CROSS APPLY query is very readable and easy to understand, and it's fast.

Here’s one more example…

Here’s a table of messages:

declare @t table (Message varchar(100))
insert @t select 'cross apply is really cool'
union all select 'and ntile is really cool too'
union all select 'and iced tea is really cool too'

Your job is to pull out all the word pairs and report on how often they occur throughout the table. Here’s how to do it with our new pal CROSS APPLY:

;with Numbers as
(
  select N=Number 
  from master..spt_values
  where Type='P' and Number>0
)
select WordPair,Occurrences=count(*)
from @t
join Numbers on substring(' '+Message,N,1)=' ' and N<len(Message)+1
cross apply (select string=substring(' '+Message+' ',N+1,len(Message)+1)) f1
cross apply (select p1=charindex(' ',string)) f2
cross apply (select p2=charindex(' ',string,p1+1)) f3
cross apply (select WordPair=case when p1<p2 then left(string,p2-1) end) f4
where WordPair is not null
group by WordPair
order by count(*) desc
/*
WordPair    Occurrences
----------- -----------
is really             3
really cool           3
cool too              2
cross apply           1
iced tea              1
and iced              1
and ntile             1
apply is              1
tea is                1
ntile is              1
*/

I use a table of numbers (I “cheated” and just used the numbers in master..spt_values) and JOIN that to the Message column based on the location of its space characters. The first CROSS APPLY (f1) creates a string to work with, which is a SUBSTRING of the Message, starting at a word boundary (and ending with a space I artificially put at the end). The second and third CROSS APPLYs (f2 and f3) find the location of the first and second space characters in that string. The final CROSS APPLY creates the word pair, if one was found. If no word pair is found, then WordPair is set to NULL (because there is no ELSE in the CASE). That is filtered out by the WHERE clause.

(Edit on July11,2009: For a more in-depth discussion of processing the above word-pair query, see my blog post entitled Word Pairs Revisited (Cool CROSS APPLY Tricks, Part 2 ½)).

Note that all these new columns that we introduced (string, p1, p2, WordPair) can be used in a WHERE clause and a GROUP BY clause or pretty much anywhere within the query. That’s the best part.

The query below is what is really calculated behind the scenes:

;with Numbers as
(
  select N=Number 
  from master..spt_values
  where Type='P' and Number>0
)
select WordPair=case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<
                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
                then left(substring(' '+Message+' ',N+1,len(Message)+1),
                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
                end
      ,count(*)
from @t
join Numbers on substring(' '+Message,N,1)=' ' and N<len(Message)+1
where case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<
        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
      then left(substring(' '+Message+' ',N+1,len(Message)+1),
        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
      end is not null
group by case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<
           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
         then left(substring(' '+Message+' ',N+1,len(Message)+1),
           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),
           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
         end
order by count(*) desc

Yeesh!

So after all this, I hope I have converted you to the beauty of the APPLY operator. Go spread the word!

11 comments:

Brian TkatchJuly 10, 2009 at 4:35 AM
Great post Brad.

Brad, perhaps you can slow it down and show each step individually. It's a lot to absorb all at once!

First explain the issue. Then show the "traditional" way to do it. Then show the creation of the solution step by step. The last query could have been part 3 all by itself.
Adam HainesJuly 10, 2009 at 6:11 AM
Good post Brad :). You are correct that there are certain characters that would cause the XML method to break. In this scenario, one would have to replace the invalid characters, with temporary holders. You can then replace the temporary holders back to their original form before returning results; however, This is not the poster child of simple and efficient coding. Before using the XML method one should really know and understand their data. The cross apply is a great alternative! Thanks for the blog post, it is defintely a good read.
Brad SchulzJuly 10, 2009 at 11:16 AM
Thanks Brian...

I debated whether I should go through that last example step by step... I was afraid of it getting too long. But I think you're right, I should have made it a Part 3... and I think I'll just go ahead and do that.

I appreciate the feedback.
Brad SchulzJuly 10, 2009 at 11:17 AM
Hi Adam...

I love all the XML methods that one can take advantage of. It was Plamen Ratchev who brought up the "invalid character" problem, and that kind of burst my bubble about using XML too much... or at least it now makes me think it through a little more carefully first.
Brian TkatchJuly 13, 2009 at 4:58 AM
@Brad

Don't worry about a post getting to long. It's better too long than too short!

After writing an entry, read it again (waiting a day or two helps a lot). Making spelling correction, punctuation, enhance the flavor, and, decide if it needs to be split in two. Most likely you'll get a rule of how many (non-example) words force a split, but don't ever short the info so to keep it as one.
NaomiMarch 18, 2010 at 9:48 PM
Thanks, Brad. Read this blog again and I think I'm convinced.
Mohammad SalimabadiAugust 8, 2010 at 2:01 PM
Hi Brad,
Really very interesting use of APPLY! Thanks for this.
But by this technique we can split just a limited item. So my recommend in this case is using of PARSNAME for splitting the string.

;with c as
(select productID, supplierID,
replace(
replace(replace(descr,'.','~!@|'),',','.')+
replicate('.', 3-(len(descr)-len(replace(descr,',',''))))+' '
,'..', '. .') as descr
from @t)
select productID, supplierID,
replace(parsename(descr,4),'~!@|','.') as Type,
replace(parsename(descr,3),'~!@|','.') as Length,
replace(parsename(descr,2),'~!@|','.') as Height,
replace(parsename(descr,1),'~!@|','.') as weight
from c
Mohammad SalimabadiAugust 10, 2010 at 4:35 AM
Hi Brad,
No need APPLY instead of that we can use multiple CTE like this:

;with f1 as
(select string=Descr+',,,,',productid, supplierid
from @t),
f2 as (select p1=charindex(',',string),* from f1),
f3 as (select p2=charindex(',',string,p1+1),* from f2),
f4 as (select p3=charindex(',',string,p2+1),* from f3),
f5 as (select p4=charindex(',',string,p3+1),* from f4)

select productid,
supplierid,
[Type]=substring(string,1,p1-1)
,Length=substring(string,p1+1,p2-p1-1)
,Height=substring(string,p2+1,p3-p2-1)
,Weight=substring(string,p3+1,p4-p3-1)
from f5
Ajmeri SeoMarch 27, 2019 at 6:58 PM
I don’t normally comment but I gotta tell regards for the post on this one : D. http://goodfridayquotess.net/good-friday-quotes/
Ajmeri SeoMarch 27, 2019 at 6:58 PM
baby strollers with high traction rollers should be much safer to use compared to those with plastic wheels,. https://theeastergoodfriday.com/easter-egg-hunt/
Ajmeri SeoMarch 27, 2019 at 6:58 PM
I am usually to blogging and that i really appreciate your articles. This great article has truly peaks my interest. I am about to bookmark your web blog and keep checking choosing details. http://mothersdaywishesi.com/happy-mothers-day-quotes-for-cards/

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Thursday, July 9, 2009

Cool CROSS APPLY Tricks, Part 2

11 comments:

Search This Blog

Subscribe

About Me

Keywords

Blog Archive

Followers

# # #

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Thursday, July 9, 2009

Cool CROSS APPLY Tricks, Part 2

11 comments:

Search This Blog

Subscribe

Subscribe To...

About Me

Keywords

Blog Archive

Followers

# # #