SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server(): Integer List Splitting: A SQL Fable

nce upon a time, not too long ago, there was a company called AdventureWorks. And working in this company was a diligent database developer named Dan Druff. Dan was responsible for developing and maintaining the AdventureWorks SQL Server database.

One day, Dan got a request to put together a query. The specs called for accepting a string of comma-delimited ProductID’s and to produce a result set representing orders that used those ProductID’s. The result set was to consist of the ProductID, the Product Name, the SalesOrderID, the Order Date, and the name of the Territory associated with the order. The final output was to be ordered by ProductID and SalesOrderID.

This sounded easy enough. Dan already had a user-defined function to split a comma-delimited list of integers. He had written it himself:

create function dbo.ufn_SplitIntArray
(
   @List       varchar(max)
  ,@Delimiter  char(1)
)
returns @Items table (Item int)
as
begin
  declare @Item varchar(12)
         ,@Pos int
  while len(@List)>0
  begin
    set @Pos=charindex(@Delimiter,@List)
    if @Pos=0 set @Pos=len(@List)+1
    set @Item=left(@List,@Pos-1)
    insert @Items select convert(int,ltrim(rtrim(@Item)))
    set @List=substring(@List,@Pos+len(@Delimiter),len(@List))
  end
  return
end

He put together the requested query, incorporating his function, and he tested it out:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID 
      ,ProductName=p.Name
      ,h.SalesOrderID 
      ,h.OrderDate
      ,TerritoryName=t.Name 
from dbo.ufn_SplitIntArray(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID 
join Production.Product p on d.ProductID=p.ProductID 
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID 
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID 
order by d.ProductID,h.SalesOrderID 
/*
ProductID ProductName                      SalesOrderID OrderDate  TerritoryName
--------- -------------------------------- ------------ ---------- --------------
      897 LL Touring Frame - Blue, 58             51823 2003-08-01 United Kingdom
      897 LL Touring Frame - Blue, 58             51875 2003-08-01 Australia
      911 LL Road Seat/Saddle                     51140 2003-07-01 Central
      911 LL Road Seat/Saddle                     53472 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53488 2003-09-01 Canada
      911 LL Road Seat/Saddle                     53495 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53501 2003-09-01 Southeast
      911 LL Road Seat/Saddle                     53529 2003-09-01 Canada
      942 ML Mountain Frame-W - Silver, 38        51120 2003-07-01 United Kingdom
      942 ML Mountain Frame-W - Silver, 38        51711 2003-08-01 Northwest
      942 ML Mountain Frame-W - Silver, 38        51758 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51799 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51856 2003-08-01 Southwest
*/

It seemed to work just fine… it spit out the result quickly and efficiently.

But something nagged at Dan. His function had serviced him well in the past, but the function was so “yesterday”… so “1970’s”… it just felt kind of old-fashioned and tired. Dan wanted a function that was more “hip” and more “with it”.

And then Dan had an idea.

He recalled reading an excellent blog post by MVP Aaron Bertrand from last July that did a roundup of various string-splitting techniques. One of the techniques was just like Dan’s, and Aaron had given the approach a nickname of RBAR, meaning “Row By Agonizing Row”. The roundup showed that this technique was one of the poorest performers.

So Dan made a decision right then and there that he was going to move into the 21st century… he was going to upgrade his function to one of the better performers that Aaron had outlined in his blog.

The fastest performer in Aaron’s roundup by far was the CLR approach. But that kind of scared Dan… he didn’t want to get into something that he didn’t fully understand.

So he gave one of the other techniques a try.

In general, the Numbers Table technique seemed to be the next-best performer after CLR in Aaron’s roundup, so Dan went about creating a table of numbers from 1 to 1,000,000 in his AdventureWorks database, and created a primary key for it:

;with 
  L0(c) as (select 0 from (values (0),(0),(0)) x(c)) --3 Rows
 ,L1(c) as (select 0 from L0 a, L0 b, L0 c)          --27 Rows (3x3x3)
 ,L2(c) as (select 0 from L1 a, L1 b, L1 c)          --19683 Rows (27x27x27)
 ,L3(c) as (select 0 from L2 a, L2 b)                --387,420,489 Rows (19683x19683)
 ,NN(n) as (select row_number() over (order by (select 0)) from L3)
select Number=isnull(convert(int,n),0)  --Force it to be a "not null" column
into dbo.Numbers 
from NN 
where n<=1000000
go 
alter table dbo.Numbers 
add constraint PK_Numbers 
primary key clustered (Number)
with (fillfactor=100)

And then he put together a function to split a string of integers using this auxiliary table of numbers:

create function dbo.ufn_SplitIntArrayNum
(
   @List       varchar(max)
  ,@Delimiter  char(1)
)
returns table
as
return
select Item=convert(int,String)
from dbo.Numbers with (nolock)
cross apply (select ItemPos=Number) F1
cross apply (select DelimPos=charindex(@Delimiter,@List+@Delimiter,ItemPos)) F2
cross apply (select String=rtrim(ltrim(substring(@List,ItemPos,DelimPos-ItemPos)))) F3
where ItemPos<=convert(int,len(@List))
  and substring(@Delimiter+@List,ItemPos,1)=@Delimiter

He tested it out…

select * 
from dbo.ufn_SplitIntArrayNum('123,456,789',',')
/*
Item
----
 123
 456
 789
*/

…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID 
      ,ProductName=p.Name
      ,h.SalesOrderID 
      ,h.OrderDate
      ,TerritoryName=t.Name 
from dbo.ufn_SplitIntArrayNum(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID 
join Production.Product p on d.ProductID=p.ProductID 
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID 
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID 
order by d.ProductID,h.SalesOrderID 
/*
ProductID ProductName                      SalesOrderID OrderDate  TerritoryName
--------- -------------------------------- ------------ ---------- --------------
      897 LL Touring Frame - Blue, 58             51823 2003-08-01 United Kingdom
      897 LL Touring Frame - Blue, 58             51875 2003-08-01 Australia
      911 LL Road Seat/Saddle                     51140 2003-07-01 Central
      911 LL Road Seat/Saddle                     53472 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53488 2003-09-01 Canada
      911 LL Road Seat/Saddle                     53495 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53501 2003-09-01 Southeast
      911 LL Road Seat/Saddle                     53529 2003-09-01 Canada
      942 ML Mountain Frame-W - Silver, 38        51120 2003-07-01 United Kingdom
      942 ML Mountain Frame-W - Silver, 38        51711 2003-08-01 Northwest
      942 ML Mountain Frame-W - Silver, 38        51758 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51799 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51856 2003-08-01 Southwest
*/

Well, it gave the correct result, but something seemed strange. The query seemed sluggish. It didn’t seem as quick as his query that used his tired old “RBAR” function. But that couldn’t be possible… after all, Aaron’s blog had clearly demonstrated the superiority of the Numbers approach.

So Dan fired up Profiler and did a comparison between the two queries. And he looked at the Actual Execution Plans for the two queries and he was shocked at what he discovered:

/*
                                                  Estimated     Estimated
Description           CPU    Reads  Duration      QueryCost  Number of Rows  MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR)      62ms      832     304ms       0.790177         359.935      2,192KB
Numbers Technique  5961ms  244,199    2229ms    3556.620000  56,629,000.000     37,984KB
*/

What!!?? This couldn’t be! Dan’s RBAR approach completely annihilated the Numbers approach by a mile! Heck, it beat it by more than a mile… it beat it by a light-year!

The CPU and Reads of the Numbers Technique was huge! The memory grant was almost 38MB! And the query took over 2 full seconds to run! This was ridiculous!

Dan was very puzzled and a little scared. Maybe he was doing something wrong. Maybe something was going on that he didn’t understand.

He decided to try another technique.

The approach that Aaron’s blog named “Inline 2” looked promising. So Dan created that function in his database:

create function dbo.ufn_SplitIntArrayInline
(
   @List       varchar(max)
  ,@Delimiter  char(1)
)
returns table
as
return
select Item=convert(int,(substring(@Delimiter+@List+@Delimiter
                                  ,w.n+1
                                  ,charindex(@Delimiter
                                            ,@Delimiter+@List+@Delimiter
                                            ,w.n+1) - w.n - 1
                                  )))
from (select n=v0.n+v1.n+v2.n+v3.n
      from (select n = 0 
            union all select 1  union all select 2  union all select 3
            union all select 4  union all select 5  union all select 6
            union all select 7  union all select 8  union all select 9
            union all select 10 union all select 11 union all select 12
            union all select 13 union all select 14 union all select 15) v0
          ,(select n = 0 
            union all select 16  union all select 32  union all select 48
            union all select 64  union all select 80  union all select 96
            union all select 112 union all select 128 union all select 144
            union all select 160 union all select 176 union all select 192
            union all select 208 union all select 224 union all select 240) v1
          ,(select n = 0
            union all select 256  union all select 512  union all select 768 
            union all select 1024 union all select 1280 union all select 1536 
            union all select 1792 union all select 2048 union all select 2304
            union all select 2560 union all select 2816 union all select 3072
            union all select 3328 union all select 3584 union all select 3840) v2
          ,(select n = 0
            union all select 4096   union all select 8192   union all select 12288
            union all select 16384  union all select 20480  union all select 24576
            union all select 28672  union all select 32768  union all select 36864
            union all select 40960  union all select 45056  union all select 49152
            union all select 53248  union all select 57344  union all select 61440
            union all select 65536  union all select 69632  union all select 73728
            union all select 77824  union all select 81920  union all select 86016
            union all select 90112  union all select 94208  union all select 98304
            union all select 102400 union all select 106496 union all select 110592
            union all select 114688 union all select 118784 union all select 122880 
            union all select 126976 union all select 131072 union all select 135168
            union all select 139264 union all select 143360 union all select 147456) v3
     ) w
where w.n=charindex(@Delimiter,@Delimiter+@List+@Delimiter,w.n)
  and w.n<len(@Delimiter+@List)

He tested it out…

select * 
from dbo.ufn_SplitIntArrayInline('123,456,789',',')
/*
Item
----
 123
 456
 789
*/

…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID 
      ,ProductName=p.Name
      ,h.SalesOrderID 
      ,h.OrderDate
      ,TerritoryName=t.Name 
from dbo.ufn_SplitIntArrayInline(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID 
join Production.Product p on d.ProductID=p.ProductID 
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID 
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID 
order by d.ProductID,h.SalesOrderID 
/*
Msg 537, Level 16, State 5, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function.
*/

What!!?? What was going on? The query completely bombed! This didn’t make sense. The function worked when run by itself, but once it was JOINed with other tables, it turned over and died.

Dan looked at the estimated execution plan and recorded some information to compare it with his previous two approaches:

/*
                                                  Estimated     Estimated
Description           CPU    Reads  Duration      QueryCost  Number of Rows  MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR)      62ms      832     304ms       0.790177         359.935      2,192KB
Numbers Technique  5961ms  244,199    2229ms    3556.620000  56,629,000.000     37,984KB
Inline Technique      N/A      N/A       N/A  116532.000000  34,912,200.000        N/A
*/

Omigosh! Look at the Estimated Query Cost! The optimizer estimated this query was going to take 116532/3600 = 32 Hours to run! No wonder it choked!

Dan was really confused at this point. But he decided to bravely trudge onward.

He figured he would give the XML approach a try… it certainly seemed to hold its own in Aaron’s tests. This XML technique was an approach that some joker named Brad Schulz had talked about several times in his blog. Dan didn’t always read Brad’s blog posts… Sometimes Brad had a weird sense of humor and that didn’t always sit well with Dan… Dan was kind of a serious fellow.

But, despite his feelings about Brad, Dan went ahead and created a function using the XML string-splitting function in his database:

create function dbo.ufn_SplitIntArrayXML
(
   @List       varchar(max)
  ,@Delimiter  char(1)
)
returns table
as
return
select Item
from (select XMLString='<x>'+replace(@List,@Delimiter,'</x><x>')+'</x>') F1
cross apply (select XMLList=cast(XMLString as xml).query('.')) F2
cross apply XMLList.nodes('/x') F3(XMLNode)
cross apply (select Item=XMLNode.value('(./text())[1]','int')) F4

He tested it out…

select * 
from dbo.ufn_SplitIntArrayXML('123,456,789',',')
/*
Item
----
 123
 456
 789
*/

…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID 
      ,ProductName=p.Name
      ,h.SalesOrderID 
      ,h.OrderDate
      ,TerritoryName=t.Name 
from dbo.ufn_SplitIntArrayXML(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID 
join Production.Product p on d.ProductID=p.ProductID 
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID 
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID 
order by d.ProductID,h.SalesOrderID 
/*
ProductID ProductName                      SalesOrderID OrderDate  TerritoryName
--------- -------------------------------- ------------ ---------- --------------
      897 LL Touring Frame - Blue, 58             51823 2003-08-01 United Kingdom
      897 LL Touring Frame - Blue, 58             51875 2003-08-01 Australia
      911 LL Road Seat/Saddle                     51140 2003-07-01 Central
      911 LL Road Seat/Saddle                     53472 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53488 2003-09-01 Canada
      911 LL Road Seat/Saddle                     53495 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53501 2003-09-01 Southeast
      911 LL Road Seat/Saddle                     53529 2003-09-01 Canada
      942 ML Mountain Frame-W - Silver, 38        51120 2003-07-01 United Kingdom
      942 ML Mountain Frame-W - Silver, 38        51711 2003-08-01 Northwest
      942 ML Mountain Frame-W - Silver, 38        51758 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51799 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51856 2003-08-01 Southwest
*/

Hmmm… well at least this approach didn’t croak like the Inline technique did. The result was correct. And it seemed pretty snappy in execution. Dan investigated the Profiler and Actual Execution Plan data and added data to his table of figures:

/*
                                                  Estimated     Estimated
Description           CPU    Reads  Duration      QueryCost  Number of Rows  MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR)      62ms      832     304ms       0.790177         359.935      2,192KB
Numbers Technique  5961ms  244,199    2229ms    3556.620000  56,629,000.000     37,984KB
Inline Technique      N/A      N/A       N/A  116532.000000  34,912,200.000        N/A
XML Technique       237ms      843     449ms      26.408000      35,993.500     25,856KB
*/

This was really weird! Dan’s original RBAR approach was still the clear winner in all these tests. The XML approach came pretty darn close… MUCH closer than the other methods… but it wanted a memory grant of 26MB!

Dan’s world had turned upside-down. He was expecting to hear news bulletins any minute talking about pigs flying and snowballs discovered in Hell. This just wasn’t right!

Dan figured he’d have to go the final step… There was only one clear path. He took a deep breath, let it out slowly, and decided he would have to brave the CLR technique.

Aaron Bertrand’s roundup had given a link to Adam Machanic’s blog that had the C# source code for doing the string splitting. But, as little as Dan knew about C#, he could tell that Adam’s code returned a table of NVARCHAR vales, not INT values.

So Dan gave a call to an old friend from college named Biff Wellington who was a C# consultant and asked if he could help. Biff was thrilled to hear from Dan and said he’d be happy to assist… anything for an old pal. Dan referred him to the source code at Adam Machanic’s blog and, an hour later, Biff emailed Dan the revised code… and a bill for $150. Dan swore under his breath.

Here was the code that Biff had sent, which amended Adam Machanic’s code to include a FillRowInts function and SplitStringOfInts function to return a table of integers:

Click here to download the C# Code

Now Dan had to jump through all the hoops to get this C# code to work. He downloaded a copy of C# 2008 Express Edition from Microsoft (at least that was free, unlike Biff’s services… the jerk). He created a project using the Class Library template, calling the project BiffSucks. He pasted in Biff’s code and hit the F6 key to Build the Solution and it built successfully! Now he was ready to get into SSMS to create his function.

First he made sure that the server was enabled for CLR:

sp_configure 'CLR Enabled',1
go
reconfigure
go

Then he created an assembly, which referenced the DLL he had built:

create assembly StringHelper
from 'c:\Users\DanDruff\Documents\Visual Studio 2008\Projects\'
    +'BiffSucks\BiffSucks\obj\Release\BiffSucks.dll'

And then he created the function to use that assembly and its SplitStringOfInts function:

create function dbo.ufn_SplitIntArrayCLR
(
   @List       nvarchar(max)
  ,@Delimiter  nchar(1)
)
returns table (Item int)
external name StringHelper.UserDefinedFunctions.SplitStringOfInts

Gee, this wasn’t so hard after all!

He tested it out…

select * 
from dbo.ufn_SplitIntArrayCLR(N'123,456,789',N',')
/*
Item
----
 123
 456
 789
*/

…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList nvarchar(max) = N'897,911,942'
select d.ProductID 
      ,ProductName=p.Name
      ,h.SalesOrderID 
      ,h.OrderDate
      ,TerritoryName=t.Name 
from dbo.ufn_SplitIntArrayCLR(@ProductList,N',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID 
join Production.Product p on d.ProductID=p.ProductID 
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID 
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID 
order by d.ProductID,h.SalesOrderID 
/*
ProductID ProductName                      SalesOrderID OrderDate  TerritoryName
--------- -------------------------------- ------------ ---------- --------------
      897 LL Touring Frame - Blue, 58             51823 2003-08-01 United Kingdom
      897 LL Touring Frame - Blue, 58             51875 2003-08-01 Australia
      911 LL Road Seat/Saddle                     51140 2003-07-01 Central
      911 LL Road Seat/Saddle                     53472 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53488 2003-09-01 Canada
      911 LL Road Seat/Saddle                     53495 2003-09-01 United Kingdom
      911 LL Road Seat/Saddle                     53501 2003-09-01 Southeast
      911 LL Road Seat/Saddle                     53529 2003-09-01 Canada
      942 ML Mountain Frame-W - Silver, 38        51120 2003-07-01 United Kingdom
      942 ML Mountain Frame-W - Silver, 38        51711 2003-08-01 Northwest
      942 ML Mountain Frame-W - Silver, 38        51758 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51799 2003-08-01 Southeast
      942 ML Mountain Frame-W - Silver, 38        51856 2003-08-01 Southwest
*/

It worked! And it seemed pretty quick! Dan was hopeful as he started to record the Profiler and Actual Execution Plan figures into his table:

/*
                                                  Estimated     Estimated
Description           CPU    Reads  Duration      QueryCost  Number of Rows  MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR)      62ms      832     304ms       0.790177         359.935      2,192KB
Numbers Technique  5961ms  244,199    2229ms    3556.620000  56,629,000.000     37,984KB
Inline Technique      N/A      N/A       N/A  116532.000000  34,912,200.000        N/A
XML Technique       237ms      843     449ms      26.408000      35,993.500     25,856KB
CLR Technique       443ms     1541     526ms      22.483400     359,935.000     99,744KB
*/

Huh? The CLR approach performed in under a second, but it was still slower than the RBAR and XML approach! And it had almost twice the reads! And it wanted almost 100MB of memory to do its job!!

Dan was feeling faint. This must be a dream, he thought. He pinched himself. But he didn’t wake up… it was not a dream.

Dan was at wit’s end… The world no longer made any sense.

Not knowing what else to do, he decided to go to Brad Schulz’s blog to read more about the XML technique. Even though it underperformed compared to Dan’s boring old RBAR approach, at least the whole idea of XML was kind of cool and modern.

And when he navigated to the blog, he came upon a post that Brad had written on Aug13,2010 about Query Hints. Brad was describing a situation frighteningly similar to what Dan was experiencing. By adding query hints of FORCE ORDER, MAXDOP 1, and LOOP JOIN to the query, Brad demonstrated how the execution plan could be changed to give better performance.

So Dan decided to add these query hints to each of the 5 techniques and see what would happen.

The new query plans created by the hints certainly made a lot more sense… no more parallelism and weird JOIN configurations and such… and the Inline technique actually worked this time!

Here is what he recorded:

/*
                                                  Estimated     Estimated
Description           CPU    Reads  Duration      QueryCost  Number of Rows  MemoryGrant
----------------------------------------------------------------------------------------
NO HINTS:
Original (RBAR)      62ms      832     304ms       0.790177         359.935      2,192KB
Numbers Technique  5961ms  244,199    2229ms    3556.620000  56,629,000.000     37,984KB
Inline Technique      N/A      N/A       N/A  116532.000000  34,912,200.000        N/A
XML Technique       237ms      843     449ms      26.408000      35,993.500     25,856KB
CLR Technique       443ms     1541     526ms      22.483400     359,935.000     99,744KB
----------------------------------------------------------------------------------------
WITH HINTS:
Original (RBAR)      18ms      257      67ms        1.13386         288.875      1,024KB
Numbers Technique    23ms      212     178ms    42676.40000  20,968,100.000    383,976KB
Inline Technique    130ms      202     157ms    30394.90000  10,049,000.000    383,976KB
XML Technique        34ms      203      41ms       45.78860      28,887.500      6,768KB
CLR Technique        32ms      203      32ms      220.80600     288,875.000     63,000KB
----------------------------------------------------------------------------------------
*/

With the hints incorporated into the queries, all of the approaches were nice and quick, with a Duration of under 200ms. The Numbers and Inline techniques were clearly the worst performers… and they both wanted an ENORMOUS memory grant of about 384MB!!!… This was because the optimizer made such a gross over-estimate of how many records would result, that it figured it was going to have to sort tens of millions of rows. Dan eliminated those techniques outright.

Oddly enough, his RBAR technique still had the lowest CPU, but among the three remaining techniques, it had the highest Reads and longest Duration. On the other hand, RBAR had the lowest memory grant by far. The CLR, which was the fastest performer, wanted 60 times the memory that Dan’s RBAR technique wanted. At least the XML approach was not as much of a memory hog, since it had a more reasonable estimate of the number of resulting rows than CLR.

Ultimately, Dan made a decision. He was happy with his RBAR function. Sure, it wasn’t “sexy” like the other approaches, but it had a lot going for it. It was readable. It was simple. The optimizer made reasonable estimates of rows when it was incorporated into a query. It did not demand a lot of memory. And, if in the future, Dan forgot to include the FORCE ORDER and MAXDOP 1 and LOOP JOIN hints in a query that used his function, it would still perform just fine with very little memory requirements (unlike the other approaches).

However, Dan also vowed to make sure that in the future, he would at least test out future queries to make sure that his RBAR function was still a reasonable performer. There may be other, more complicated queries in his future that may work better with the other techniques. He now knew, better than ever, the golden rule of SQL Server: “It depends.”

Dan sat back in his chair, pleased with his decisions.

And he lived happily ever after.

The morals of the story are:

What works well in theory may not work well in practice… It depends… Test test test.

and (forgive me)…

If you're itching to get out of a hairy splitting situation, apply the Dan Druff treatment:

Gather data in testing various approaches

Hints if necessary

Repeat for future queries

20 comments:

Piotr RodakAugust 17, 2010 at 11:20 AM
Nice one, Brad
Did you try to run some tests for 30 and 300 integers in the list? Results for 3 parameters are indeed revolting :)

Piotr
Piotr RodakAugust 17, 2010 at 11:40 AM
I took the hint and read now the other post. It's amazing in how many ways you can shoot your foot, isn't it? :)
I think that it should be said that hints should be used prescribed by a doctor, because of side effects they can cause.

Thanks

Piotr
Brad SchulzAugust 17, 2010 at 11:41 AM
@Piotr:

I most likely will do a follow-up to this post, because I also wanted to demonstrate that the performance figures are similar when attacking the query with a ProductID IN (SELECT Item FROM MyFunction) approach.

I should also do a test for more integers as you suggest. I would think as the number of items increases, the CLR method would come through as a better performer compared to the others.

But in reality, how many times is a list of thousands of integers passed to a routine like this as opposed to just a small handful?

The fact that only 3 lousy integers causes this much grief is indeed interesting.

--Brad
Michael J. SwartAugust 17, 2010 at 11:57 AM
Ha! This actually made me laugh out loud getting looks from colleagues forcing me to send the link to this site to them.
Adam MachanicAugust 17, 2010 at 2:54 PM
WTF? The guy could have left me a comment on my blog and I would have shown him how to convert it FOR FREE. $150? WTF!? It's a 3-minute conversion, if that.
Adam MachanicAugust 17, 2010 at 3:00 PM
Sorry, did I take this "fable" too seriously? I skipped the title the first time and just did a search for my name (found the post via a Google alert) :-)
Brad SchulzAugust 17, 2010 at 3:05 PM
@Adam: LOL!
Adam MachanicAugust 17, 2010 at 3:08 PM
Okay, after reading the ENTIRE post I'm finding the whole thing to be a bit misleading. The real moral here is that "Dan" was focused on the wrong thing the whole time. The performance of splitting the strings has nothing to do with the rest of the performance picture in this case--just as you've shown. So this is not a post about integer splitting at all, it's a post about the impact of bad estimates.
Brad SchulzAugust 17, 2010 at 3:17 PM
@Adam: In a way, you have a point about the fable being misleading.

The performance of the various string-splitting techniques, I'm sure, is just what Aaron demonstrated in his blog... when they are run by themselves in a vacuum.

But the only reason we put these integer-splitting techniques together is so they can be incorporated into queries, right?

It's frustrating that a simple query on a table with only 100,000 rows (Sales.SalesOrderDetail) that JOINs with a couple other tables causes this much of a fiasco with a list of only 3 integers.

I just want to demonstrate that one should not blindly accept a particular technique as being the "best" or "fastest".

--Brad
Adam MachanicAugust 17, 2010 at 3:27 PM
Well, we need to tune for the actual bottleneck. If you're splitting large strings, then yes, you do want to use the "fastest" method for doing that. Otherwise, you want to use the "fastest" method for whatever else you're doing.

If you want the best of both worlds for this particular case, you can get it. Notice that the "RBAR" method uses a multistatement TVF? It might be interesting for "Dan" to wrap the call to the CLR version (or any of the others, for that matter) in a multistatement TVF and try these tests again...
Brad SchulzAugust 17, 2010 at 3:31 PM
Interesting idea regarding the multi-statement suggestion... I'll definitely check that out... sounds promising.

Thanks for the input!
Paul WhiteAugust 26, 2010 at 6:52 AM
Priceless! Thanks Brad :)
Jeff ModenSeptember 13, 2010 at 4:15 AM
A part of the problem is what CHARINDEX does when it sees a MAX datatype and what the execution plan does (full cartesian join, usually) when there's a datatype mismatch between the result of the CHARINDEX and the integers in the numbers table.

I just happen to be doing some work in this area and I'll try to remember to post back with some test info and troubleshooting results.

And, you're correct... don't believe everything you see on the internet even when there's test code. ;-)
Jeff ModenSeptember 13, 2010 at 5:32 AM
I'm not sure what I'm doing "wrong". I'm getting "only" a 70ms difference in performance between the Loop and Numbers table functions.

Of course, for 10,000 "hits" a second, that 70ms will make all the difference in the world, but I'm not seeing the huge difference that you're seeing, Brad. In fact, I suspect there's something wrong with your machine because my 8 year old desktop (1.8GHZ single P4) runs the While Loop version of the code in only 31ms/31ms (CPU AND DURATION) instead of the 62ms/304ms that you reported for the same code.

It will be interesting when I compare the two methods across a whole gambit of data. Thanks for taking the time to post this.
Jeff ModenSeptember 13, 2010 at 10:36 PM
Just an update... haven't tested everything yet but it looks like the addition of the Cross Apply's that you made for reader clarity also cause the code to run in about a third of the time on my humble desktop. I see that Paul White ("Mr. Cross Apply") visited this thread and, hopefully, he might be able to explain why. It IS interesting that the Cross Apply's caused that percentage of a speed gain for lower width splits. Like I said, 70ms can make a huge difference if the code is running 10,000 times a second.

As you and I may have guessed (but still had to check for), the datatype mismatch doesn't affect the While Loop noticeably and I can't get the numbers table solution to misbehave as badly as you seem to have.

Also, thanks for the XML splitter. I have an active post that you responded to looking for some of these to test with.
Jeff ModenSeptember 15, 2010 at 9:58 PM
Brad, any feedback on why there's so much difference between your machine and mine? Do you, perhaps, have a different collation or something setup?
Mike CSeptember 15, 2011 at 5:32 AM
Nice article. What if you use varchar(8000) or nvarchar(4000) instead of (max)? Also what if you serialize your results to a temp table instead of joining directly to functions? How does this affect performance?
ShaliniDecember 1, 2016 at 10:11 PM
Actually, at the time I wrote this cache indexes were not supported yet, so now you could also use a @CacheIndex instead of the query cache for the customer query.
There have been a lot of performance enhancements added to EclipseLink since this post, perhaps its time for a "How to improve JPA performance by 2,825%" post.

web design company in chennai
UnknownJuly 18, 2017 at 12:29 AM
This comment has been removed by a blog administrator.
rohitDecember 29, 2021 at 2:00 AM
Thanks for sharing such an informational blog which will surely be a big help to the small medium enterprise so that they can choose the best suited tool for their business.

CRM Software In India

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Tuesday, August 17, 2010

Integer List Splitting: A SQL Fable

20 comments:

Search This Blog

Subscribe

About Me

Keywords

Blog Archive

Followers

# # #

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Tuesday, August 17, 2010

Integer List Splitting: A SQL Fable

20 comments:

Search This Blog

Subscribe

Subscribe To...

About Me

Keywords

Blog Archive

Followers

# # #