(If you’d like to look at these past articles, read Adam’s posts here, here, here, and here and read my posts and comments here, here, here, and here.)
In Adam’s latest volley a few days ago (which you can read here), he did some in-depth time tests on various methods of splitting a delimited string…. the XML approach using XPath and the .nodes() and .value() methods, the permanent Numbers table approach, and the approach using a TVF with a virtual table of Numbers.
His verdict was that the XML approach was a biiiiig disappointment when handling a comma-delimited string of many items, in one case performing over 200 times slower (!!) than the Numbers table approaches. Like Adam, I was a bit crestfallen. The XML method is the usual method suggested on the T-SQL forums for doing this sort of thing.
It is literally shocking how badly it performs.
Take a look at the following example, in which I create a variable that contains a comma-delimited list of 5000 items. Then it uses the XML approach to shred the list into a result set of 5000 rows:
/*The task that we’re asking SQL Server to do is not that complicated, but do you want to guess how long this takes to fully execute on my system? Are you sitting down?
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)
select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)
It takes over 6 minutes!
Really disappointing…
I had a lot of trouble accepting this. After doing several experiments, I finally figured out what was going on.
The problem lies in the derived table in the FROM clause, where we create a derived XML column via the CAST function.
It seems we are not really creating a complete column or value at all. Apparently we are inadvertently creating an XML stream, which sends small chunks of data to the .nodes() method and then the chunks get handed over to the .value() method to come up with our result… one chunk at a time.
On my system, when I execute the above code, I see nothing at all happen for about almost 50 seconds. Then the first 676 rows get spit out into the results window. After another 50 seconds or so, another similar-sized chunk of rows get spit out. This happens like clockwork until it spits out the final 5000th row over 6 minutes after I started the query.
You may recall in a previous post of mine where I said that an XML stream is processed in 2033-byte chunks. Well, 676 rows of 3 bytes (‘ABC’) each equals 2028 bytes. (I’m assuming that there’s a little tiny bit of overhead involved and that’s why we can’t squeeze in the 677th row). My comma-delimited string consists of 5000 items at 3 bytes each, coming to 15000 bytes. Divide that by 2033 bytes and you get 7.38, which is the number of chunks of data that get processed. Multiply that by 50 seconds per chunk, and you get… voila… 6.15 minutes.
Well, this must be all really exciting for you math nerds out there, but how does it help us?
It does help to explain what’s going on, and it does show us that streaming the XML is reeeeeaaalllly sloooooow.
So is there hope for the XML method?
YES! YES! YES!
This slow streaming only occurs when we are creating the XML to be processed via a derived table or CTE. What If we instead transfer our comma-delimited string into an actual XML variable and then process that variable, like so?:
/*Believe it or not, this takes only 0.174 seconds!
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)
declare @XMLList xml
set @XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)
select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)
In comparing the two pieces of code outlined in this article, here are the results:
/* Derived XML
Table Variable Percent
Measurement Method Method Improvement
--------------------------------------------
CPU 386,172 125 2989%
Duration(ms) 368,747 174 2019%
*/
So I’m happy to report that all is not lost for the XML method. It’s still a contender in the list-splitting competition, giving the Numbers table methods a run for their money… as long as it is implemented in a certain way.Back to you, Adam…
UPDATE ON Jan06,2010: Please read the follow-up to this article in which I talk about some additional methods to speed the process up even further.
Great followup Brad!!! This a very interesting developement. Did you capture the results via profiler or statistics time? Also, did you flush the cache between runs, just to make there was no query play reuse?
ReplyDeleteIf this method can put the CPU and duration in the ballpark of the numbers table solution, I will gladly go back to recommending it :^) . When I get some time I will run this test along with my tests from my blog post, to see how it compares; however, I have a feeling you may have done that already.
Again great job!
Those were profiler statistics, and, yes, I did the usual two DBCC commands between each run, and I set the query options to "Discard results".
ReplyDeleteI was going to run the time tests on all methods, but I was too impatient and excited about getting this new finding up to the blog.
--Brad
Brad,
ReplyDeleteI just did the tests and the results rival those of the numbers table both inline and permanent. This is a great find and definitely good knowledge! XML is back baby!!!!!!!!!!!!!!!
Good findings gentle man. and great work and follow-up to both of you!
ReplyDeleteAwesome post. *and* I have a new favorite method of stressing my cpu.
ReplyDeleteSee, it helps to take things personally. You put it more effort that way. :)
ReplyDeleteBradam, this is great stuff.
And without a variable....
ReplyDeletedeclare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)
select x.i.value('.','varchar(3)')
from (select cast(''+replace(@CommaList,',','')+'' as xml).query('.') d) XMLList
cross apply XMLList.d.nodes('i') x(i)
...works quickly too
Hi Brad,
ReplyDeleteI have a general comment for your blogs.
1. Please allow an easy search functionality on your website
2. Please have a list of your blogs with titles somewhere in each blog page
Say, I need to find your blog now and I'm having troubles from this page finding it.
Thanks in advance.
Hi Naomi...
ReplyDeleteIn the upper right-hand corner of each page of this blog is a Google search box to find posts in this blog.
Underneath my profile on the right-hand side of each page is a list of keywords that will take you to blog entries related to that word.
And underneath that is an archive of all blog posts by date (with the blog titles).
I think among all those tools you should be able to find anything you need.
--Brad
This comment has been removed by a blog administrator.
ReplyDeleteI have always enjoyed studying SQL at the university. I was the student who pay for research but did all the technical tasks myself. It was difficult for me to concentrate on my homework in the humanities, but technical disciplines were my love.
ReplyDeleteWriting code is a pretty tricky task. As for me, it is much more difficult to write code than to write an article like this one quite competently. So i respect you for doing this and for writing articles like this one https://www.iiste.org/tag/domyessay/.
ReplyDeleteProgramming is pretty difficult. Not everyone can learn to program, just like not everyone can write an article like excellent article about essay writing service.
ReplyDeleteProgramming this task is not for everyone, you need to have a technical mindset, analytical thinking. Not everyone can learn to program. Just like not everyone can learn how to write such articles as article with tips to writing a unique essay fast write essay quickly, or at least write essay somehow.
ReplyDeleteI was able to find good info from your blog articles.
ReplyDeleteSee, it helps to take things personally.
ReplyDeleteCOVID-19 and Hair Transplantation
Tennis match not a contest, yet rather a well disposed rally, thumping the subject back and forward, practicing our abilities for increased efficiency and productivity and There are different distributed computing models similar to PaaS. To pick the right model, understanding the contrast between Platform as a Service versus Software as a Service and Infrastructure as a Service versus Platform as a Service is important.
ReplyDeleteTennis match not a contest, yet rather a well disposed rally, thumping the subject back and forward, practicing our abilities for integration platform as a service vendors and There are different distributed computing models similar to PaaS. To pick the right model, understanding the contrast between Platform as a Service versus Software as a Service and Infrastructure as a Service versus Platform as a Service is important.
ReplyDeleteGood day. Software is the thing that makes the life of your company much easier. Programs created with the help of dynamics ax service management help organize and control various aspects of your business. You can read more about solutions. I think in the article is some information you need.
ReplyDeleteMy professor showed me this post they said this is an amazing post. This is about the Delimited String Tennis that connects the world historically. I also liked this post. From here I got verb finder site information. Then I go on that site and saw this is the site Where I Can Correct My Sentence Online.
ReplyDeleteThe past continuous is a form of the past tense that expresses an ongoing action or event at some point in the past. This type of past tense is also called progressive past tense because of the verb progress in the past. It uses the past tense of proofread tool to form sentences in past continuous tense or other forms of verbs like "was and were" and in the present participle form of the verb.
ReplyDeleteAs you are aware, RPA enables you to automate several repetitive operations, which will increase productivity, but using RPA calls for a high level of expertise. Use the high-quality hiring of rpa development teams for your firm to get the full benefits of RPA. Learn more about this information right now!
ReplyDeleteColon and semicolon are sometimes confused with each other but they are not the same while there is some similarity. The main function of the colon is to introduce whatever follows it such as a list, a statement, or an explanation that does not use a semicolon. The punctuation fixer used for this makes it fairly easy to learn how to use the colon. Grammar rules related to colons are concise, making it easy to check colons to ensure proper use.
ReplyDeleteWhen I used to write essays in English, I used to make a lot of spelling mistakes. Then one of my elder brother told me that you have to correct your mistakes. He told me about the modal sentence checker website. He said that you will get a lot of benefits from this website. Your Text Will Be Flawless after You Run It through Our English Sentence Checker.
ReplyDeleteWe know how frustrating it is to deal with adverb-related errors, and with our adverb identifier tool, you can make your writing free from all types of errors. and find adverb in sentence online We offer the best services like correction, fact-checking, and re-writing that help you achieve your objectives.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteWhat a unique and fun concept with delimited string tennis! It’s always refreshing to see innovative ideas like this. Thanks for sharing such an entertaining post—I’ll definitely keep this in mind for some weekend fun! Tony's Digital Coupon
ReplyDelete