For example, take a look at the Northwind database. There’s a Categories table and a Products table. We’d like to produce a list of all Categories and, for each Category, produce a column containing a comma-delimited (and sorted) list of ProductID’s that are discontinued.
The standard answer that we give to these kinds of questions (providing that the original poster of the question is using SQL2005 or beyond) is something like this:
select CategoryNameMost times the poster will come back and say, “Er… uh… Thanks.” But occasionally someone will say, “Huh? What’s with all that XML stuff? Could you explain what the heck that complicated mess o’ code is doing please?”
,DiscontinuedIDs=
stuff(
(
select ','+convert(varchar,ProductID)
from Products
where CategoryID=Categories.CategoryID
and Discontinued=1
order by ProductID
for xml path('')
)
,1,1,'')
from Categories
order by CategoryName
/*
CategoryName DiscontinuedIDs
--------------- ---------------
Beverages 24
Condiments 5
Confections NULL
Dairy Products NULL
Grains/Cereals 42
Meat/Poultry 9,17,29,53
Produce 28
Seafood NULL
*/
I know when I first started studying T-SQL and started frequenting the forums and saw this kind of answer, I felt kind of intimidated and figured that this stuff was more complex than I had thought.
But it all becomes clear if you go through how this works step by step.
For illustration purposes, let’s create a temporary table #Stooges and populate it:
create table #Stooges (ID int, Name varchar(10))Now if we do a normal query on this table, we get rows and columns:
insert #Stooges select 1,'Moe'
union all select 2,'Larry'
union all select 3,'Curly'
select * from #StoogesBut if we add a FOR XML clause to the query, we get an XML representation of the data:
/*
ID Name
--- ------
1 Moe
2 Larry
3 Curly
*/
select * from #Stooges for xml pathIn reality, it did not return 3 rows of data… I just presented it that way for readability. In fact, it didn’t produce any rows or columns… it actually produced an XML stream. Any query that has the FOR XML clause (without the ,TYPE directive) produces an XML stream. The data is streamed out in (2033-character) chunks as the query is executed, as you can see below, where I query the 120,000-record SalesOrderDetail table in AdventureWorks. The first part of the XML stream is plopped into the SSMS Results window almost immediately, and the query is still grinding away to finish up the rest of the stream.
/*
XML_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------
<row><ID>1</ID><Name>Moe</Name></row>
<row><ID>2</ID><Name>Larry</Name></row>
<row><ID>3</ID><Name>Curly</Name></row>
*/
(If you want to read more about how the XML streams are created, read this blog entry from Eugene Kogan of Microsoft).
In case you’re curious about where the mysterious F52E2B61-18A1-11d1-B105-00805F49916B identifier name comes from, see the snapshot of the Registry Editor below.
It’s the GUID associated with the IXMLDocument Interface that’s part of the Type Library associated with the MSXML DLL files installed into the Windows System Directory. (If that last sentence sounded like a lot of gibberish to you, I can relate… Don’t worry… It’s not important… As far as I’m concerned, we can pretend that the identifier name came from little magic elves sprinkling pixie dust on pretty unicorns… It makes about as much sense and is easier to understand and relate to).
Anyway, back to the subject at hand…
If you use FOR XML in a scalar subquery it will produce an NVARCHAR(MAX) column, unless you use the ,TYPE directive, which will produce a true XML column. You can see this below, where I do FOR XML subqueries with and without the ,TYPE directive and I put the results INTO a temporary table. If we look at that temporary table’s structure (via INFORMATION_SCHEMA.COLUMNS), we can see the datatypes that got created:
select ForXmlPath=(select *So, with this knowledge, we can perform any kind of string functions on a FOR XML (non ,TYPE) subquery that we like.
from #Stooges
for xml path)
,ForXmlPathType=(select *
from #Stooges
for xml path,type)
into #ThrowMeAway
select Table_Name
,Column_Name
,Data_Type
,Character_Maximum_Length
from tempdb.information_schema.columns
where Table_Name like '#ThrowMeAway%'
/*
Table_Name Column_Name Data_Type Character_Maximum_Length
------------------------------- -------------- --------- ------------------------
#ThrowMeAway__...__000000000018 ForXmlPath nvarchar -1
#ThrowMeAway__...__000000000018 ForXmlPathType xml -1
*/
Note that when we do a FOR XML PATH without specifying the actual path, SQL Server assumes a path of ‘row’. (Again, for clarification, the result below is a single row, single column string value… I only show it as 3 lines for readability):
select Info=(select NameIf we specify a path of ‘Stooge’ and change the name of the column to StoogeName, we get the following (again, showing the string column as 3 lines for readability):
from #Stooges
for xml path)
/*
Info
-----------------------------
<row><Name>Moe</Name></row>
<row><Name>Larry</Name></row>
<row><Name>Curly</Name></row>
*/
select Info=(select StoogeName=NameNow, note that we can choose to have no name (and therefore no tag in the XML) by either making the column into an expression or by giving the column a deliberate name of [*]:
from #Stooges
for xml path('Stooge'))
/*
Info
-----------------------------------------------
<Stooge><StoogeName>Moe</StoogeName></Stooge>
<Stooge><StoogeName>Larry</StoogeName></Stooge>
<Stooge><StoogeName>Curly</StoogeName></Stooge>
*/
select Info=(select ''+NameWe can also choose to specify an empty PATH:
from #Stooges
for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/
select Info=(select [*]=Name
from #Stooges
for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/
select Info=(select [*]=NameNow you can probably see where this is going. Let’s put commas in front of those names:
from #Stooges
for xml path(''))
/*
Info
-------------
MoeLarryCurly
*/
select Info=(select ','+NameAnd now we can use the STUFF function to substitute that first comma with an empty string:
from #Stooges
for xml path(''))
/*
Info
----------------
,Moe,Larry,Curly
*/
select Info=stuff(This is great! But WAIT! We’re not done.
(
select ','+Name
from #Stooges
for xml path('')
)
,1,1,'')
/*
Info
---------------
Moe,Larry,Curly
*/
Let’s say some smart-aleck (probably from California) with a somewhat esoteric name of X&Y><Z&T decides to join the gang:
insert #Stooges select 4,'X&Y><Z&T'What happens now?
select Info=stuff(Hmmm… That’s the problem with doing these kinds of tricks by creating XML output… certain characters (like ampersand, less-than, and greater-than) are encoded in a special way, as you can see below:
(
select ','+Name
from #Stooges
for xml path('')
)
,1,1,'')
/*
Info
--------------------------------------
Moe,Larry,Curly,X&Y><Z&T
*/
select Info=(select 'X&Y' for xml path('test'))The FOR XML clause encodes those special characters as it constructs the XML representation of the data. In order for us to UN-encode those characters, we have to work backwards… in other words, extract the data back OUT of the XML representation of the data. We do this through the .value() method. But since this method only works on a true XML datatype, we have to introduce the ,TYPE directive into our query to force the subquery to produce an XML datatype:
/*
Info
--------------------
<test>X&Y</test>
*/
select Info=(select 'X&Y' for xml path('test'),type).value('.','varchar(10)')The .value() method takes two parameters. The first parameter describes the specific “piece” of the XML data that we want to extract. In our case, we just passed it '.', which indicates that we want to extract all the data from the XML as a whole. And, as you noticed, it only pulled out the data without the tags (<test></test>). The second parameter indicates what kind of datatype we want to convert it to as we extract it.
/*
Info
----
X&Y
*/
So, incorporating the ,TYPE directive and the .value() method into our #Stooges list, we finally get the expected result:
select Info=stuff(If you know for sure that the list you’re creating does NOT have any of these special characters, like the query at the beginning of this article which only constructs lists of numbers, then you don’t need to incorporate the ,TYPE directive and .value() method. It will speed things up a bit if you leave those out.
(
select ','+Name
from #Stooges
for xml path(''),type
).value('.','varchar(max)')
,1,1,'')
/*
Info
------------------------
Moe,Larry,Curly,X&Y><Z&T
*/
Now we can put together lists of practically anything. For example, below is a query that pulls out all the orders for the CustomerID of OCEAN from the Northwind database, and shows a list of all the products in those orders (sorted by ProductID), with the quantity and price paid.
select OrderIDYou can create lists within lists also, as in the example below, which lists all Customers and, for each Customer, lists all Categories (in order of the CategoryName) of Products that he purchased, and, for each Category, lists all ProductIDs (in numerical order) of that Category that were purchased:
,ProductList=
stuff(
(
select '; '+convert(varchar,p.ProductID)
+'-'+p.ProductName
+' ('+convert(varchar,d.Quantity)
+'@$'+convert(varchar,d.UnitPrice)+')'
from [Order Details] d
join Products p on d.ProductId=p.ProductID
where d.OrderID=Orders.OrderID
order by p.ProductID
for xml path(''),type
).value('.','nvarchar(max)')
,1,2,'')
from Orders
where CustomerID='OCEAN'
order by OrderID
/*
OrderID ProductList
------- -------------------------------------------------------------------------
10409 14-Tofu (12@$18.60); 21-Sir Rodney's Scones (12@$8.00)
10531 59-Raclette Courdavault (2@$55.00)
10898 13-Konbu (5@$6.00)
10958 5-Chef Anton's Gumbo Mix (20@$21.35); 7-Uncle Bob's Organic Dried Pear...
10986 11-Queso Cabrales (30@$21.00); 20-Sir Rodney's Marmalade (15@$81.00); ...
*/
with CustsCatgsProds asIf you’ve read some of my past blog posts, you know I’m a big promoter of the CROSS APPLY operator to improve readability of a query. For example, the very first query in this article could be re-written as follows:
(
select o.CustomerID
,p.CategoryID
,p.ProductID
from Orders o
join [Order Details] d on o.OrderID=d.OrderID
join Products p on d.ProductID=p.ProductID
join Categories c on p.CategoryID=c.CategoryID
group by o.CustomerID
,p.CategoryID
,p.ProductID
)
select CustomerID
,CategorizedProductList=
stuff(
(
select '; '+CategoryName+':'
+stuff(
(
select ','+convert(varchar,ProductID)
from CustsCatgsProds
where CustomerID=Customers.CustomerID
and CategoryID=CCP.CategoryID
order by ProductID
for xml path('')
)
,1,1,'')
from (select distinct CategoryID
from CustsCatgsProds
where CustomerID=Customers.CustomerID) CCP
join Categories on CCP.CategoryID=Categories.CategoryID
order by CategoryName
for xml path(''),type
).value('.','nvarchar(max)')
,1,2,'')
from Customers
order by CustomerID
/*
CustomerID CategorizedProductList
---------- ----------------------------------------------------------------
ALFKI Beverages:39,76; Condiments:3,6,63,77; Dairy Products:59,71; ...
ANATR Beverages:70; Confections:19; Dairy Products:11,32,60,69,72; ...
ANTON Beverages:2,34,43,75; Condiments:66; Confections:26,48; Dairy...
...
WHITC Beverages:2,34,35,38,75; Condiments:4,8,44,61,63,77; Confecti...
WILMK Beverages:1,24,43,76; Condiments:8,61,65; Confections:16,19,2...
WOLZA Beverages:1,2,24,35,75; Condiments:61; Confections:62,68; Dai...
*/
select CategoryNameIf you do use the CROSS APPLY approach, be very careful in checking the performance and the plan of the query. Adam Haines wrote a terrific blog post that demonstrated that the CROSS APPLY approach to constructing lists can yield terrible results under certain circumstances because of the way that SQL Server performs the query. You can usually re-formulate the query to eliminate that poor performance.
,DiscontinuedIDs
from Categories
cross apply (select DiscIDString=(select ','+convert(varchar,ProductID)
from Products
where CategoryID=Categories.CategoryID
and Discontinued=1
order by ProductID
for xml path(''))) F1
cross apply (select DiscontinuedIDs=stuff(DiscIDString,1,1,'')) F2
order by CategoryName
/*
CategoryName DiscontinuedIDs
--------------- ---------------
Beverages 24
Condiments 5
Confections NULL
Dairy Products NULL
Grains/Cereals 42
Meat/Poultry 9,17,29,53
Produce 28
Seafood NULL
*/
I hope this article was helpful in explaining how the FOR XML PATH method works in creating lists. In my next blog post, I'll show how we can tear those lists apart (or shred them) back into rows.
Brilliant article Brad. You did a great job of explaining the inner details of FOR XML (and I know you just scratched the surface), congrats.
ReplyDeleteAlthough I have to confess the fairy dust theory was working pretty well for me until now :-)
Thanx Brad.
ReplyDeleteI never played with XML here, even though i have seen it a lot. I bookmarked this for reference when i do next need it.
In some cases, you have to deal with ascii values less than 32 (space).
ReplyDeleteThis is one way to work around the problem.
DECLARE @Sample TABLE
(
ID INT,
Data VARCHAR(100)
)
INSERT @Sample
VALUES (1, 'Peso & Performance SQL'),
(1, 'MVP'),
(2, 'Need help ->' + CHAR(2) + '<- '),
(2, 'With XML string concatenation ?')
;WITH cteSource(ID, Content)
AS (
SELECT i.ID,
f.Content.value('.', 'NVARCHAR(MAX)')
FROM (
SELECT ID
FROM @Sample
GROUP BY ID
) AS i
CROSS APPLY (
SELECT CAST(', ' + w.Data AS VARBINARY(MAX))
FROM @Sample AS w
WHERE w.ID = i.ID
FOR XML PATH(''),
TYPE
) AS f(Content)
)
SELECT ID,
STUFF(Content, 1, 2, '') AS Content
FROM (
SELECT ID,
CAST(N'' AS XML).value('xs:base64Binary(sql:column("Content"))', 'VARBINARY(MAX)') AS Content
FROM cteSource
) AS d
Thanks for the extra info, Peso!
ReplyDeleteHi Brad,
ReplyDeleteNice article. I have a problem with '&' character.ProductID contains '&' , for example "pen&pencil", when i tried to update this as "pen&paper" from front end into database.But database doesn't accept this changes..what would be the problem?
Thanks in advance
@Anonymous:
ReplyDeleteWhat is the front end? How are you doing the update? What is the exact command? In what way does the database not accept the change? Is there an error? Or does the ProductID come out differently from 'pen&paper'?
--Brad
Dear Brad,
ReplyDeleteIn my query I am selecting "COUNT(*)" as comma separated
The problem is when COUNT(*) returns 0 then XML PATH skips it.
I am not getting why?
any idea?
Can you be more specific? Can you post some code where you're trying to use COUNT(*)?
ReplyDeleteIf you like, you can send me an email at brad at stockciphering dot com.
Peso 's solution for lower ascii values has a problem with text entries with an even number of characters.
ReplyDeleteJust try:
...
INSERT @Sample
VALUES (1, 'Peso & Performance SQL'),
(1, 'MVP'),
(2, 'Need help ->' + CHAR(2) + '<- '),
(2, 'With XML string concatenation ?'),
(2, 'Add new data!'),
(2, 'ABC'),
(2, 'ABCD'),
(2, 'ABCD with even no. of chars is ignored')
Any solutions?
Excellent tutorial Brad!! Having worked with Sybase's awesomely simple 'list' function, I was struggling to understand XPATH queries. Your tutorial has finally helped me figure out what XPath is all about! Thanks!!
ReplyDeleteKarthiek... Thanks so much for the positive feedback... I'm so glad it helped you out!
ReplyDelete--Brad
Frans: I'm afraid I don't have a solution to get around the problem that you posed with Peso's code. I'll have to look more closely at it.
ReplyDelete--Brad
Brad,
Deletewere you able to look more closely at Fran's code.
I've spent the whole day trying to apply this to a query I'm working on. I get the XML Path and the concatenated results. The problem is that every row has the same values. I know it's wrong, I just can't figure out how/where your code applies a "break" so that each row/Order ID has a different set of Products.
ReplyDeleteThanks.
Ooops - In case it helps, here's my code:
ReplyDeleteSELECT
CONVERT(varchar(10), CONVERT(smalldatetime, CAST(sched.next_run_date as varchar), 120), 120)
AS next_run_date_time
,jobname=
stuff(
(
select '; '+job.[name]
FROM msdb.dbo.sysjobs job
JOIN msdb.dbo.sysjobschedules sched
ON sched.job_id = job.job_id
WHERE job.enabled = 1
AND job.category_id = 103
order by job.[name]
for xml path ('')
)
,1,2,'')
FROM msdb.dbo.sysjobschedules sched
WHERE sched.next_run_date > 0
order by next_run_date_time
When I follow the syntax for the sub-query join:
ReplyDeleteselect *
FROM msdb.dbo.sysjobs job
WHERE job.job_id=msdb.dbo.sysjobschedules.job_id
I get an error:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "msdb.dbo.sysjobschedules.job_id" could not be bound.
Is this syntax only valid in certain version of SQL Server? I have SQL Server 2008 R2 installed.
Thanks again
@Anonymous:
ReplyDeleteI remember working with Frans' code for a while, but I couldn't get it to work. Peso would be the one to ask.
--Brad
@Opher:
ReplyDeleteSorry I'm late in responding... I didn't see your posts until today.
Here is how I would accomplish what you're trying to do:
select nextrundate=convert(varchar(10),convert(smalldatetime,convert(varchar(10),x.next_run_date)),120)
,joblist=stuff((select '; '+job.name
from msdb.dbo.sysjobschedules sched
join msdb.dbo.sysjobs job on sched.job_id=job.job_id
where sched.next_run_date=x.next_run_date
and job.enabled=1
and job.category_id=103
order by job.name
for xml path(''))
,1,2,'')
from (select distinct next_run_date
from msdb.dbo.sysjobschedules) x
--Brad
Brad! you are one of the best, God bless you.
ReplyDeleteAwesome job - thank you very much for the post.
ReplyDeletethanks a lot sir
ReplyDeleteOutstanding!! I googled for a way to concatenate columnar data, and found the basic solution using FOR XML and STUFF. But I was totally in the dark as to how it worked. Your post has cleared the fog! Thanks a million.
ReplyDeleteThank you for the great feedback, John! I'm glad the post was helpful. You made my day!
ReplyDeleteGreat explanation. I had been able to fumble thru getting for xml path to work before, but did not really understand what it was doing. This makes sense, thanks!
ReplyDeleteCool... thanks for the feedback!
ReplyDeletebrillant piece of information, I had come to know about your web-page from my friend hardkik, chennai,i have read atleast 9 posts of yours by now, and let me tell you, your webpage gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanx a million once again, Regards, Single Row Function in sql
ReplyDeleteattractive piece of information, I had come to know about your blog from my friend arjun, ahmedabad,i have read atleast eleven posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a lot once again, Regards, Single Row Function
ReplyDeleteIt is outstanding article.I havent seen article like this.
ReplyDeleteThanks for the feedback! Glad you found it helpful!
ReplyDeleteGreat post. Like many other readers I found other solutions to this problem online but none as clear.
ReplyDeleteCheers, Max
It doesn't matter how old this post is, I have to admit: firstly that fairy dust is my preferred method, but when the fairies are on holiday I will resort to understanding, and you have helped me with this in such a great way.
ReplyDeleteI am very new to SQL but I enjoy a good challenge; having encountered STUFF & FOR XML PATH some times in team scripts, but not fully needing to use it I left it to one or two other individuals to handle. Now I have found not only a reason to need it but a time where no one has been available to assign it to, but what's best is I have understood it from your post here, and been able to apply it. I will be using it again and again, when necessary of course, and I want to thank you for that.
I enjoy learning and you have made my day on this one. =)
Cheers, Mike P.
@MikeP:
ReplyDeleteThank you so much for your comments. Like you, I was confused by all this STUFF and XML PATH stuff when I saw code posted on the forums and I set out to understand what's going on. This post was the result... I wanted to make sure others could grasp it as well. I'm glad it did the job!
Dear Brad,
ReplyDeleteyour code works so great, i really really thank you for that!
I helped me a lot.
Just a short question to improve my query:
What would be the best way to get just the first 5 entries comma separated?
Maybe just like you have 10 Stooges and you just need to now the first 5 one's :)
ahhh sorry Brad, i'm so dumb.
DeleteI found it b myself:
(select top 5 '|' + ....
THANK YOU!!
Brad,
ReplyDeleteBloody 'ell!
Took a while to find something that I could use without pivot and then a short time to apply your workings to what I was looking for.
Cheers!
Peter
@Peter:
ReplyDeleteThanks for the feedback... So glad that this was a help to you!
really enjoyed the article
ReplyDeleteIT nearshoring is a whole industry that has been appreciated by many companies from all over the world. And by clicking on the link besuche die Website you can find out more information about him in Poland, because employees from this country will give a tangible increase to your company with the help of first-class skills.
ReplyDelete