SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server(): Spotlight on UNPIVOT, Part 2

In my last blog entry, we looked at the UNPIVOT command and how it works in unpivoting columns into rows.

We learned that T-SQL simply translates the UNPIVOT operator into an OUTER APPLY, coupled with a filter to get rid of NULL values. With that knowledge, we were able to unpivot data ourselves using an APPLY operator directly. This gave us more flexibility… We could include NULLs if we wanted to and we could very easily unpivot multiple sets of columns.

In this entry, we’ll go over some scenarios for when unpivoting data is helpful (more likely essential), and we’ll present many examples.

Before we begin, though, I have to say one thing. Most of these examples employ unpivoting because the tables are horribly designed and not normalized. I urge you, in designing your tables, to not emulate some of these table structures I am going to present.

Aggregations Over Columns

Let’s say you’re a school teacher and you have given 12 tests (exams) to your students over the course of the semester. Each test score is a percentage value from 0 to 100. And here’s the interesting part… you gave your students the option to skip any test of their choice during the semester, so one of the 12 tests might have a NULL value. Some students might take advantage of that… others (the teacher’s pets no doubt) will take every exam.

Here’s the (poorly-designed) table:

if object_id('tempdb..#Scores','U') is not null drop table #Scores
go
create table #Scores
(
   StudentID int primary key
  ,Test1 int
  ,Test2 int
  ,Test3 int
  ,Test4 int
  ,Test5 int
  ,Test6 int
  ,Test7 int
  ,Test8 int
  ,Test9 int
  ,Test10 int
  ,Test11 int
  ,Test12 int
)
go
insert #Scores select 1,  96,  87,null, 100,  99,  91,  73,  90,  90,  91,  93,  92
     union all select 2,  83,  84,  89,  79,  88,null,  93,  82,  85,  86,  88,  82
     union all select 3, 100, 100,  98,  91, 100,  95,  96,  96,  99, 100,  98, 100
 
select * from #Scores
/*
StudentID Test1 Test2 Test3 Test4 Test5 Test6 Test7 Test8 Test9 Test10 Test11 Test12
--------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------ ------ ------
        1    96    87  NULL   100    99    91    73    90    90     91     93     92
        2    83    84    89    79    88  NULL    93    82    85     86     88     82
        3   100   100    98    91   100    95    96    96    99    100     98    100
*/

Your job is to get an average test score for each student. Well, that seems easy enough. We just add the scores together and divide by 12, right?

Oh wait…

There’s that darn NULL value we have to deal with… some students might have it, and some may not. And we don’t know which of the 12 test scores have the NULL value.

So we have to come up with a really awful query like this:

select StudentID
      ,AverageScore=( coalesce(Test1,0)+coalesce(Test2,0)+coalesce(Test3,0)
                     +coalesce(Test4,0)+coalesce(Test5,0)+coalesce(Test6,0)
                     +coalesce(Test7,0)+coalesce(Test8,0)+coalesce(Test9,0)
                     +coalesce(Test10,0)+coalesce(Test11,0)+coalesce(Test12,0) )
                    * 1.0 /                                                           
                    ( case when Test1 is not null then 1 else 0 end
                     +case when Test2 is not null then 1 else 0 end
                     +case when Test3 is not null then 1 else 0 end
                     +case when Test4 is not null then 1 else 0 end
                     +case when Test5 is not null then 1 else 0 end
                     +case when Test6 is not null then 1 else 0 end
                     +case when Test7 is not null then 1 else 0 end
                     +case when Test8 is not null then 1 else 0 end
                     +case when Test9 is not null then 1 else 0 end
                     +case when Test10 is not null then 1 else 0 end
                     +case when Test11 is not null then 1 else 0 end
                     +case when Test12 is not null then 1 else 0 end )
from #Scores
/*
StudentID    AverageScore
--------- ---------------
        1 91.090909090909
        2 85.363636363636
        3 97.750000000000
*/

You have to admit… That is really ugly.

But it gets uglier…

Even worse, you have to find the minimum and maximum test score for each student. Take a look at this monster (try not to laugh… I dare you), which finds the maximum score:

select StudentID
      ,MaxScore=case 
                  when Test1>=coalesce(Test2,0) and Test1>=coalesce(Test3,0) 
                   and Test1>=coalesce(Test4,0) and Test1>=coalesce(Test5,0)
                   and Test1>=coalesce(Test6,0) and Test1>=coalesce(Test7,0) 
                   and Test1>=coalesce(Test8,0) and Test1>=coalesce(Test9,0) 
                   and Test1>=coalesce(Test10,0) and Test1>=coalesce(Test11,0) 
                   and Test1>=coalesce(Test12,0) then Test1
                  when Test2>=coalesce(Test1,0) and Test2>=coalesce(Test3,0) 
                   and Test2>=coalesce(Test4,0) and Test2>=coalesce(Test5,0)
                   and Test2>=coalesce(Test6,0) and Test2>=coalesce(Test7,0) 
                   and Test2>=coalesce(Test8,0) and Test2>=coalesce(Test9,0) 
                   and Test2>=coalesce(Test10,0) and Test2>=coalesce(Test11,0) 
                   and Test2>=coalesce(Test12,0) then Test2
                  when Test3>=coalesce(Test1,0) and Test3>=coalesce(Test2,0) 
                   and Test3>=coalesce(Test4,0) and Test3>=coalesce(Test5,0) 
                   and Test3>=coalesce(Test6,0) and Test3>=coalesce(Test7,0) 
                   and Test3>=coalesce(Test8,0) and Test3>=coalesce(Test9,0) 
                   and Test3>=coalesce(Test10,0) and Test3>=coalesce(Test11,0) 
                   and Test3>=coalesce(Test12,0) then Test3
                  when Test4>=coalesce(Test1,0) and Test4>=coalesce(Test2,0) 
                   and Test4>=coalesce(Test3,0) and Test4>=coalesce(Test5,0)
                   and Test4>=coalesce(Test6,0) and Test4>=coalesce(Test7,0) 
                   and Test4>=coalesce(Test8,0) and Test4>=coalesce(Test9,0) 
                   and Test4>=coalesce(Test10,0) and Test4>=coalesce(Test11,0) 
                   and Test4>=coalesce(Test12,0) then Test4
                  when Test5>=coalesce(Test1,0) and Test5>=coalesce(Test2,0) 
                   and Test5>=coalesce(Test3,0) and Test5>=coalesce(Test4,0) 
                   and Test5>=coalesce(Test6,0) and Test5>=coalesce(Test7,0) 
                   and Test5>=coalesce(Test8,0) and Test5>=coalesce(Test9,0) 
                   and Test5>=coalesce(Test10,0) and Test5>=coalesce(Test11,0) 
                   and Test5>=coalesce(Test12,0) then Test5
                  when Test6>=coalesce(Test1,0) and Test6>=coalesce(Test2,0) 
                   and Test6>=coalesce(Test3,0) and Test6>=coalesce(Test4,0) 
                   and Test6>=coalesce(Test5,0) and Test6>=coalesce(Test7,0) 
                   and Test6>=coalesce(Test8,0) and Test6>=coalesce(Test9,0) 
                   and Test6>=coalesce(Test10,0) and Test6>=coalesce(Test11,0) 
                   and Test6>=coalesce(Test12,0) then Test6
                  when Test7>=coalesce(Test1,0) and Test7>=coalesce(Test2,0) 
                   and Test7>=coalesce(Test3,0) and Test7>=coalesce(Test4,0) 
                   and Test7>=coalesce(Test5,0) and Test7>=coalesce(Test6,0) 
                   and Test7>=coalesce(Test8,0) and Test7>=coalesce(Test9,0) 
                   and Test7>=coalesce(Test10,0) and Test7>=coalesce(Test11,0) 
                   and Test7>=coalesce(Test12,0) then Test7
                  when Test8>=coalesce(Test1,0) and Test8>=coalesce(Test2,0) 
                   and Test8>=coalesce(Test3,0) and Test8>=coalesce(Test4,0) 
                   and Test8>=coalesce(Test5,0) and Test8>=coalesce(Test6,0) 
                   and Test8>=coalesce(Test7,0) and Test8>=coalesce(Test9,0) 
                   and Test8>=coalesce(Test10,0) and Test8>=coalesce(Test11,0) 
                   and Test8>=coalesce(Test12,0) then Test8
                  when Test9>=coalesce(Test1,0) and Test9>=coalesce(Test2,0) 
                   and Test9>=coalesce(Test3,0) and Test9>=coalesce(Test4,0) 
                   and Test9>=coalesce(Test5,0) and Test9>=coalesce(Test6,0) 
                   and Test9>=coalesce(Test7,0) and Test9>=coalesce(Test8,0) 
                   and Test9>=coalesce(Test10,0) and Test9>=coalesce(Test11,0) 
                   and Test9>=coalesce(Test12,0) then Test9
                  when Test10>=coalesce(Test1,0) and Test10>=coalesce(Test2,0) 
                   and Test10>=coalesce(Test3,0) and Test10>=coalesce(Test4,0) 
                   and Test10>=coalesce(Test5,0) and Test10>=coalesce(Test6,0) 
                   and Test10>=coalesce(Test7,0) and Test10>=coalesce(Test8,0)
                   and Test10>=coalesce(Test9,0) and Test10>=coalesce(Test11,0) 
                   and Test10>=coalesce(Test12,0) then Test10
                  when Test11>=coalesce(Test1,0) and Test11>=coalesce(Test2,0) 
                   and Test11>=coalesce(Test3,0) and Test11>=coalesce(Test4,0) 
                   and Test11>=coalesce(Test5,0) and Test11>=coalesce(Test6,0) 
                   and Test11>=coalesce(Test7,0) and Test11>=coalesce(Test8,0) 
                   and Test11>=coalesce(Test9,0) and Test11>=coalesce(Test10,0) 
                   and Test11>=coalesce(Test12,0) then Test11
                  when Test12>=coalesce(Test1,0) and Test12>=coalesce(Test2,0) 
                   and Test12>=coalesce(Test3,0) and Test12>=coalesce(Test4,0) 
                   and Test12>=coalesce(Test5,0) and Test12>=coalesce(Test6,0) 
                   and Test12>=coalesce(Test7,0) and Test12>=coalesce(Test8,0) 
                   and Test12>=coalesce(Test9,0) and Test12>=coalesce(Test10,0) 
                   and Test12>=coalesce(Test11,0) then Test12
                end
from #Scores
/*
StudentID MaxScore
--------- --------
        1      100
        2       93
        3      100
*/

(Sheesh… The things I do to demonstrate stupidity… I can’t believe I actually typed that whole thing out… I must be insane).

Well, the UNPIVOT clause can come to our rescue. It can unpivot the columns into rows and we can use T-SQL’s ability to do aggregations. So we can easily calculate the Average, Minimum, Maximum, and even the Standard Deviation of those test scores:

select StudentID
      ,AverageScore=avg(TestScore*1.0)
      ,MinScore=min(TestScore)
      ,MaxScore=max(TestScore)
      ,StdDev=stdev(TestScore)
from #Scores
unpivot (TestScore for Descript in (Test1,Test2,Test3,Test4,Test5,Test6
                                   ,Test7,Test8,Test9,Test10,Test11,Test12)) P
group by StudentID
/*
StudentID AverageScore MinScore MaxScore           StdDev
--------- ------------ -------- -------- ----------------
        1    91.090909       73      100 7.18963900977711
        2    85.363636       79       93 3.95658254741963
        3    97.750000       91      100 2.80016233295663
*/

Now isn’t that a lot easier by a factor of about a thousand?

Transpositions

This section is just for fun, really… however it might give you some ideas for other things.

Let’s take our table of test scores and turn the columns into rows and rows into columns… in other words, we’ll unpivot and then re-pivot.

;with UnpivotedScores as
(
  select StudentColumn='Student'+convert(varchar(10),StudentID)
        ,TestScore
        ,TestID=convert(int,substring(TestColumnName,5,len(TestColumnName)))
  from #Scores
  unpivot (TestScore for TestColumnName in (Test1,Test2,Test3,Test4,Test5,Test6
                                           ,Test7,Test8,Test9,Test10,Test11,Test12)) U
)
select * 
from UnpivotedScores
pivot (max(TestScore) for StudentColumn in (Student1,Student2,Student3)) P
/*
TestID Student1 Student2 Student3
------ -------- -------- --------
     1       96       83      100
     2       87       84      100
     3     NULL       89       98
     4      100       79       91
     5       99       88      100
     6       91     NULL       95
     7       73       93       96
     8       90       82       96
     9       90       85       99
    10       91       86      100
    11       93       88       98
    12       92       82      100
*/

So in this query, after unpivoting the data, I took the StudentID’s of 1 and 2 and 3 and created a column called StudentColumn containing the strings ‘Student1’, ‘Student2’, and ‘Student3’. I also created a column called TestID, extracting its value out of TestColumnName, a column created by the UNPIVOT. Finally, I PIVOTed the data across the three students.

Of course, I was only able to do this because I knew that I had only 3 studentsin my data. If I had more, then I would have to list more student columns in the PIVOT.

Here’s another fun and crazy exercise.

Create a table of 20,000 rows, with each row containing 8 random values from 1 to 999999.

if object_id('tempdb..#BunchesOfNums','U') is not null drop table #BunchesOfNums
go
create table #BunchesOfNums 
(
   Num1 int
  ,Num2 int
  ,Num3 int
  ,Num4 int
  ,Num5 int
  ,Num6 int
  ,Num7 int
  ,Num8 int
)
go
set nocount on
insert #BunchesOfNums 
select abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1 
go 20000
 
select * from #BunchesOfNums
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
430583 396264 325292 396579 294463 540881 827428 332591
819493 634199 705003 711098 387911 146887 931888 705560
967582  19768 391719 243203 716007 378208 471666 908523
206660 382012 753810 289947 725329 309939 268837 770277
651409 353534  54753 799079 317000 589131  77782 136318
  4796 481041 870133 282436 243489 890761 307494 510633
682449 686105 311957 626728 781477 902128 974246 256178
933380 499938 653796  33348 536858 359010 875253 414892
... And so on for 20,000 rows
*/

Let’s reorganize all of this data, sorting it across the columns (i.e. horizontally):

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)/8+1
        ,NewCol=(row_number() over (order by Num)-1)%8+1
  from #BunchesOfNums
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17     40     63     67     68     71     79     90
    94     97    103    109    120    122    134    140
   166    173    174    177    183    192    194    195
   204    206    207    208    216    220    222    229
   238    247    263    273    277    278    290    300
 . . .
999772 999772 999773 999780 999781 999783 999783 999787
999794 999804 999806 999808 999808 999822 999830 999831
999832 999840 999841 999855 999868 999875 999887 999905
999905 999910 999930 999938 999940 999949 999963 999966
999966 999972 999986 999988 999991 999991 999992 999996
*/

Again, this was an UNPIVOT followed by a PIVOT. The original table was unpivoted, creating a derived table of 1 column (called Num) containing all 160,000 of our numbers. I created new columns called NewRow and NewCol, which were calculated using the ROW_NUMBER() function based on the sorting of the Num column. NewRow was calculated by dividing by 8 and NewCol was calculated by using doing a modulo 8 (i.e. the remainder in dividing by 8). Then I re-PIVOTed the numbers based on these new Row/Column positions.

We could also reorganize the numbers down across the rows (i.e. vertically). This time, we have to use 20,000 (the number of rows) for our divisor. But let’s make it more general and assume we don’t know up front how many rows there are. Let’s use the COUNT() aggregate to figure that number out and use that as the divisor:

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)%(count(Num) over () / 8)+1
        ,NewCol=(row_number() over (order by Num)-1)/(count(Num) over () / 8)+1
  from #BunchesOfNums
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17 125498 250005 374416 499463 623675 749265 875013
    40 125501 250008 374421 499471 623680 749268 875025
    63 125512 250010 374429 499472 623687 749269 875029
    67 125517 250018 374431 499480 623691 749278 875031
    68 125533 250028 374433 499485 623691 749289 875033
 . . .
125479 249986 374402 499427 623659 749216 874991 999988
125480 249987 374409 499433 623661 749242 874994 999991
125484 249999 374411 499446 623662 749243 874997 999991
125488 250001 374413 499453 623663 749251 875003 999992
125493 250003 374416 499455 623675 749254 875004 999996
*/

This certainly works, but the query is not really that efficient. Because of the COUNT() aggregate and the way it’s used, the query employs a whopping 384159 reads and 738 writes, because it needs to spool information to tempdb to do its work.

The following is more efficient and only employs 650 reads and 0 writes… an enormous difference!

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)%NumRows+1
        ,NewCol=(row_number() over (order by Num)-1)/NumRows+1
  from #BunchesOfNums
  cross join (select NumRows=count(*) from #BunchesOfNums) C
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17 125498 250005 374416 499463 623675 749265 875013
    40 125501 250008 374421 499471 623680 749268 875025
    63 125512 250010 374429 499472 623687 749269 875029
    67 125517 250018 374431 499480 623691 749278 875031
    68 125533 250028 374433 499485 623691 749289 875033
 . . .
125479 249986 374402 499427 623659 749216 874991 999988
125480 249987 374409 499433 623661 749242 874994 999991
125484 249999 374411 499446 623662 749243 874997 999991
125488 250001 374413 499453 623663 749251 875003 999992
125493 250003 374416 499455 623675 749254 875004 999996
*/

As you can see, I calculated the number of rows up front (into a column called NumRows) and I CROSS JOINed that with the table itself so that I could use it as the divisor in calculating NewRow and NewCol.

I encourage you to examine the plans of the two queries above to see the differences.

Column Data Repair

How often have you seen a scenario like this?: A table has 3 or 4 or 5 or more columns that contain the same kind of information. An address is a good example. A customer might have 1 or 2 or 3 lines of address and there are 3 columns for that information to be stored.

Another example is credit card information. A table has 5 columns to store up to 5 possible credit card numbers that we have on file for a person.

Again, this is very bad design… the credit card information should be put into a child table of its own, with one row for each credit card. Then many credit cards could be recorded and old ones can be deleted as much as desired.

But in the bad table design with 5 credit card columns, people will just erase outdated credit card numbers over time and not reorganize them and you end up with lots of possible “holes” as in the following example:

use TempDB
go
if object_id('CredCardInfo','U') is not null drop table CredCardInfo
go
create table CredCardInfo
(
   ID int primary key
  ,Card1 varchar(16)
  ,Card2 varchar(16)
  ,Card3 varchar(16)
  ,Card4 varchar(16)
  ,Card5 varchar(16)
)
go
 
insert CredCardInfo values
  (1,'','4898198371982987','5379110787298721','','')
 ,(2,'6798172988177792','','','5798116090379113','4798116444218977')
 ,(3,'','','','','4980926667212798')
 ,(4,'','','6877172983719871','','4079267155432178')
 ,(5,'4076372911736532','6888327238009837','5368200841237897','','')
 
select * from CredCardInfo
/*
ID Card1            Card2            Card3            Card4            Card5
-- ---------------- ---------------- ---------------- ---------------- ----------------
 1                  4898198371982987 5379110787298721                  
 2 6798172988177792                                   5798116090379113 4798116444218977
 3                                                                     4980926667212798
 4                                   6877172983719871                  4079267155432178
 5 4076372911736532 6888327238009837 5368200841237897                  
*/

You can see that the first 4 rows have empty columns where credit card numbers used to be. We want to repair this information so that all the credit card numbers are like the 5th row, where all the card numbers are in the 1st and 2nd and 3rd slots and so on with no “holes”.

So here is a stored procedure that does just that:

use TempDb
go
if object_id('RepairCardInfo','P') is not null drop procedure RepairCardInfo
go
create procedure RepairCardInfo
as
with CardsUnpivot as
(
  select ID
        ,Card
        ,OrigColName
        ,NewColName='Card'
                   +convert(char(1),row_number() over (partition by ID 
                                                       order by OrigColName))
  from CredCardInfo
  unpivot (Card for OrigColName in (Card1,Card2,Card3,Card4,Card5)) P
  where Card<>''
)
,IDsThatNeedChange as
(
  select ID
  from CardsUnpivot
  where OrigColName<>NewColName
)
,NewCardValues as
(
  select ID,Card1,Card2,Card3,Card4,Card5
  from (select ID,Card,NewColName 
        from CardsUnpivot
        where ID in (select ID from IDsThatNeedChange)) A
  pivot (max(Card) for NewColName in (Card1,Card2,Card3,Card4,Card5)) P
)
update C
set Card1=coalesce(n.Card1,'')
   ,Card2=coalesce(n.Card2,'')
   ,Card3=coalesce(n.Card3,'')
   ,Card4=coalesce(n.Card4,'')
   ,Card5=coalesce(n.Card5,'')
from CredCardInfo C
join NewCardValues N on C.ID=N.ID

Like other examples we’ve seen in this article, we UNPIVOT the data and then re-PIVOT it.

Take a closer look at the CardsUnpivot CTE… here’s what it produces:

select ID,Card,OrigColName
      ,NewColName='Card'
                 +convert(char(1),row_number() over (partition by ID 
                                                     order by OrigColName))
from CredCardInfo
unpivot (Card for OrigColName in (Card1,Card2,Card3,Card4,Card5)) P
where Card<>''
/*
ID Card             OrigColName NewColName
-- ---------------- ----------- ----------
 1 4898198371982987 Card2       Card1
 1 5379110787298721 Card3       Card2
 2 6798172988177792 Card1       Card1
 2 5798116090379113 Card4       Card2
 2 4798116444218977 Card5       Card3
 3 4980926667212798 Card5       Card1
 4 6877172983719871 Card3       Card1
 4 4079267155432178 Card5       Card2
 5 4076372911736532 Card1       Card1
 5 6888327238009837 Card2       Card2
 5 5368200841237897 Card3       Card3
*/

Notice that after the UNPIVOTing, we create a NewColName, based on the ROW_NUMBER() ordered by OrigColName (within each ID). And that NewColName is sequenced as ‘Card1’, ‘Card2’, etc for each ID.

The next CTE, IDdsThatNeedChange, finds all IDs where OrigColName and NewColName are different. Note that ID Number 5 has them all matching because its data is already in the format we want, with all the numbers in slot 1 and 2 and 3.

The final NewCardValues CTE rePIVOTs the data back into the 5 credit card columns… but only for those IDs that need the change.

And that CTE is used to UPDATE the original table… updating only those rows that need change.

If we execute this procedure, and then look at the data again, we see that all is repaired:

set nocount off
exec RepairCardInfo
/* (4 row(s) affected) */
  
select * from CredCardInfo
/*
ID Card1            Card2            Card3            Card4            Card5
-- ---------------- ---------------- ---------------- ---------------- ----------------
 1 4898198371982987 5379110787298721                                   
 2 6798172988177792 5798116090379113 4798116444218977                  
 3 4980926667212798                                                    
 4 6877172983719871 4079267155432178                                   
 5 4076372911736532 6888327238009837 5368200841237897                  
*/

Note that only 4 of 5 rows were actually UPDATEd. If we run the procedure again, nothing happens, because all of our data is already repaired:

set nocount off
exec RepairCardInfo
/* (0 row(s) affected) */

Vertical Presentations of Data

Sometimes it’s nice to see data presented in a vertical format rather than in columns.

In AdventureWorks, there’s a view called Sales.vStoreWithDemographics, which JOINs many tables to come up with Contact information, Name and Address information, and Demographic information for Stores in the database.

Let’s say we want to create an inline table-valued function which accepts a @CustomerID parameter to present data for the Main Office of a Customer Store. Since there can potentially be more than one contact for a store, we will use ROW_NUMBER() to only look at the first instance.

If we attempt to use the UNPIVOT operator to do this, we run into trouble:

if object_id('VerticalMainOfficeData','IF') is not null 
  drop function VerticalMainOfficeData
go
create function VerticalMainOfficeData
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
select Property,Value
from BaseData
unpivot (Value for Property in (Name
                               ,AddressLine1
                               ,AddressLine2
                               ,City
                               ,StateProvinceName 
                               ,PostalCode 
                               ,CountryRegionName 
                               ,YearOpened
                               ,NumberEmployees
                               ,Specialty
                               ,SquareFeet
                               ,Brands
                               ,AnnualSales)) P
where SeqNo=1
/*
Msg 8167, Level 16, State 1, Procedure VerticalMainOfficeData, Line 22
The type of column "AddressLine1" conflicts with the type of other columns 
  specified in the UNPIVOT list.
*/

That’s yet another problem with the UNPIVOT operator. Every single column name specified in the list must be exactly the same datatype and have exactly the same length.

But, as we now know, we can just use an APPLY operator instead of UNPIVOT, and we get truckloads of flexibility, as you can see by this function definition, which returns our information in a nice fancy way:

if object_id('VerticalMainOfficeData','IF') is not null 
  drop function VerticalMainOfficeData
go
create function VerticalMainOfficeData
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
select Property,Value
from BaseData
cross apply 
   (values ('NAME AND ADDRESS:','')
          ,('  Name',Name)
          ,('  Address',AddressLine1)
          ,('  ',AddressLine2)
          ,('  City',City)
          ,('  State/Province',StateProvinceName)
          ,('  Postal Code',PostalCode)
          ,('  Country/Region',CountryRegionName)
          ,('','')
          ,('DEMOGRAPHIC INFO:','')
          ,('  Year Opened',str(YearOpened,4))
          ,('  Number of Employees',convert(varchar(10),NumberEmployees))
          ,('  Specialty',Specialty)
          ,('  Square Feet',convert(varchar(10),SquareFeet))
          ,('  Brands',Brands)
          ,('  Annual Sales','$'+convert(varchar(20),AnnualSales,1))) P(Property,Value)
where SeqNo=1
  and Value is not null

Since we’re effectively UNIONing the data in the APPLY, we of course had to CONVERT any numerical data to a VARCHAR value, but we didn’t have to care about the various lengths of the various VARCHAR values. Here are a couple of examples of the output:

select * from dbo.VerticalMainOfficeData(34)
/*
Property              Value
--------------------- -----------------------------
NAME AND ADDRESS:     
  Name                Cycles Wholesaler & Mfg.
  Address             Science Park South, Birchwood
                      Stanford House
  City                Warrington
  State/Province      England
  Postal Code         WA3 7BH
  Country/Region      United Kingdom
                      
DEMOGRAPHIC INFO:     
  Year Opened         1999
  Number of Employees 15
  Specialty           Touring
  Square Feet         21000
  Brands              4+
  Annual Sales        $800,000.00
*/
 
select * from dbo.VerticalMainOfficeData(388)
/*
Property              Value
--------------------- ---------------------------
NAME AND ADDRESS:     
  Name                Road Way Sales and Rental
  Address             2900 - 25055 Dunning Street
  City                Vancouver
  State/Province      British Columbia
  Postal Code         V7L 4J4
  Country/Region      Canada
                      
DEMOGRAPHIC INFO:     
  Year Opened         1979
  Number of Employees 6
  Specialty           Mountain
  Square Feet         10000
  Brands              3
  Annual Sales        $300,000.00
*/

Notice in the first example how the second address line is displayed… with a blank string in the Property column… so it looks like it’s just a second line of the address like it should be. The second example had no second line of address (it was NULL in the original table)… it was eliminated by the WHERE clause that eliminated any NULL values.

Finally, I decided to go wild and create a more extensive function that included contact information (up to 2 contacts… no Main Office of any Store in AdventureWorks has more than 2 contacts anyway).

Here is the function I created to do that.

if object_id('VerticalMainOfficeDataWithContacts','IF') is not null 
  drop function VerticalMainOfficeDataWithContacts
go
create function VerticalMainOfficeDataWithContacts
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
        ,ContactType,Title,FirstName,MiddleName
        ,LastName,Suffix,Phone,EmailAddress
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
,ContactInfo as
(
  select SeqNo
        ,ContactType
        ,ContactName=coalesce(Title+' ','')
                    +FirstName+' '
                    +coalesce(MiddleName+' ','')
                    +LastName
                    +coalesce(' '+Suffix,'')
        ,Phone
        ,EmailAddress
  from BaseData
)
,ContactUnpivot as
(
  select Property=Property+str(SeqNo,1)
        ,Value
  from ContactInfo
  cross apply (values ('ContactType',ContactType)
                     ,('ContactName',ContactName)
                     ,('Phone',Phone)
                     ,('EmailAddress',EmailAddress)) C(Property,Value)
)
,ContactColumns as
(
  select *
  from ContactUnpivot 
  pivot (max(Value) for Property in (ContactType1,ContactName1,Phone1,EmailAddress1 
                                    ,ContactType2,ContactName2,Phone2,EmailAddress2)) P
)
,UnpivotInput as
(
  select *
  from BaseData
  cross join ContactColumns
  where SeqNo=1
)
select Property,Value
from UnpivotInput  
cross apply 
   (values ('NAME AND ADDRESS:','')
          ,('  Name',Name)
          ,('  Address',AddressLine1)
          ,('  ',AddressLine2)
          ,('  City',City)
          ,('  State/Province',StateProvinceName)
          ,('  Postal Code',PostalCode)
          ,('  Country/Region',CountryRegionName)
          ,('','')
          ,('DEMOGRAPHIC INFO:','')
          ,('  Year Opened',str(YearOpened,4))
          ,('  Number of Employees',convert(varchar(10),NumberEmployees))
          ,('  Specialty',Specialty)
          ,('  Square Feet',convert(varchar(10),SquareFeet))
          ,('  Brands',Brands)
          ,('  Annual Sales','$'+convert(varchar(20),AnnualSales,1))
          ,('','')
          ,('CONTACT'+case when ContactType2 is not null then ' #1' else '' end+':','')
          ,('  Type',ContactType1)
          ,('  Name',ContactName1)
          ,('  Phone',Phone1)
          ,('  Email',EmailAddress1)
          ,('',case when ContactType2 is not null then '' end)
          ,('CONTACT #2:',case when ContactType2 is not null then '' end)
          ,('  Type',ContactType2)
          ,('  Name',ContactName2)
          ,('  Phone',Phone2)
          ,('  Email',EmailAddress2)) P(Property,Value)
where Value is not null

I’ll leave it up to you to examine how it works as an exercise. In a nutshell, it unpivots and re-pivots the contact information into a single-row derived table, and then that’s JOINed back to the original base data and then the whole thing is unpivoted.

Here are a couple examples of the output. Note how the first one has only 1 contact (with the heading ‘CONTACT’) and the second one has 2 contacts (with the headings ‘CONTACT #1’and ‘CONTACT #2’:

select * from dbo.VerticalMainOfficeDataWithContacts(34)
/*
Property              Value
--------------------- -----------------------------
NAME AND ADDRESS:     
  Name                Cycles Wholesaler & Mfg.
  Address             Science Park South, Birchwood
                      Stanford House
  City                Warrington
  State/Province      England
  Postal Code         WA3 7BH
  Country/Region      United Kingdom
                      
DEMOGRAPHIC INFO:     
  Year Opened         1999
  Number of Employees 15
  Specialty           Touring
  Square Feet         21000
  Brands              4+
  Annual Sales        $800,000.00
                      
CONTACT:              
  Type                Owner
  Name                Ms. Barbara J. German
  Phone               1 (11) 500 555-0181
  Email               barbara4@adventure-works.com
*/  
 
select * from dbo.VerticalMainOfficeDataWithContacts(50)
/*
Property              Value
--------------------- ------------------------
NAME AND ADDRESS:     
  Name                Hometown Riding Supplies
  Address             Bundesallee 9571
                      Rechnungsstelle - C 035
  City                Berlin
  State/Province      Brandenburg
  Postal Code         14197
  Country/Region      Germany
                      
DEMOGRAPHIC INFO:     
  Year Opened         1991
  Number of Employees 46
  Specialty           Mountain
  Square Feet         38000
  Brands              4+
  Annual Sales        $1,500,000.00
                      
CONTACT #1:           
  Type                Owner
  Name                Mr. William Vong
  Phone               809-555-0100
  Email               william6@adventure-works.com
                      
CONTACT #2:           
  Type                Purchasing Manager
  Name                Mr. Greg Harrison
  Phone               1 (11) 500 555-0112
  Email               greg2@adventure-works.com
*/

I hope this article gave you some interesting examples of using UNPIVOT (and yes, PIVOT also). I also hope it stimulated your imagination on creative things you can do with T-SQL.

17 comments:

SankarFebruary 25, 2010 at 11:53 PM
Brad,

Wow. We were anticipating a blog post not an entire chapter. You have the most creative way of presenting & narrating ideas. I am pretty sure you make an exceptional speaker and can't wait to listen. Have you talked to Mark?
Brian TkatchFebruary 26, 2010 at 5:47 AM
I just used UNPIVOT on a TABLE that has Monthly and Year-to-date numbers. I need to do a set of calculations on both of them. In order to not repeat each calculation, the answer was SELECT Id, Monthly UNION ALL SELECT Id, YTD or UNPIVOT. UNPIVOT has non-intuitive syntax, but it does make a query look a lot nicer.
Michael J. SwartFebruary 26, 2010 at 7:58 AM
That was awesome Brad,
It's cool to see you flex your UNPIVOT, PIVOT muscles.
I can imagine an application for these PIVOT/UNPIVOT where it might be easier to arrange and sort things in the database so that Reporting Services (or any other reporting tool) has an easier time of things. Your credit card example comes to mind.
Brad SchulzFebruary 26, 2010 at 9:52 AM
@Sankar: Thanks! I've considered talking to Mark about giving a presentation... but I don't really know our SIG audience that well... my gut feeling is that they are mostly Administrators rather than Developers... but I'll have to confirm that.

@Brian: UNPIVOT's syntax is (perhaps) easier to understand than PIVOT's, and yes, I agree that it does make a query look nicer/compact. BTW, in reality, UNIONing is faster than an UNPIVOT/APPLY in my testing on large data sets. I was considering posting a "Part 2-1/2" to talk about this and other little tidbits that came up in writing these articles.

@Michael: Thanks! I agree regarding preparing data for reporting... good point.
Brian TkatchFebruary 26, 2010 at 10:09 AM
Is SQL Server smart enough to rule out an UNPIVOT. Well, i think it's easier to say in code.

Case A:

WITH
A(Id, A, B)
AS
(
SELECT 1, 1, 2 UNION ALL
SELECT 2, 3, 4
),
B
AS
(
SELECT
Id,
D,
C
FROM
A UNPIVOT(C FOR D IN(A, B)) A
)
SELECT
Id,
C
FROM
B
WHERE
D = 'A';

Case B:

WITH
A(Id, A, B)
AS
(
SELECT 1, 1, 2 UNION ALL
SELECT 2, 3, 4
),
B
AS
(
SELECT
Id,
'A' D,
A
FROM
A
UNION ALL
SELECT
Id,
'B' D,
B
FROM
A
)
SELECT
Id,
A
FROM
B
WHERE
D = 'A';

The first case doesn't change if the WHERE clause is there or not. The second case, however, does.

Just wondering which is better when writing a VIEW that is expected to be used with just such a WHERE clause.
Brad SchulzFebruary 26, 2010 at 11:25 AM
@Brian: At first I didn't understand... but now I get it... Yes, the absence/presence of the WHERE clause in the first query DOES NOT change the underlying Query Plan... but the Query Plan DOES change on the second query depending on whether the WHERE is there or not.

Yes, if you want a view to have more potential for flexibility, then the UNION choice would be better. The UNPIVOT will force that APPLY operator to be in the plan no matter what.

I found out (from Rob Farley) that UNPIVOT is translated to an OUTER APPLY (rather than a CROSS APPLY) because it has the POTENTIAL to make the query more efficient... but UNPIVOT always throws in that filter on NULL, so it kind of ruins the whole thing anyway.

Even if we hard-code OUTER APPLY (rather than use UNPIVOT), we still can't take advantage of any short-circuiting just because of the nature of how APPLY works.

Hope this helps answer your question.
Brian TkatchMarch 1, 2010 at 5:37 AM
@Brad

Thanx! Not only does it answer the question, i'm going to change my code because of it. There's no reason to block out a good (chance for an) optimization later on.

Sorry about the confusion. I forgot the format would get messed up. I also figured that explaining what i meant would have taken longer (and might have added confusion rather than removing it) than to give you the basics.

So, i pretty much agree with you now. No reason to use PIVOT/UNPIVOT that have very specific cases, they can both be achieved manually, and the manual method allows optimization.

Bra, thanx for the great posts.
Adam HainesMarch 9, 2010 at 6:28 AM
Great post Brad! For some reason my "blogs that I follow" did not show, you had a new entry...?

Anyway, I loved the article. I found it be a little long, but I enjoyed reading it. You have presented a lot of great content.

Note: In the example, you stated that all columns must have the same data type to be UNPIVOTED. This is true, but the easy workaround for this is to convert each column to VARBINARY(N) (making sure it is large enough). At this point you can use a simple cast to varchar in the outer select list.

Again another great post Brad! :^)
Brad SchulzMarch 9, 2010 at 11:50 AM
@Adam:

Thanks for your comments... Sorry about the length, but there was so much stuff to show... and I didn't want to create another article in the series.

Great point about VARBINARY(N)... I notice I could also just CAST each column to a SQL_VARIANT, and therefore not worry about the length (except in the VARCHAR CASTing in the outer SELECT list).
Seth PhelabaumMarch 9, 2010 at 12:15 PM
This didn't show up on my aggregator (google reader) either. I saw that the unpivot 1 post had an active link to this post and was surprised (because I hadn't seen this part 2), but I clicked it anyways (expecting it to fail). Much to my surprise, it did not. Just figured I'd re-iterate this because that's a lot of work put into an article that many will miss because it didn't properly RSS for some reason.
Brad SchulzMarch 9, 2010 at 12:25 PM
@Garadin:

I had a problem with this post... because of its length, it made my feed too long (and therefore invalid), and I didn't realize it for several hours. I fixed the feed so that it only feeds 4 blog posts maximum and that seemed to help, but I guess it was probably too late.

It's unfortunate... You're apparently not the only person to have the problem (see Adam's post above).
Mohammad SalimabadiAugust 8, 2010 at 2:15 PM
>> Aggregations over Columns <<
Hi Brad,
In CASE method you can simplify the query.
Your logic for 3 columns is:
Case when col1 > col2 and col1 > col3 then col1
when col2 > col1 and col2 > col3 then col2
when col3 > col1 and col3 > col2 then col3
end as maxium_value

But the simpler logic is:
Case when col1 > col2 and col1 > col2 then col1
when col2 > col3 then col2
else col3
end as maximum_value

Also in this case we can use mixture of simple logic and ALL() predicate like this:
Case when col1 >ALL (select col2 union select col3) then col1
when col2 > (select col3) then col2
else col3
end as maximum_value
ravisynitNovember 18, 2020 at 1:00 AM
we at SynergisticIT offer the best best aws bootcamp
Carolyn KlosApril 6, 2021 at 12:53 PM
i believe that a student should have self-control. if a student cannot write an essay, then the student can always buy an essay from the service that is advertised in the article https://rousernews.com/best-us-colleges-universities-to-learn-programming/. but the student must be able to cope with moral difficulties on his own.
Carolyn KlosApril 6, 2021 at 12:53 PM
as for me, the student should have endurance. if the student cannot cope with homework then there is nothing wrong with that https://www.sothink.com/blog/3-tips-on-how-to-choose-the-cheapest-essay-writing-service/. but if a student cannot cope with his morale, then this is already a problem.
Carolyn KlosApril 6, 2021 at 12:54 PM
the student must have willpower and endurance. the student may not be able to do homework in chemistry, but the student can always buy homework from the site that is advertised in the article https://www.itravelnet.com/travel-blog/how-to-start-a-remote-business-to-travel-the-world/. but the student must have willpower and endurance, no one will help the student with this.
신용카드 현금화March 27, 2025 at 2:42 AM
Lisäsin sen kirjanmerkkiluettelooni ja aion palata pian. Ole hyvä ja tutustu myös minun sivustooni ja kerro mielipiteesi.

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Thursday, February 25, 2010

Spotlight on UNPIVOT, Part 2

17 comments:

Search This Blog

Subscribe

About Me

Keywords

Blog Archive

Followers

# # #

SELECT Blog FROM Brad.Schulz CROSS APPLY SQL.Server()

Thursday, February 25, 2010

Spotlight on UNPIVOT, Part 2

17 comments:

Search This Blog

Subscribe

Subscribe To...

About Me

Keywords

Blog Archive

Followers

# # #