Anyway, AdventureWorks management wants to know, given their sales statistics history, what sales figures they can expect in 6 months and 12 months and 18 months from now. They also want to know when they can expect to reach $10 million per month in sales. How about $11 million per month?

With no qualms about changing history, we interfere and offer our help.

Let’s see what kind of sales statistics we have up through June of 2004:

select Mth=OrderDate-day(OrderDate)+1If we were to put the data into Excel and graph those points and have Excel add a trend line, this is what we get:

,Sales=sum(TotalDue)

from Sales.SalesOrderHeader

where OrderDate<'2004-07-01'

group by OrderDate-day(OrderDate)+1

order by Mth

/*

Mth Sales

----------------------- ---------------------

2001-07-01 00:00:00.000 1172359.4289

2001-08-01 00:00:00.000 2605514.9809

2001-09-01 00:00:00.000 2073058.5385

2001-10-01 00:00:00.000 1688963.2744

. . .

2004-03-01 00:00:00.000 5272786.8106

2004-04-01 00:00:00.000 4722890.7352

2004-05-01 00:00:00.000 6518825.2262

2004-06-01 00:00:00.000 6728034.9873

*/

There are lots of up months and down months, but, in general, the trend is certainly up (always a good thing), with sales somewhere in the $2 million per month range back in 2001 and in the $6 million per month range by Jun2004.

So how can we make sales projections using SQL?

Like Excel did in generating the trend line, we will have to mathematically calculate the same thing using Linear Regression. You may recall from Algebra class that a line has the formula of

*y=mx+b*, where

*m*is the slope and

*b*is the y-intercept. Given a collection of

*(x,y)*values, the following Linear Regression formulas will calculate the slope and y-intercept values for us:

That looks like complicated stuff, but it’s actually very easy to implement.

Our Sales data has a datetime field (the 1st day of a month) and a Sales figure for that month. Those are our theoretical

*x*and

*y*values, but we can’t really use a datetime for our

*x*value… we have to have an actual numeric value of some kind. So let’s just assume our first chronological month just has a value of 1, and the next month is 2, and so on.

For simplicity’s sake, let’s take our query we used above and put our sales statistics into a temporary file called #MonthlySales:

select Mth=OrderDate-day(OrderDate)+1And now we can get our

,Sales=sum(TotalDue)

into #MonthlySales

from Sales.SalesOrderHeader

where OrderDate<'2004-07-01'

group by OrderDate-day(OrderDate)+1

*(x,y)*values from that as shown below. Note how the

*x*value was calculated using DATEDIFF and a scalar sub-query to get the month number. This is done just to make it more general without having to make assumptions about the underlying data.

select X=1+datediff(month,(select min(Mth) from #MonthlySales),Mth)From that, it’s easy to calculate all those nasty sums (Σ’s) needed by the slope and y-intercept formulas:

,Y=Sales

from #MonthlySales

order by X

/*

X Y

------ ---------------------

1 1172359.4289

2 2605514.9809

3 2073058.5385

4 1688963.2744

. . .

33 5272786.8106

34 4722890.7352

35 6518825.2262

36 6728034.9873

*/

;with XYData asThe actual values don’t mean anything to us, but that doesn’t matter because we will just use these to plug into our slope and y-intercept formulas below. Note that I use two CROSS APPLYs below because the value of the y-intercept requires that we calculate the value of the slope first. (Those of you who know me already know that I’m a big-time CROSS APPLY fan… click here for a look at previous blog entries about this frequently-misunderstood, underused operator).

(

select X=1+datediff(month,(select min(Mth) from #MonthlySales),Mth)

,Y=Sales

from #MonthlySales

)

select SumX=sum(X)

,SumY=sum(Y)

,SumXX=sum(X*X)

,SumXY=sum(X*Y)

,N=count(*)

from XYData

/*

SumX SumY SumXX SumXY N

------ ---------------- -------- ---------------- ------

666 140651405.9023 16206 3050875537.8704 36

*/

Also note that I introduced 1.0 into the Slope and Intercept calculations in order to force decimal/numeric math to come into play as opposed to possible integer math. In other words, if both our

*x*and

*y*values were integers, then all the SUM figures would be integers and therefore when we do division, we could end up with unexpected integer results. For example, SQL will calculate 500/251 as being equal to 1, but it will calculate 1.0*500/251 as 1.992031. We want that more accurate figure.

;with XYData asAgain, these figures appear pretty meaningless, but the Slope figure does tell us that we generally increase our monthly sales by about $115,528 per month, and that our sales at Month Number Zero was about $1.77 million.

(

select X=1+datediff(month,(select min(Mth) from #MonthlySales),Mth)

,Y=Sales

from #MonthlySales

)

,CalcSums as

(

select SumX=sum(X)

,SumY=sum(Y)

,SumXX=sum(X*X)

,SumXY=sum(X*Y)

,N=count(*)

from XYData

)

select Slope

,Intercept

from CalcSums

cross apply (select Slope=1.0*(N*SumXY-SumX*SumY)/(N*SumXX-SumX*SumX)) CalcSlope

cross apply (select Intercept=1.0*(SumY-Slope*SumX)/N) CalcIntercept

/*

Slope Intercept

------------------------ -----------------

115527.5492092277992277 1769723.83691539

*/

Anyway, we now have the values to use in our

*y=mx+b*formula for our trend line. So we can use that formula to make sales projections. The management wanted to know what sales figures we can expect (given the current trend) in Dec2004, Jun2005, and Dec2005.

So we will add another CTE (called ProjMonths) onto the existing query we have to introduce those months, and then finally calculate the projected sales for those months. Note how we use a CROSS APPLY to incorporate the Slope and Intercept values we calculated, then we use another CROSS APPLY to convert our Projected Months into appropriate

*x*values, and then we use yet another CROSS APPLY to plug those

*x*values into the

*y=mx+b*(or Y=Slope*X+Intercept) formula:

;with XYData asSo it looks like we can expect about $8 million in sales by Dec2005.

(

select X=1+datediff(month,(select min(Mth) from #MonthlySales),Mth)

,Y=Sales

from #MonthlySales

)

,CalcSums as

(

select SumX=sum(X)

,SumY=sum(Y)

,SumXX=sum(X*X)

,SumXY=sum(X*Y)

,N=count(*)

from XYData

)

,SlopeIntercept as

(

select Slope

,Intercept

from CalcSums

cross apply (select Slope=1.0*(N*SumXY-SumX*SumY)/(N*SumXX-SumX*SumX)) CalcSlope

cross apply (select Intercept=1.0*(SumY-Slope*SumX)/N) CalcIntercept

)

,ProjMonths as

(

select ProjMth=cast('2004-12-01' as datetime)

union all

select '2005-06-01'

union all

select '2005-12-01'

)

select ProjMth

,ProjSales

from ProjMonths

cross apply (select Slope,Intercept from SlopeIntercept) IncludeSlopeIntercept

cross apply (select ProjX=1+datediff(month

,(select min(Mth) from #MonthlySales)

,ProjMth)) CalcProjX

cross apply (select ProjSales=ProjX*Slope+Intercept) CalcProjSales

/*

ProjMth ProjSales

----------------------- ------------------------

2004-12-01 00:00:00.000 6621880.90370296

2005-06-01 00:00:00.000 7315046.19895832

2005-12-01 00:00:00.000 8008211.49421369

*/

Management also wanted to know when we can expect to reach $10 million and $11 million per month. So instead of taking an

*x*value and plugging it into

*y=mx+b*to get a

*y*value, we need to do the opposite… we will take a

*y*value and plug it into

*x=(y-b)/m*to get an

*x*value.

So we will use that in our query below. Blindly plugging in a

*y*value of 10,000,000 would give us an

*x*value of (10000000 - 1769723.83691539) / 115527.5492092277992277 = 71.24. That indicates that we’ll reach the $10 million mark about 24% of the way into Month Number 71. But our “month number” must be an integer, so we use the FLOOR() function to make it an actual integer ordinal number, and then we use the final CROSS APPLY to figure out which actual month that translates into:

;with XYData asNow we can tell management that they will have to wait until Jun2007 to reach $10 million per month in sales and it will take another 8 months to reach $11 million in Feb2008.

(

select X=1+datediff(month,(select min(Mth) from #MonthlySales),Mth)

,Y=Sales

from #MonthlySales

)

,CalcSums as

(

select SumX=sum(X)

,SumY=sum(Y)

,SumXX=sum(X*X)

,SumXY=sum(X*Y)

,N=count(*)

from XYData

)

,SlopeIntercept as

(

select Slope

,Intercept

from CalcSums

cross apply (select Slope=1.0*(N*SumXY-SumX*SumY)/(N*SumXX-SumX*SumX)) CalcSlope

cross apply (select Intercept=1.0*(SumY-Slope*SumX)/N) CalcIntercept

)

,ProjSales as

(

select ProjSales=10000000

union all

select 11000000

)

select ProjSales

,ProjMth

from ProjSales

cross apply (select Slope,Intercept from SlopeIntercept) IncludeSlopeIntercept

cross apply (select ProjX=floor((ProjSales-Intercept)/Slope)) CalcProjX

cross apply (select ProjMth=dateadd(month

,ProjX

,(select min(Mth) from #MonthlySales))) CalcProjMth

/*

ProjSales ProjMth

----------- -----------------------

10000000 2007-06-01 00:00:00.000

11000000 2008-02-01 00:00:00.000

*/

So it looks like our time-travelling adventure yielded some good. We were able to give the management of AdventureWorks what they wanted. I’m sure they will use the information to make the company even stronger… perhaps because of our help, they’ll soon become the Amazon.com of bicycle sales. Many people will buy bicycles who hadn’t bought before, and…

Oops… perhaps we did change history. Perhaps we set something in motion that could cause a time paradox of some kind… perhaps something that … I don’t know… might make me just wink out of existence! Naaahhh… that’s impossible… it’s ridiculous! After all, if that happened, then how could I write this blog? How could I type this very sentence? How could I th

Nice.

ReplyDeleteExample is a bit complicated, i may have to look back at this one.

Great post Brad. I really enjoyed reading it. I like that you discuss real world problems and their solutions. Keep up the good work.

ReplyDeleteI have written a SQLCLR for this purpose.

ReplyDeleteSee http://www.developerworkshop.net/software.html

and http://regression.developerworkshop.net/

There are a number of samples too!

@Peter: Cool... I'll check it out. Thanks!

ReplyDelete