Friday, September 25, 2009

Down For The COUNT(*)

I was playing around with the COUNT() aggregate the other day and found some very interesting behaviors.

As you may know, COUNT(*) will count all rows; whereas, COUNT(Expression) will count all rows in which Expression is not NULL.

Consider the following 3 queries:

select count(*) from northwind.dbo.customers
/* Returns 91 */

select count(1) from northwind.dbo.customers
/* Returns 91 */

select count('junk') from northwind.dbo.customers
/* Returns 91 */
Not only do they all return the same result, but they also generate the same query plan. In fact, if you look under the hood at the (XML) query plan, they all actually just do a COUNT(*):

COUNT(*)

In the case of COUNT(1) and COUNT(‘junk’), the Optimizer is “smart enough” to recognize that a constant is between the parentheses, and so it translates those into COUNT(*).

If you use deterministic functions that involve only constants (and no variables or columns from the source tables), then they will also get translated into a COUNT(*). All of the examples below generate the exact same query plan as above; in other words, one with AggType=”countstar”:

select count(left('abc',1)) from northwind.dbo.customers
/* Returns 91 */

select count(char(32)) from northwind.dbo.customers
/* Returns 91 */

select count(case when 1=1 then 1 else 0 end) from northwind.dbo.customers
/* Returns 91 */

select count(pi()*pi()) from northwind.dbo.customers
/* Returns 91 */

select count(isnumeric('xxx')) from northwind.dbo.customers
/* Returns 91 */
But I thought the following was interesting. In the first query below, I put an illegal division-by-zero expression, and in the second, I put in a number that will obviously overflow, yet they run perfectly:

select count(1/0) from northwind.dbo.customers
/* Returns 91 */

select count(1.79e308*1.79e308) from northwind.dbo.customers
/* Returns 91 */
Again, the Optimizer recognizes that some kind of constant expression is within COUNT’s parentheses, and so it happily translates it to a COUNT(*). It doesn’t bother evaluating the expressions at all. It essentially asks, “Is it an expression involving constants?”, and if the answer is “Yes”, it just throws the expression out the window and changes it to a COUNT(*).

Now what if NULL is passed to COUNT()?:

select count(null) from northwind.dbo.customers
/*
Msg 8117, Level 16, State 1, Line 1
Operand data type void type is invalid for count operator.
*/

select count(cast(null as int)) from northwind.dbo.customers
/* Returns 0 */
In the first example, you can see that NULL all by itself is not allowed, which is interesting. COUNT() must accept a parameter that belongs to some kind of datatype.

Once we establish a datatype, as in the second example, the query produces the expected result of 0, because COUNT() does not count NULLs. If you look under the hood at the (XML) query plan, you’ll see this:

COUNT(NULL)

So it essentially translated it into a COUNT() that acts on a constant value of NULL… in other words, a COUNT(NULL).

Look at the following examples. They all get the exact same query plan as the one above:

select count(1/null) from northwind.dbo.customers
/* Returns 0 */

select count(nullif(1,1)) from northwind.dbo.customers
/* Returns 0 */

select count(char(-1)) from northwind.dbo.customers
/* Returns 0 (because CHAR(-1) returns NULL) */
In all of the above, the Optimizer actually did evaluate the expressions within COUNT()’s parentheses, and when it determined that it was equal to NULL, it put together a plan that did a COUNT(NULL).

It doesn’t necessarily evaluate the expression all the way, though. It just sees a NULL somewhere in the expression, and so it knows it has to do more work. It essentially asks, “Will this constant expression evaluate to NULL?”. If the answer is “No”, then it creates the first COUNT(*) query plan we saw at the beginning of this article. If the answer is “Yes”, then it creates the COUNT(NULL) query plan.

Consider this:

select count(isnull(null,1/0)) from northwind.dbo.customers
/* Returns 91 */
The optimizer didn’t actually come up with an actual value for the expression. It just knows that ISNULL(NULL,1/0) is the exact same thing as just 1/0, so it translated it into COUNT(1/0) which then, in turn, translated to a COUNT(*).

Now what about COUNT(DISTINCT)?

Consider the following queries:

select count(distinct 1) from northwind.dbo.customers
/* Returns 1 */

select count(distinct 'junk') from northwind.dbo.customers
/* Returns 1 */

select count(distinct sqrt(power(1234,2)*sin(pi()))) from northwind.dbo.customers
/* Returns 1 */
All of these produce the exact same query plan (you can click on the picture to see a larger version):

SELECT COUNT(DISTINCT 1)

Working from right to left, it essentially does a TOP 1 (with no sort order), then does a COUNT(*) of that, to get 1 (obviously), then does a SUM() of that, which again produces 1, and then the Compute Scalar just converts that to an integer value, and that’s the final result.

In other words, it seems that again, it doesn’t care what the expression is. It just sees that some kind of constant expression is involved and so it just spits out a vanilla TOP 1, COUNT(*), SUM(), CONVERT(int) plan.

Or does it?

Try our good old division-by-zero and arithmetic-overflow queries with COUNT(DISTINCT):

select count(distinct 1/0) from northwind.dbo.customers
/*
Msg 8134, Level 16, State 1, Line 1
Divide by zero error encountered.
*/

select count(distinct 1.79e308*1.79e308) from northwind.dbo.customers
/*
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type float.
*/
Hmmm… So these DO produce a run-time error this time, now that we’ve added the DISTINCT keyword. If you take a look at the query plan for those 2 queries…

SELECT COUNT(DISTINCT 1/0)

… you can see that instead of the TOP operator, there is a Compute Scalar operator, which actually calculates the expression. The Compute Scalar for the first query actually calculates 1/0 and the Compute Scalar for the second query actually calculates 1.79e308*1.79e308.

So the Optimizer must actually evaluate the expression itself and see that it is an expression that is invalid in some way, and then it actually goes out of its way to create a plan that will produce a run-time error by forcing that invalid expression to be calculated.

Verrrry Interrrresting

3 comments:

  1. select count(jwmasterforlist.transid) from jwmasterforlist, jwworkorderform

    is giving

    Msg 8115, Level 16, State 2, Line 1 Arithmetic overflow error converting expression to data type int.

    i want this problem to be solved so that my programs can work smoothly how to convert the result to bigint for count. i tried all combination but same results are coming that count is not a function. if anyone has a result please send it to mainaims1@gmail.com which is our development teams email ID. you can visit our website to www.mainaims.com to know more about our programs which are being executed at a very large scale.

    If any one has any queries then we can clarify the same on it.

    other results do not have much problem because they can be converted. in case of count function it is not so. we have large applications built on this function and cannot be changed.

    THIS IS DISASTER WAITING TO HAPPEN

    ReplyDelete
  2. Try using COUNT_BIG() instead of COUNT()

    ReplyDelete