Sunday, August 23, 2009

ALL, ANY, and SOME: The Three Stooges

Okay, I know there’s a customer in the Northwind database who placed an order on July 18, 1996, but I don’t know who it is. I don’t know how many orders the customer has placed… he may have placed a dozen for all I know… but that doesn’t matter. What I do know for sure is that this customer placed ALL his orders on 1996-07-18… every single one of ‘em.

So let’s try the following approach to find him:

select CustomerID
from Orders
group by CustomerID
having min(OrderDate)=max(OrderDate)
and min(OrderDate)='1996-07-18'
/*
CustomerID
----------
CENTC
*/
Well that’s nice, but a CustomerID really doesn’t mean anything to me. I really want to know the Company’s Name. So let’s take the above query and JOIN it with the Customers table:

select o.CustomerID
,c.CompanyName
from (select CustomerID
from Orders
group by CustomerID
having min(OrderDate)=max(OrderDate)
and min(OrderDate)='1996-07-18') o
join Customers c on o.CustomerID=c.CustomerID
/*
CustomerID CompanyName
---------- ----------------------------------------
CENTC Centro comercial Moctezuma
*/
Yep, that’s the guy! Centro commercial Moctezuma.

But wait a second… What was it that I had said again? I said, “this customer placed ALL his orders on 1996-07-18.”



Hey, isn’t there an ALL operator in SQL that hardly anyone ever uses? We could put together a query using that, and the query will be much more intuitive and easier to read! Hmmm… I wonder why this cool ALL operator isn’t used much… it seems to be pretty useful in this particular situation.

So let’s put together the following query, using a correlated sub-query, to return customers who have ALL their orders placed on 1996-07-18:

select CustomerID,CompanyName
from Customers
where '1996-07-18' = all (select OrderDate
from Orders
where CustomerID=Customers.CustomerID)
/*
CustomerID CompanyName
---------- ----------------------------------------
CENTC Centro comercial Moctezuma
FISSA FISSA Fabrica Inter. Salchichas S.A.
PARIS Paris spécialités
*/
Now ho-old on thar’! Our very first query only returned one customer. Who the heck are these other two guys?

Wait… let’s check their orders:

select OrderID,OrderDate
from Orders
where CustomerID in ('FISSA','PARIS')
/*
OrderID OrderDate
----------- -----------------------
NOTHING! NADA! ZIP! NO ORDERS FOR THESE GUYS!
DO NOT PASS GO! DO NOT COLLECT $200!
*/
Huh? What’s going on here? I thought the query with the ALL operator was quite specific and quite straightforward. Why did these jokers with no orders come out in the result?

Just to experiment further, instead of equals ALL, what about other comparison operators… greater than ALL, less than ALL, and not equal to ALL?

FISSA and PARIS appear in the result set of every one of those queries too!

Jeez, these two guys really get around… They’re like magicians. All their orders are EQUAL to 1996-07-18, and all their orders are GREATER than 1996-07-18, and all their orders are LESS then 1996-07-18, and all their orders are NOT EQUAL to 1996-07-18.

That just sounds… strange.

So what gives? Why does ALL give these weird results?

Well, to explain the behavior of the ALL operator, we must also look at the operators ANY and SOME. ANY and SOME are interchangeable and exactly the same in every respect, so I will just address ANY for the rest of this article. (Sorry ‘bout that, SOME).

To gain an understanding of the behavior of ALL and ANY, try the following experiment. Get a small box, and label it Box#1. Put a nickel (5 cents), a dime (10 cents), and a quarter (25 cents) inside. Give it to a family member and ask him/her the following questions:

Question 1: “Is twenty-five cents worth more than ANY of the coins in Box#1?”
Question 2: “Is twenty-five cents worth more than ALL (EACH) of the coins in Box#1?”
Question 3: “Is twenty-five cents worth the same as ANY of the coins in Box#1?”
Question 4: “Is twenty-five cents worth the same as ALL (EACH) of the coins in Box#1?”

Your family member might think this is strange, and (s)he might pause a bit at the awkwardly-constructed questions, but (s)he should be able to answer the questions accurately (with the answers yes, no, yes, and no).

Now take a completely EMPTY box and label it Box#2 and give it to the same person and ask those same 4 questions again (substituting the word “Box#1” with “Box#2”).

What kind of answers did you get? Did they look confused? Did they get angry? Even voilent, perhaps? I did this experiment with my family last week and they’ve been avoiding me ever since.

Try asking those same questions to T-SQL. Here’s what you get:

;with BoxOf3Coins(CoinValue) as (select  5 union all
select 10 union all
select 25)
,
EmptyBox as (select CoinValue
from BoxOf3Coins
where rand()<0)
select 'Is twenty-five cents worth more than ANY of the coins in Box#1?'
,case when 25 > any (select CoinValue from BoxOf3Coins)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth more than ALL (EACH) of the coins in Box#1?'
,case when 25 > all (select CoinValue from BoxOf3Coins)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth the same as ANY of the coins in Box#1?'
,case when 25 = any (select CoinValue from BoxOf3Coins)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth the same as ALL (EACH) of the coins in Box#1?'
,case when 25 = all (select CoinValue from BoxOf3Coins)
then 'Yes' else 'No' end
union
all
select '----------',''
union all
select 'Is twenty-five cents worth more than ANY of the coins in Box#2?'
,case when 25 > any (select CoinValue from EmptyBox)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth more than ALL (EACH) of the coins in Box#2?'
,case when 25 > all (select CoinValue from EmptyBox)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth the same as ANY of the coins in Box#2?'
,case when 25 = any (select CoinValue from EmptyBox)
then 'Yes' else 'No' end
union
all
select 'Is twenty-five cents worth the same as ALL (EACH) of the coins in Box#2?'
,case when 25 = all (select CoinValue from EmptyBox)
then 'Yes' else 'No' end
/*
Is twenty-five cents worth more than ANY of the coins in Box#1? Yes
Is twenty-five cents worth more than ALL (EACH) of the coins in Box#1? No
Is twenty-five cents worth the same as ANY of the coins in Box#1? Yes
Is twenty-five cents worth the same as ALL (EACH) of the coins in Box#1? No
----------
Is twenty-five cents worth more than ANY of the coins in Box#2? No
Is twenty-five cents worth more than ALL (EACH) of the coins in Box#2? Yes
Is twenty-five cents worth the same as ANY of the coins in Box#2? No
Is twenty-five cents worth the same as ALL (EACH) of the coins in Box#2? Yes
*/
No hesitation at all. Not even on the empty box questions. But the answers still seem strange.

To understand why T-SQL answers the questions this way, let’s dig a little deeper.

Look at the first question:

“Is twenty-five cents worth more than ANY of the coins in Box#1?”

As I hinted at before, that’s kind of an awkward-sounding question. In proper English, we would flip the question around, like so:

”Are there ANY coins in Box#1 that are worth less than twenty-five cents?”

Doesn’t that sound more natural? And it’s easy to answer. If you look in the box and see a dime or a nickel or a penny, you can immediately answer yes.

Apparently the second version of the question sounds more natural to T-SQL too, because that’s generally what it does with an ANY query. It internally translates the WHERE clause from this:

where 25 > any (select CoinValue from Box)
into this:

where exists (select * from Box where CoinValue<25)
Now let’s look at the second question of our experiment:

”Is twenty-five cents worth more than ALL (EACH) of the coins in Box#1?”

Again, it’s awkward-sounding. Instead you would more naturally phrase the question like this:

”Is EVERY coin in Box#1 worth less than twenty-five cents?”

Now how do you go about answering that? Do you actually look at every single coin? No, you don’t. Once you see a coin worth twenty-five cents or more, then you stop, and you can confidently answer no. If you can’t find a coin worth twenty-five cents or more, then you can answer yes.

Your internal thought process goes something like this:

”Well, let’s see… Are they any coins in Box#1 worth twenty-five cents or more? If so, then the answer to your question is no. If not, then the answer to your question is yes.

That’s exactly what T-SQL does with an ALL query. It translates it into an ANY query (using the opposite comparison operator) and takes the NOT of it. In other words, it translates this:

where 25 > all (select CoinValue from Box)
into this:

where not (25 <= any (select CoinValue from Box))
And then, remember it translates the ANY query into an EXISTS query:

where not exists (select * from Box where CoinValue>=25)
Now you can answer the empty box question:

Question: ”Is EVERY coin in this empty box worth less than twenty-five cents?”

Answer: ”Well, let’s see… Are there any coins in the box worth twenty-five cents or more? Nope, there aren’t. So the answer to your question must be yes.

Even though we’ve come to the answer of yes logically, it still just feels weird, doesn’t it?

But at least we’ve solved our mystery of the July 18, 1996 orders. We now know that T-SQL translated this ALL query:

select CustomerID,CompanyName
from Customers
where '1996-07-18' = all (select OrderDate
from Orders
where CustomerID=Customers.CustomerID)
into this ANY query:

select CustomerID,CompanyName
from Customers
where not '1996-07-18' <> any (select OrderDate
from Orders
where CustomerID=Customers.CustomerID)
and then in turn translated that query into an EXISTS query:

select CustomerID,CompanyName
from Customers
where not exists (select *
from Orders
where CustomerID=Customers.CustomerID
and OrderDate<>'1996-07-18')
All three of the above query plans are identical... you can see the plan below. A Left Anti Semi Join combines (JOINs) two tables and only outputs rows from the left-hand side (LEFT SEMI) if there are no matching rows (ANTI) on the right-hand side. In our example, the left side was Customers and the right side was Orders for the Customer with an OrderDate not equal to 1996-07-18. In other words, a WHERE NOT EXIST query.



So, in conclusion, I think we can see now why ALL, ANY, and SOME are very rarely used. They are awkward-sounding and kind of backwards as far as natural English conversation is concerned, and they may not give the results you expect when working with a sub-query that returns the empty set.

They’re strange, eccentric, goofy, and kooky. I certainly wouldn’t hire them for any serious T-SQL work. Nyuk-nyuk-nyuk.

7 comments:

  1. I think the reason they are not used, is that they are not required. The very same results can be found by rearranging thew query. ALL is NOT EXISTS(SELECT * WHERE inner.OrderDate <> outer.OrderDate)

    That helps out on the proper english too.

    ReplyDelete
  2. Exactly... I didn't make that specific point in the post, but they are easily replaceable by other (arguably more clear) constructs (i.e. EXISTS).

    ReplyDelete
  3. I don't understand this: I would have thought the answer to question 1 was 'No': none of the coins are worth more than 25 cents

    ReplyDelete
  4. Hi Anonymous...

    Read question #1 again:

    “Is twenty-five cents worth more than ANY of the coins in Box#1?”

    ...which translates to...

    ”Are there ANY coins in Box#1 that are worth less than twenty-five cents?”

    Note the "less than".

    --Brad

    ReplyDelete
  5. Thanks Brad for this article, helped me to understand ALL and ANY(SOME) and why not to use them ;)

    ReplyDelete
  6. good explanation. thanks

    ReplyDelete