NULL is not = to anything

One of the fun facts about SQL Server and the relation model is the whole concept of three valued logic. Now I’m not breaking any new ground here I am just touching on something that trips people up when they expect a result and don’t get it due to the magic of NULL’s. To be honest, I’m no exception to falling into the unknown from time to time.

Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

This rule, above all others has probably caused me the most heartburn over the years.

I’ve spent more time that I like to admit reading papers and books from pioneers of the relational model, E.F. Codd, C.J. Date. One of the great debates that carried on until the passing of Codd was over NULL and three valued logic in general. To give you an idea of how big and problematic NULL’s are, It’s been put forth that maybe we need to get rid of them all together or move deeper into the rabbit hole and make it 4 valued logic breaking NULL into different kinds of place holders for unknown.

Understand that every piece of data falls into a domain of some sort. 0 is a integer type. ‘’ is a empty string. NULL isn’t in any domain, or value type. It doesn’t represent something at all it is a place older period.

I’ve heard 3 valued logic described as yes/no/maybe but that isn’t accurate, it is true/false/UNKNOWN.

So the only logical thing that can happen to UNKNOWN is unknown. What’s even worse is UNKNOWN technically can’t be equal to UNKNOWN or NULL = NULL.

How do you know they are equal if they are both unknown?

 

For example:

select 1 where NULL = NULL

returns nothing since NULL can’t be equal to anything including NULL we don’t get a NULL back or the 1 back we tried to select.

select 6227 * 453 / 238 + NULL 

returns NULL

which makes since on the surface to almost everyone I work with.

select NULL / 0

returns NULL

To some folks this is confusing in a traditional programming since anything divided by zero gives us an error of cannot divide by zero.

Since NULL is the place holder for UNKNOWN there is no way to evaluate the statement other than UNKNOWN or NULL!

This must also carry through for string manipulation as well.

For example:

select 'here' + 'is ' + NULL 

returns NULL.

Again it is the old how can you concatenate something to the unknown problem.

Now with all this in our little busy heads we finally think we understand the problem in it’s fullness, but we don’t (or I always don’t).

Where things can get really sticky is in any kind of aggregate situation SUM(), AVG(). Generally, all aggregations have a NULL elimination step built into them.

So lets say we have a table that looks like this:

Col001 Col002
100 100
200 200
300 300
NULL 0

(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)

create table myNumbers
(
Col001 int,
Col002 int
)
go
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers

select avg(Col002) from myNumbers

We get:

———–

200

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)

 

———–

225

(1 row(s) affected)

What happens when the evaluation of the aggregation occurs there is no way to evaluate the NULL so that row is dropped and all the sudden you numbers look horribly wrong.

If I hadn’t put this to output to text we may have merrily trucked along and accepted that 200 was what we were expecting and not the 255 we really wanted due to the treatment of the NULL.

Since it is a warning and not an error our program won’t complain ether it will be more than happy to accept the result since it is valid one.

The only exception to this in general is using COUNT() since you can count a place holder just like you would a value type it isn’t evaluating the data held just that a row exists whether we know what is is in it or not.

 

I’ve only just scratched the surface of the unknown with this little post. I haven’t covered grouping or JOIN’s and I may in a later post.

 

My goal is simply to remind myself that dealing with NULL is never a trivial matter.

-Wes

It’s all magic to me.

Technorati Tags: ,

"Any technology, no matter how primitive, is magic to those who do not understand it."

freefall 

Arthur C. Clarke penned three laws of prediction

  1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
  2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
  3. Any sufficiently advanced technology is indistinguishable from magic.

The  last is the most well known. I love the change that the comic strip freefall made to it and felt that was most appropriate to my day to day dealings with people and technology in general.

On a regular basis, I hear people describe SQL Server as a black box or magic box and working with it in any real depth is an art or wizardry. That simply isn’t so. It is science 100%. Not to take away from the designers and developers of SQL Server and the people that push the boundaries on  what it is capable of, but it is all based fundamentally on engineering principles,math (relational algebra in particular) and the underlying technology of computers.

Once you demystify it, break it town into small enough parts you can quickly master the parts of it that effect your life in a reasonable amount of time.

One of the areas I focus on is I/O performance and SQL Server. So, I’ll be doing a multi-part post covering the entire I/O stack from how a hard disk works through SAN’s and eventually how this all effects SQL Server.

Hopefully you will find it useful.

Here is to my blogging endeavor!