One of the fun facts about SQL Server and the relation model is the whole concept of three valued logic. Now I’m not breaking any new ground here I am just touching on something that trips people up when they expect a result and don’t get it due to the magic of NULL’s. To be honest, I’m no exception to falling into the unknown from time to time.
Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
This rule, above all others has probably caused me the most heartburn over the years.
I’ve spent more time that I like to admit reading papers and books from pioneers of the relational model, E.F. Codd, C.J. Date. One of the great debates that carried on until the passing of Codd was over NULL and three valued logic in general. To give you an idea of how big and problematic NULL’s are, It’s been put forth that maybe we need to get rid of them all together or move deeper into the rabbit hole and make it 4 valued logic breaking NULL into different kinds of place holders for unknown.
Understand that every piece of data falls into a domain of some sort. 0 is a integer type. ‘’ is a empty string. NULL isn’t in any domain, or value type. It doesn’t represent something at all it is a place older period.
I’ve heard 3 valued logic described as yes/no/maybe but that isn’t accurate, it is true/false/UNKNOWN.
So the only logical thing that can happen to UNKNOWN is unknown. What’s even worse is UNKNOWN technically can’t be equal to UNKNOWN or NULL = NULL.
How do you know they are equal if they are both unknown?
For example:
select 1 where NULL = NULL
returns nothing since NULL can’t be equal to anything including NULL we don’t get a NULL back or the 1 back we tried to select.
select 6227 * 453 / 238 + NULL
returns NULL
which makes since on the surface to almost everyone I work with.
select NULL / 0
returns NULL
To some folks this is confusing in a traditional programming since anything divided by zero gives us an error of cannot divide by zero.
Since NULL is the place holder for UNKNOWN there is no way to evaluate the statement other than UNKNOWN or NULL!
This must also carry through for string manipulation as well.
For example:
select 'here' + 'is ' + NULL
returns NULL.
Again it is the old how can you concatenate something to the unknown problem.
Now with all this in our little busy heads we finally think we understand the problem in it’s fullness, but we don’t (or I always don’t).
Where things can get really sticky is in any kind of aggregate situation SUM(), AVG(). Generally, all aggregations have a NULL elimination step built into them.
So lets say we have a table that looks like this:
Col001 | Col002 |
100 | 100 |
200 | 200 |
300 | 300 |
NULL | 0 |
(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)
create table myNumbers
(
Col001 int,
Col002 int
)
go
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers
select avg(Col002) from myNumbers
We get:
———–
200
Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
———–
225
(1 row(s) affected)
What happens when the evaluation of the aggregation occurs there is no way to evaluate the NULL so that row is dropped and all the sudden you numbers look horribly wrong.
If I hadn’t put this to output to text we may have merrily trucked along and accepted that 200 was what we were expecting and not the 255 we really wanted due to the treatment of the NULL.
Since it is a warning and not an error our program won’t complain ether it will be more than happy to accept the result since it is valid one.
The only exception to this in general is using COUNT() since you can count a place holder just like you would a value type it isn’t evaluating the data held just that a row exists whether we know what is is in it or not.
I’ve only just scratched the surface of the unknown with this little post. I haven’t covered grouping or JOIN’s and I may in a later post.
My goal is simply to remind myself that dealing with NULL is never a trivial matter.
-Wes