Monday, April 21, 2008

Normalization for Normal People, Part I--The "Language" of the Database, cont'd. AGAIN

Last time, I ended my post with an ambiguous reference to my birth date. I did that to make a couple of points. Well, actually I wanted to reinforce a point I'd made in an earlier blog as well as to introduce one new point.

The reinforced point is that ambiguity in natural languages, like English, is desirable
because it enables us to write poetry and make jokes (at least I hope you recognized my birthday reference as a joke.)

Today, the new corollary to that point being introduced is this: Identifying and defining the attributes of importance in a well designed database depends, at least in part, on your ability to recognize ambiguity when you see it.

Recognizing and handling ambiguity is an important task. Unfortunately, it is also one we tend to skip over or rush through. I think that’s true because we are so comfortable with ambiguity in natural languages we just don’t recognize it’s significance in creating a useful language for our databases.

As Time Goes By...
Here's the ambiguous reference I used to set up this discussion:

"My birthday is the same date every year, but it’s a different date every time."

I'll bet that virtually everyone reading that made the appropriate inferences because of a combination of context, experience, knowledge of how natural languages work, and perhaps most importantly of all, a significant tolerance for ambiguity.

What I did, of course, was use two different interpretations of the word "birthday". I also counted on your being willing to tolerate a bit of ambiguity in order to “get the joke”.

It’s as Simple as One, Two, Three

Sometimes we use the word "birthday" to mean the month and day on which we celebrate the occasion of our births. That month and day are the same, year after year after year. In the United States, for example, we celebrate the birthday of George Washington on February 22nd.

Sometimes we use the word "birthday" to mean the exact date, month, day and year on which we celebrate our births. That month, day and year are different every time. This year, Washington’s birthday was February 22, 2008.

Moreover, and most importantly, there is a third meaning of the word "birthday". That is the date, month, day and year on which we were actually born. And that one, I don’t need to tell you is always the same. George Washington, for example, was born on February 22, 1732.

We combine context, experience, and knowledge of both language and social interaction to decide which of those three meanings are intended in any given sentence in a natural language. When you stop to think about what it takes to get it right, it’s not a simple task, but humans make it look simple because of our training and experience.

Here Comes the Good Stuff, So Pay Attention, Okay?

I know, I’m pounding this point into the ground. It’s such a basic concept, and so important to everything else that follows, that I would never forgive myself if I didn’t do everything in my power to help you get it.

The language of the database is most definitely NOT like any natural language. Ambiguity in the language of the database is not only undesirable, it’s one of the biggest problems you’ll ever encounter.

When you select any attribute, such as date of birth, as an attribute to distinguish between two or more individuals, you must define the MEANING of that attribute in a totally unambiguous way. Each adjective we select for our database must have ONE and ONLY ONE meaning. Moreover, everyone using that adjective to describe any object in your database must KNOW, and AGREE to use, that same meaning.

Further, to be useful, that meaning has to be something that doesn’t change from time to time. Therefore, if you intend to use Date of Birth as an attribute in a table which lists employees of your organization, the only useful meaning is the one which doesn’t change: the month, day, and year on which the person was born. Anything else related to birth day, birth date, or age can, and will, change over time. And in a database, that’s not a good thing.

Pardon Me, Sir, But I Do Believe Your Horse is Dead

I imagine that, by now, your patience with the subject of birthdays is running thin. One more point and then I’ll leave the carcass of this late, lamented pony to the scavengers hovering just over there behind the shrubs and tall weeds.

The Cheese Stands Alone

If you were raised in the United States in the 1940’s, 50’s or 60’s, that cheesy reference might make sense to you. If not, well sorry, but I’m a sucker for bad puns and vague references. “The Cheese Stands Alone” is a line from a child’s song we sang while playing a game. I don’t remember the rules, except that at the end of the game, the winner (or perhaps it was the loser) was left standing by himself or herself inside a circle of their playmates. As I recall, it was not fun to be the cheese.

It’s significance to this discussion is simply this: In addition to the “One and Only One Meaning” requirement, any good attribute must also be able to “Stand on its Own”. Unlike the cheese of yesterday’s playgrounds, it IS good to stand alone when you are an attribute in a database. Only cheesy attributes around here, okay?

You’re Only Young Once

That is, by the way, why date of birth is the kind of value that does makes a good attribute in a database, while AGE is not. Age can’t stand alone. It needs help to make any sense. It has to have an “As Of” component to be of any value. If you entered a person’s age, for example, as “42”, the simple passage of time would make that attribute inaccurate in as little as one day, or within 365 days at the most (except for Leap Year’s, when it has a slightly extended shelf life).

To sum it all up, only one of the three ways in which we commonly use the word “birthday” is static, or fixed. The other two are variable; they mean different things at different times, depending on the context. As we’ve just seen, the language of the database CAN NOT tolerate that kind of ambiguity. Computers do not have access to context in any meaningful sense, they do not "experience" communication between various humans, and they don't have external knowledge to help them make the right choice.

While I’ve spent our time today looking primarily at one particular attribute, date of birth for people, I could make the same kind of case for precision in defining any of the other attributes needed to accurately describe the entities in your database: Sale Date for transactions, for example. Is that the day the hand-shake between the sales person and the customer happened, or the day the check cleared? Whichever you decide, it has to mean the same thing every time it is recorded in the database, and it must be able to carry the full weight of uniquely identifying that date all by itself.

Therefore, whenever you pick an attribute to apply to one of the entities you are tracking, an important step in the process is making sure you nail down the precise definition for it.

Aged Cheese, Fine Wines and other Delicacies

Now that I’ve gone to considerable lengths to convince you that all attributes must be unchanging to be useful in the language of the database, I’m going to turn around and acknowledge that sometimes, it just doesn’t work that way.

Fine wines, good cheese, and seasoned database developers all tend to improve with age, but just how “old” is “old”, and how do you know when you’ve reached that point?

The answer, of course, is that some attributes aren’t static, or fixed, after all. They do change from time to time. Next time, therefore, I’ll tell you a little bit more about Variable vs Fixed Attributes.