Saturday, April 5, 2008

Normalization for Normal People, Part I--The "Language" of the Database, cont'd.

When you are able to apply a single, unambiguous name to every thing, or "entity" you want to track in your database, you are able to slap a label on each one of them, clearly identifying them as that thing.

To return to our language metaphor, there is one noun that serves as the name for each entity and everyone can clearly state which noun that is: "That's a car.", "That's an employee.", "That's a sale.", "That's a product.", "That's a Shipping Method." and so on. The nouns of your database language provide names for everything you want to track in it.

Just Another Pretty Analogy
Before we move on, though, here’s another way to think of it. This should help us transition on to the importance of the next element of our database language, the adjective.

Picture yourself sitting on a stool at the door to a storeroom in which you have a number of large barrels. Each barrel is labeled with the name of the things that go in it. As your assistants bring you things to put into your barrels, you can point to the object they are carrying and say, "That's a Car; put it in the Car Barrel.", or "That's a Sale; put it in the Sale Barrel." (This is a metaphor, so you can use your imagination to allow people to carry cars in their hands, or to allow a "sale", which is an abstract construct, to exist as a physical object in a pile of other, similar objects.)

Because there is no ambiguity about the names of the objects your assistants are caryying, and because each barrel is clearly labeled with those names, you don't have to turn around to make sure your assistants are putting things in the right barrels. Congratulations, you’ve succeeded in eliminating ambiguity in this phase of your database design!!

However, within each barrel of cars, or employees, or products, you also need to be able to distinguish one car from another, one employee from another, one product from another, one sale from another, or one shipping method from another. Why? Because at some point, someone else is going to come to you and ask something like, "Can you get me the sale completed by Ted Danson on December 12th to ABC Corporation?"

Now you need to get off your stool, walk over to the "Sales" barrel, and pull out one, and only one, sale from the hundreds of sales in that barrel. You can't just reach in and pull out any old sale, you need to be able to find the exact one requested.

To accomplish that task, of course, you need to identify or define the characteristics, or attributes, of each object in the barrel. Those identifying characteristics, called Attributes in the database world, allow you to discriminate one car from another, one employee from another, one sale from another.

It should be obvious that the exact set of attributes required for each entity will be different from those required for other entities, even though there are SOME attributes which can be applied to several entities.

Let me clarify that statement a bit with a couple of examples. People have names. In most cultures, they are given two or more names, one reflecting their familial standing and one personal name.

My family name, for example, is “Hepworth” and my Personal names are “George” and “Russell” . (Now, if this were a discourse on genealogy instead of data normalization, I’d branch off at this point into an exposition on why my two names are “George” and “Russell” and why there are members of some branches of the Hepworth family who don’t speak to some members of my branch, but that will have to wait for a different blog and time.)

Cars have names, too. Some are called “Ford”, some “Lexus”, some “Volkswagen”.

“Name”, it seems to me, is an attribute attached to the great majority of entities.

Sales transactions, on the other hand, don’t commonly have names attached to them. They are identified by other things, like the date they occurred, the customer involved, the products or services transferred from the seller to the buyer, and so on, but I’ve never yet worked on a database where sales were given the equivalent of a name.

To my way of thinking, the importance of this fact about the universality and applicability of attributes is that you can easily come up with a list of attributes that would, or could, apply to any barrel of cars, or barrel of employees, or any barrel of sales transactions in any storeroom in any organization around the world. If the barrel contains “employees”, you can assured that one of the attributes you’ll need to distinguish between employees is their names.

More importantly still, you can leverage that fact by anticipating the same set of attributes for “employee” whenever you start a new database which has a table of employees in it. You can go ahead and create a standard template table for employees which has all of the attributes you normally expect to see for employees. From then on, all you need to do is import that table into your new database and modify it, if needed.

You mean to say you CAN get there from here ?
I’m getting ahead of myself a bit here, talking about templates and standard sets of attributes, but I can’t help getting excited when I think about the possibilities: one single “template” database with one set of standard tables, each with its own standard set of attributes. Can it be done? I think so, but before we get there, we still need to talk a lot more about the nature of attributes themselves.

Turning once again to the database language: attributes are the adjectives needed to distinguish one car from another, one employee from another, one sale from another, one shipping method from another, one piece of equipment from another, and so on and so on.

With the ability to make unambiguous distinctions between each item in each barrel in our storeroom, we can find any single item requested at any time, and always get the right item.

In the case of the requested sales transaction, those attributes are the
  • The sales person involved: Ted Danson
  • The customer involved: ABC Corporation
  • The transaction date: December 12th
What I REALLY meant to say
By the way, can you spot the problem in the above list? It's an important issue to consider when you need to figure out what constitutes a valid attribute. Just as with the names of things, attibutes must be precise and unambiguous or they are worse than useless.

Next time, I’ll go into more detail about the different kinds of attributes you will find in your databases and how to be sure you've captured them properly.

Here’s a hint: My birthday is the same date every year, but it’s a different date every time.