What is Normalization, and How to Know When You’re Done

From WikiContent

(Difference between revisions)
Jump to: navigation, search

Drsql (Talk | contribs)
(Normalization seems really frightening to people, like it is a hard process. It really isn't, but it also isn't a completely natural process that requires no knowledge to achieve.)
Next diff →

Revision as of 23:25, 8 January 2010

Normalization is often approached in two distinct ways. Firstly, there are a set of progressive “rules” that specify “forms”. However, using the normal forms in a stepwise manner is an atypical approach to designing data storage. More typically databases are built in a natural manner, with even inexperienced designers or (sometimes a secretary) making a “table” for concepts, and “columns” for descriptive pieces of information.

A substantial problem is that database design not completely natural. Relational engines are designed against the Normalization rules, and if the relational engine vendors are using a set of concepts to guide how they create the engine you use, it certainly won’t hurt if you understand why they do what they do. Normalization forms are stated as criteria that is desired for each form, usually stated that higher forms require adherence to the lower form. A (very)brief overview of the forms:

First: Unique rows, no arrays, and one value per column.

Second, Third, and Boyce-Codd: Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but a key.

Higher forms: Correct relationship and attribute cardinality, such as ensuring every column relates to the key with a cardinality of one.

After “getting” the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, not some generic blob of pieces that are constantly broken down and rebuilt like a 4 year old might do. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.

Shape Attributes: One attribute, one value.

Validate the relationships between attributes: Attributes either are a key or describe something about the entity identified by the key.

Scrutinize multivalued dependencies: Only one per entity. Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.

Finally, the question in the title still has yet to be conquered. “How do you know when you are done?” What commonly overlooked in defining Normalization is granularity. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you skipped high school physics and chemistry completely, you know that there are lots sub-atomic particles. You should also know that when you try to mess with particles smaller than the atom, you get a mushroom cloud is not the kind Timothy Leary would have approved of for sure.

The true essence of Normalization is achieved when users have exactly the right number of places to store the data they need without ever needing to deal with data at a sub-column (and hence, sub-atomic) level …You will know you failed when users start using the structures you created in “creative” ways to get around your structures when they aren’t properly normalized.

Personal tools