What is Normalization, and How to Know When You’re Done
Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “rules” that specify “forms” that signify some level of complience on the standard. A (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":
• First: Unique rows, no arrays, and one value per column.
• Second, Third, and Boyce-Codd: Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but a key.
• Higher forms: Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.
By following these rules, usually at least to Boyce-Codd normal form and beyond, you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.
What is commonly overlooked in defining Normalization is granularity. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you skipped high school physics and chemistry completely, you know that there are sub-atomic particles. You should also know that when you try to mess with particles smaller than the atom, you get a mushroom cloud.
While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers or (sometimes a secretary) making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.
Once a designer gets the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, not some generic blob of pieces that are constantly broken down and rebuilt like a 4 year old might do. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.
• Shape Attributes: One attribute, one value.
• Validate the relationships between attributes: Attributes either are a key or describe something about the entity identified by the key.
• Scrutinize multivalued dependencies: Only one per entity. Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.
So you are done when users have exactly the right number of places to store the data they need without ever needing to deal with data at a sub-column (and hence, sub-atomic) level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.