What is Normalization, and How to Know When You’re Done

Revision as of 23:45, 10 January 2010

Normalization is, generally speaking, a set of criteria for designing a database that is well suited to a relational database management system (RDBMS). It is based on a set of progressive “rules” that specify “forms”, each signifying some level of compliance with the standard. A (very) brief overview of the forms, where adherence to a "higher" form requires adherence to the "lower" ones:

First: Unique rows, no arrays, and one value per column.

Second, Third, and Boyce-Codd: Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but a key.

Higher forms: Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.

By following these rules, usually to at least Boyce-Codd normal form, you help eliminate modification anomalies, such as two copies of what should be the same fact holding different values.
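To make the anomaly concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`orders_flat`, `customer`, `orders`) and the sample data are invented for illustration; they are not from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical denormalized table: the customer's city is repeated on every order.
conn.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Ann", "Nashville"), (2, "Ann", "Nashville")],
)

# An update that touches only one copy leaves two conflicting values for one fact.
conn.execute("UPDATE orders_flat SET customer_city = 'Memphis' WHERE order_id = 1")
flat_cities = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer_city FROM orders_flat WHERE customer = 'Ann'"
    )
)

# Hypothetical normalized design: the city is stored once, keyed by the customer.
conn.execute("CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT REFERENCES customer(name))"
)
conn.execute("INSERT INTO customer VALUES ('Ann', 'Nashville')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Ann"), (2, "Ann")])

# The same change is now a single-row update, so every order sees one city.
conn.execute("UPDATE customer SET city = 'Memphis' WHERE name = 'Ann'")
normalized_cities = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT c.city FROM orders o JOIN customer c ON c.name = o.customer"
    )
)

print(flat_cities)        # two different cities for the same customer
print(normalized_cities)  # exactly one city
```

In the flat table nothing stops the two rows from disagreeing; in the normalized pair there is only one place for the city to live.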

What is commonly overlooked in defining Normalization is granularity. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you skipped high school physics and chemistry completely, you know that there are sub-atomic particles. You should also know that when you try to mess with particles smaller than the atom, you get a mushroom cloud.

While these rules might suggest a stepwise process in which you start at first normal form and work upward, in practice database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers (sometimes a secretary) making a “table” for each concept and “columns” for descriptive pieces of information. However, the process is not completely natural, and without an understanding of normalization it is common for a database to become poorly suited to being manipulated in an RDBMS.

Once a designer gets the concepts of normalization, building a database can become more like a well-thought-out Lego creation, assembled to meet a final goal, rather than a generic blob of pieces that is constantly broken down and rebuilt, as a 4-year-old might do. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.

Shape Attributes: One attribute, one value.

Validate the relationships between attributes: Attributes either are a key or describe something about the entity identified by the key.

Scrutinize multivalued dependencies: Only one per entity. Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.
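As a rough illustration of the last precept, the sketch below stores two independent multivalued facts about a person (skills and spoken languages) first in one table and then in two. All names and data here are hypothetical, and Python's `sqlite3` module is used only as a convenient way to run the SQL.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
skills = ["SQL", "Python"]
languages = ["English", "French", "Spanish"]

# One entity carrying two independent multivalued facts: every skill must be
# paired with every language, so the table holds a cross product.
conn.execute(
    "CREATE TABLE person_skill_language ("
    "person TEXT, skill TEXT, language TEXT, "
    "PRIMARY KEY (person, skill, language))"
)
conn.executemany(
    "INSERT INTO person_skill_language VALUES ('Ann', ?, ?)",
    product(skills, languages),
)
combined_rows = conn.execute(
    "SELECT COUNT(*) FROM person_skill_language"
).fetchone()[0]

# One multivalued fact per entity: two binary relationships instead.
conn.execute(
    "CREATE TABLE person_skill (person TEXT, skill TEXT, PRIMARY KEY (person, skill))"
)
conn.execute(
    "CREATE TABLE person_language ("
    "person TEXT, language TEXT, PRIMARY KEY (person, language))"
)
conn.executemany(
    "INSERT INTO person_skill VALUES ('Ann', ?)", [(s,) for s in skills]
)
conn.executemany(
    "INSERT INTO person_language VALUES ('Ann', ?)", [(l,) for l in languages]
)
split_rows = (
    conn.execute("SELECT COUNT(*) FROM person_skill").fetchone()[0]
    + conn.execute("SELECT COUNT(*) FROM person_language").fetchone()[0]
)

print(combined_rows, split_rows)
```

With the combined table, adding one more language for Ann means adding a row for every skill she has; with the split tables it is a single insert.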

So you are done when users have exactly the right number of places to store the data they need without ever needing to deal with data at a sub-column (and hence, sub-atomic) level. Sounds easy, and it would be, if users knew what they wanted and never changed their minds.
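One way to picture "sub-column" work is a column that users have packed with several values. The following sketch, with made-up table names and phone numbers, contrasts that with a design where each value has its own place; it assumes a semicolon is the separator the users happened to choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical packed column: users squeezed two phone numbers into one value.
conn.execute("CREATE TABLE contact_packed (name TEXT PRIMARY KEY, phones TEXT)")
conn.execute("INSERT INTO contact_packed VALUES ('Ann', '555-0100;555-0199')")

# Reading a single number now requires string surgery outside the database.
packed = conn.execute(
    "SELECT phones FROM contact_packed WHERE name = 'Ann'"
).fetchone()[0]
phones_from_packed = packed.split(";")

# One place per value: each phone number is its own row.
conn.execute(
    "CREATE TABLE contact_phone (name TEXT, phone TEXT, PRIMARY KEY (name, phone))"
)
conn.executemany(
    "INSERT INTO contact_phone VALUES ('Ann', ?)",
    [("555-0100",), ("555-0199",)],
)

# Looking up the owner of a number is a plain equality match.
owner = conn.execute(
    "SELECT name FROM contact_phone WHERE phone = '555-0199'"
).fetchone()[0]

print(phones_from_packed, owner)
```

If callers routinely need `split`, `LIKE`, or substring functions to get at a value, that is a hint the column is holding more than one thing.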
