What is Normalization, and How to Know When You’re Done

From WikiContent

(Difference between revisions)
Jump to: navigation, search
(Normalization seems really frightening to people, like it is a hard process. It really isn't, but it also isn't a completely natural process that requires no knowledge to achieve.)
Current revision (17:45, 22 January 2010) (edit) (undo)
 
(2 intermediate revisions not shown.)
Line 1: Line 1:
-
Normalization is often approached in two distinct ways. Firstly, there are a set of progressive “rules” that specify “forms”. However, using the normal forms in a stepwise manner is an atypical approach to designing data storage. More typically databases are built in a natural manner, with even inexperienced designers or (sometimes a secretary) making a “table” for concepts, and “columns” for descriptive pieces of information.
+
Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “forms” that define criteria that signify some level of complience on the standard. Here is a (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":
-
 
+
-
A substantial problem is that database design not completely natural. Relational engines are designed against the Normalization rules, and if the relational engine vendors are using a set of concepts to guide how they create the engine you use, it certainly won’t hurt if you understand why they do what they do. Normalization forms are stated as criteria that is desired for each form, usually stated that higher forms require adherence to the lower form. A (very)brief overview of the forms:
+
• '''First''': Unique rows, no arrays, and one value per column.
• '''First''': Unique rows, no arrays, and one value per column.
-
• '''Second, Third, and Boyce-Codd''': Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but a key.
+
• '''Second, Third, and Boyce-Codd''': Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but the key.
-
• '''Higher forms''': Correct relationship and attribute cardinality, such as ensuring every column relates to the key with a cardinality of one.
+
• '''Higher forms''': Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.
 +
 
 +
By following these rules, usually at least to Boyce-Codd normal form and ideally fourth (and beyond), you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.
 +
 
 +
An important part of normalization is understanding how far to go. The word “atomic” is a common way to describe a table or column that is appropriately normalized. Atomic would tend to indicate something that is broken down to its absolute lowest form. Of course we know that there are sub-atomic particles and that when you try to mess with particles smaller than the atom, you get a mushroom cloud. The goal is to break down values to where you needn't split values using SQL, nor do you need to excessively concatenate values that you never use the individual parts.
 +
 
 +
While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.
-
After “getting” the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, not some generic blob of pieces that are constantly broken down and rebuilt like a 4 year old might do. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.
+
Once a designer understands the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, instead of some generic blob of pieces that are constantly broken down and rebuilt. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.
• '''Shape Attributes''': One attribute, one value.
• '''Shape Attributes''': One attribute, one value.
Line 15: Line 19:
• '''Validate the relationships between attributes''': Attributes either are a key or describe something about the entity identified by the key.
• '''Validate the relationships between attributes''': Attributes either are a key or describe something about the entity identified by the key.
-
• '''Scrutinize multivalued dependencies''': Only one per entity. Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.
+
• '''Scrutinize multivalued dependencies''': Only one per entity. Make sure relationships between three plus tables are correct. Reduce all relationships to binary relationships if possible.
-
 
+
-
Finally, the question in the title still has yet to be conquered. “How do you know when you are done?” What commonly overlooked in defining Normalization is granularity. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you skipped high school physics and chemistry completely, you know that there are lots sub-atomic particles. You should also know that when you try to mess with particles smaller than the atom, you get a mushroom cloud is not the kind Timothy Leary would have approved of for sure.
+
-
The true essence of Normalization is achieved when users have exactly the right number of places to store the data they need without ever needing to deal with data at a sub-column (and hence, sub-atomic) level …You will know you failed when users start using the structures you created in “creative” ways to get around your structures when they aren’t properly normalized.
+
Normalization is complete when users have exactly the right number of places to store the data without ever needing to deal with data at a sub-atomic level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.

Current revision

Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “forms” that define criteria that signify some level of complience on the standard. Here is a (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":

First: Unique rows, no arrays, and one value per column.

Second, Third, and Boyce-Codd: Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but the key.

Higher forms: Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.

By following these rules, usually at least to Boyce-Codd normal form and ideally fourth (and beyond), you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.

An important part of normalization is understanding how far to go. The word “atomic” is a common way to describe a table or column that is appropriately normalized. Atomic would tend to indicate something that is broken down to its absolute lowest form. Of course we know that there are sub-atomic particles and that when you try to mess with particles smaller than the atom, you get a mushroom cloud. The goal is to break down values to where you needn't split values using SQL, nor do you need to excessively concatenate values that you never use the individual parts.

While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.

Once a designer understands the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, instead of some generic blob of pieces that are constantly broken down and rebuilt. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.

Shape Attributes: One attribute, one value.

Validate the relationships between attributes: Attributes either are a key or describe something about the entity identified by the key.

Scrutinize multivalued dependencies: Only one per entity. Make sure relationships between three plus tables are correct. Reduce all relationships to binary relationships if possible.

Normalization is complete when users have exactly the right number of places to store the data without ever needing to deal with data at a sub-atomic level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.

Personal tools