What is Normalization, and How to Know When You’re Done

From WikiContent

(Difference between revisions)
Jump to: navigation, search
Current revision (17:45, 22 January 2010) (edit) (undo)
 
(One intermediate revision not shown.)
Line 1: Line 1:
-
Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “rules” that specify “forms” that signify some level of complience on the standard. A (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":
+
Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “forms” that define criteria that signify some level of complience on the standard. Here is a (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":
• '''First''': Unique rows, no arrays, and one value per column.
• '''First''': Unique rows, no arrays, and one value per column.
-
• '''Second, Third, and Boyce-Codd''': Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but a key.
+
• '''Second, Third, and Boyce-Codd''': Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but the key.
• '''Higher forms''': Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.
• '''Higher forms''': Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.
-
By following these rules, usually at least to Boyce-Codd normal form and beyond, you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.
+
By following these rules, usually at least to Boyce-Codd normal form and ideally fourth (and beyond), you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.
-
What is commonly overlooked in defining Normalization is granularity. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you skipped high school physics and chemistry completely, you know that there are sub-atomic particles. You should also know that when you try to mess with particles smaller than the atom, you get a mushroom cloud.
+
An important part of normalization is understanding how far to go. The word “atomic” is a common way to describe a table or column that is appropriately normalized. Atomic would tend to indicate something that is broken down to its absolute lowest form. Of course we know that there are sub-atomic particles and that when you try to mess with particles smaller than the atom, you get a mushroom cloud. The goal is to break down values to where you needn't split values using SQL, nor do you need to excessively concatenate values that you never use the individual parts.
-
While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers or (sometimes a secretary) making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.
+
While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.
-
Once a designer gets the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, not some generic blob of pieces that are constantly broken down and rebuilt like a 4 year old might do. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.
+
Once a designer understands the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, instead of some generic blob of pieces that are constantly broken down and rebuilt. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.
• '''Shape Attributes''': One attribute, one value.
• '''Shape Attributes''': One attribute, one value.
Line 19: Line 19:
• '''Validate the relationships between attributes''': Attributes either are a key or describe something about the entity identified by the key.
• '''Validate the relationships between attributes''': Attributes either are a key or describe something about the entity identified by the key.
-
• '''Scrutinize multivalued dependencies''': Only one per entity. Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.
+
• '''Scrutinize multivalued dependencies''': Only one per entity. Make sure relationships between three plus tables are correct. Reduce all relationships to binary relationships if possible.
-
So you are done when users have exactly the right number of places to store the data they need without ever needing to deal with data at a sub-column (and hence, sub-atomic) level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.
+
Normalization is complete when users have exactly the right number of places to store the data without ever needing to deal with data at a sub-atomic level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.

Current revision

Normalization is, generally speaking, a set of criteria that have been established for creating a database that is suited for use with a relational database management system (RDBMS). It is based on a set of progressive “forms” that define criteria that signify some level of complience on the standard. Here is a (very) brief overview of the forms, where adherence to the "higher" form requires adherence to the "lower":

First: Unique rows, no arrays, and one value per column.

Second, Third, and Boyce-Codd: Every candidate key is identified, and all attributes are fully dependent on a key, and columns identify facts about a key and nothing but the key.

Higher forms: Correct relationship and attribute cardinality, such as in fourth normal form where you ensure every column relates to the key with a cardinality of one.

By following these rules, usually at least to Boyce-Codd normal form and ideally fourth (and beyond), you help to eliminate modification anomolies such as having two copies of what should have the same meaning yet have different values.

An important part of normalization is understanding how far to go. The word “atomic” is a common way to describe a table or column that is appropriately normalized. Atomic would tend to indicate something that is broken down to its absolute lowest form. Of course we know that there are sub-atomic particles and that when you try to mess with particles smaller than the atom, you get a mushroom cloud. The goal is to break down values to where you needn't split values using SQL, nor do you need to excessively concatenate values that you never use the individual parts.

While these rules seem like they might lead to a natural process where you move from first normal form up, database design is a mostly natural process. More typically, databases are built in a natural manner, with even inexperienced designers making a “table” for concepts, and “columns” for descriptive pieces of information. However, the process is not completely natural and without an understanding of normalization, it is common for a database to become poorly suited to being manipulated in an RDBMS.

Once a designer understands the concepts of normalization, building a database can become more like a well-thought out Lego creation, assembled to meet your final goal, instead of some generic blob of pieces that are constantly broken down and rebuilt. I distill it down to the following precepts, referring back to the “real” normalization rules when something is complex.

Shape Attributes: One attribute, one value.

Validate the relationships between attributes: Attributes either are a key or describe something about the entity identified by the key.

Scrutinize multivalued dependencies: Only one per entity. Make sure relationships between three plus tables are correct. Reduce all relationships to binary relationships if possible.

Normalization is complete when users have exactly the right number of places to store the data without ever needing to deal with data at a sub-atomic level. Sounds easy, and it would be, if users knew what they wanted, and would never change their mind.

Personal tools