The Difference Between Logical and Physical Modeling

From WikiContent

(Difference between revisions)
Jump to: navigation, search
(There are many differences between the logical and physical model, and even another model to consider between the two.)
Current revision (17:37, 22 January 2010) (edit) (undo)
 
(3 intermediate revisions not shown.)
Line 1: Line 1:
-
The term “physical” has two meanings that cause a lot of confusion when DBA’s are talking to each other. The first encompasses SQL objects: tables, columns, constraints, etc. The second uses the term as referring to data is stored in the hardware.
+
A data model is an abstract representation of data requirements, from documentation to structure. A common misconception is that the model is only a graphic, though in reality it a complete language of how to communicate the structure. A model usually has an associated picture, giving a powerful overview of the database, but which lacks the fidelity of the complete model.
-
The latter usage fits better with the original meaning that was coined in Codd’s Twelve Rules for a Relational Database Management System in Rule 8: Physical Data Independence. It basically states that the RDBMS should be allowed to store the data in any way or anywhere it pleases without changing the meaning or access.
+
-
While my RDBMS experience is specifically with Microsoft SQL Server, from what I have gathered, nearly all of the popular RDBMS tools allow data to be arranged physically in different manners, like sorted on disk, partitioned across different disk drives, non-unique indexes applied, etc.
+
When creating a data model, there are usually two forms of the model, a '''logical''' model and '''physical''' model. The logical model usually comes first and represents the implementation nonspecific data requirements, while the physical model covers the implementation specific details used to implement the database.
-
Realistically, no matter what term is used, there are three distinct models that you ought to consider:
+
-
'''Logical''': Implementation nonspecific requirements for data storage
+
''The term “physical”’s meaning in Codd’s Twelve Rules for a Relational Database Management System in Rule 8: Physical Data Independence. It basically states that the RDBMS can store the data in any way without changing the meaning or programmatic access. Physical has evolved to encompass the SQL objects that are used to implement the system as well.''
-
'''Implementation''': RDBMS specific details needed to create a database that can actually store data
+
'''Logical'''
-
• '''Physical''': Hardware model has no client meaning, only where/how data is stored
+
The implementation nonspecific requirements are documented here. The logical model is a highly structured document of what the database is attempting to do for the client. The goal is to get the requirements from the user/artifacts, and present an ideal representation of the necessary data to solve the client’s needs.
-
'''Logical'''
+
Ideally the design will be void of details such as how SQL Server or Oracle would perform some task. Even things that cannot (or simply will not) be implemented in the data tier should be represented in the logical model.
-
The initial phase of implementation nonspecific requirements are commonly called the logical model. The goal is to get the requirements from the user/documentation, and end up with a relational database design that will represent near perfection. More or less the logical model is a highly structured document of what the database is attempting to do for the client. Ideally the design will be normalized, be void of implementation specific knowledge such as how SQL Server or Oracle would perform some task. The logical model is more or less documentation to the meaning of the client’s data structure needs.
+
-
'''Implementation'''
+
Naming is critical, and when naming tables and columns, the names should be fully spelled out with little, if any, abbreviations and no concessions to the max length of names in the RDBMS. Tables, relationships, columns, etc should also be defined as to their purpose.
-
The logical model directs programmers to create an RDBMS specific model. In this model, you adapt the logical model for implementation in the host RDBMS without dropping expected meaning. Exact datatypes, constraints, triggers, and whatever else may be needed to implement the logical model. This model should be reasonably void of performance details, like indexes to tune a query. This will be the job of the physical model.
+
 +
There are constructs that exist in the logical model in a manner that cannot be implemented directly. For example, a many-to-many relationship between two tables may be modeled, but cannot be implemented in a relational database without resolving the relationship by adding another table.
 +
'''Physical'''
'''Physical'''
-
Once you have the implementation model complete, data is being stored correctly, you will have a set of objects sitting in an RDBMS with little or no data. The physical model is all about adapting the implementation to a combination of users and hardware. This model will start to form by developers as they tune their programs, but in production, the DBA can change the physical model and no client will be affected in any manner other than performance. In an ideal situation, even moving a database to a new server, os, platform, could be a completely transparent action as long as the RDBMS stays the same.
 
-
In the end, having three distinct models for the “meaning”, the “implementation”, and the "physical” structures make the reality of the data architect, the programmers, and the job of the DBA much easier.
+
There are two sections of the physical model; the RDBMS specific model and the on-disk/performance tuning aspects. In the RDBMS section of the model, you adapt the logical model for implementation in the host RDBMS without dropping expected meaning. Exact types, constraints, triggers, and anything needed to implement the logical model. This model will contain everything that the programmer needs to build a database.
 +
 
 +
The second part of the model is about adapting the implementation to a combination of users and hardware. In production, the DBA can change the on-disk structures, and even moving the database to a new server, os, platform, etc should be a completely transparent action as long as the RDBMS stays the same.
 +
 
 +
By maintaining a logical, idealized model describing what the system should represent, seperate from the implementation details, the entire process of developing, testing, and maintaining databases throughout the entire lifecycle can be greatly improved.

Current revision

A data model is an abstract representation of data requirements, from documentation to structure. A common misconception is that the model is only a graphic, though in reality it a complete language of how to communicate the structure. A model usually has an associated picture, giving a powerful overview of the database, but which lacks the fidelity of the complete model.

When creating a data model, there are usually two forms of the model, a logical model and physical model. The logical model usually comes first and represents the implementation nonspecific data requirements, while the physical model covers the implementation specific details used to implement the database.

The term “physical”’s meaning in Codd’s Twelve Rules for a Relational Database Management System in Rule 8: Physical Data Independence. It basically states that the RDBMS can store the data in any way without changing the meaning or programmatic access. Physical has evolved to encompass the SQL objects that are used to implement the system as well.

Logical

The implementation nonspecific requirements are documented here. The logical model is a highly structured document of what the database is attempting to do for the client. The goal is to get the requirements from the user/artifacts, and present an ideal representation of the necessary data to solve the client’s needs.

Ideally the design will be void of details such as how SQL Server or Oracle would perform some task. Even things that cannot (or simply will not) be implemented in the data tier should be represented in the logical model.

Naming is critical, and when naming tables and columns, the names should be fully spelled out with little, if any, abbreviations and no concessions to the max length of names in the RDBMS. Tables, relationships, columns, etc should also be defined as to their purpose.

There are constructs that exist in the logical model in a manner that cannot be implemented directly. For example, a many-to-many relationship between two tables may be modeled, but cannot be implemented in a relational database without resolving the relationship by adding another table.

Physical

There are two sections of the physical model; the RDBMS specific model and the on-disk/performance tuning aspects. In the RDBMS section of the model, you adapt the logical model for implementation in the host RDBMS without dropping expected meaning. Exact types, constraints, triggers, and anything needed to implement the logical model. This model will contain everything that the programmer needs to build a database.

The second part of the model is about adapting the implementation to a combination of users and hardware. In production, the DBA can change the on-disk structures, and even moving the database to a new server, os, platform, etc should be a completely transparent action as long as the RDBMS stays the same.

By maintaining a logical, idealized model describing what the system should represent, seperate from the implementation details, the entire process of developing, testing, and maintaining databases throughout the entire lifecycle can be greatly improved.

Personal tools