Design a Strategy for Removing Unwanted Data

From WikiContent

Revision as of 04:31, 15 October 2009 by Neil

This is a work in progress, full of notes and ideas. Pardon the dust.


Good design factors in the overall lifecycle of the database. Databases grow over time, and it is critical to have a strategy for dealing with that growth. For every piece of data, you need to determine how long you are going to keep it. Once you determine the lifespan of the data, you will have to put in place a means of removing it.

How long data should be kept depends on what type of data it is. Transient data should be purged quickly. For example, session tables should be purged shortly after the sessions expire. Failure to prune session tables has brought several applications to a halt as the tables grew to massive proportions and bogged down the database server. Design this in from the start and save many headaches.
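As a rough illustration of pruning a session table, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sessions`, the column `expires_at` (seconds since the epoch), the one-hour grace period, and the function name are all assumptions made for the example, not part of any particular application.

```python
import sqlite3

def purge_expired_sessions(conn, now, grace_seconds=3600):
    """Delete sessions that expired more than grace_seconds before now.

    Returns the number of rows removed.
    """
    cutoff = now - grace_seconds
    cur = conn.execute("DELETE FROM sessions WHERE expires_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Hypothetical schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at INTEGER NOT NULL)"
)
# An index on the expiry column keeps the purge cheap as the table grows.
conn.execute("CREATE INDEX idx_sessions_expires_at ON sessions (expires_at)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("a", 1000), ("b", 5000), ("c", 9000)],
)
conn.commit()
```

A job like this would typically run on a schedule, passing the current time as `now`, so the table never gets the chance to grow unbounded.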

In addition to transient data, your database will also contain persistent data. Establishing the appropriate duration for keeping this data requires several steps. The first step is to understand the retention policies that could apply to this data. If the data you are storing is a log and your company requires you to keep logs of this type for 6 months, then you know how long to keep it. Some policies mandate keeping data for a certain amount of time; others require destroying data older than a specified time. Lacking clear external direction, you will need to make a judgment call. However you arrive at this number, it is safe to assume it will change, so make the retention period adjustable.
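One way to keep retention periods adjustable is to hold them in a single place as data rather than scattering them through queries. The sketch below is only an illustration: the data kinds, the day counts (180 days standing in for "6 months"), and the function name are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods, in days, per kind of data.
# In practice these might live in a config file or a database table
# so they can be changed without touching code.
RETENTION_DAYS = {
    "access_log": 180,  # assumed approximation of a 6-month policy
    "audit_log": 365,
}

def retention_cutoff(kind, now, policy=RETENTION_DAYS):
    """Return the datetime before which rows of this kind may be removed."""
    return now - timedelta(days=policy[kind])
```

A purge job can then call `retention_cutoff("access_log", datetime.now())` and delete rows older than the result. When the policy changes, only the mapping needs to be updated.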

Even data that seems like it should be permanent should have a strategy for removal. If you have a database that tracks your customers, you obviously don't want to expire current customers. You do, however, want to purge former customers at some point.

Once you determine how long you are going to keep data, you need to determine a strategy for getting rid of the data you don't want. In many cases the data should simply be deleted, but depending on your environment you may want to archive it so that it can be retrieved later if needed.
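Where archiving is preferred over plain deletion, one possible approach is to copy old rows into an archive table and delete them from the live table in a single transaction, so a failure cannot lose or duplicate data. This is a minimal sketch with sqlite3. The `events` and `events_archive` tables, their columns, and the function name are hypothetical.

```python
import sqlite3

def archive_and_delete(conn, cutoff):
    """Copy rows older than cutoff into the archive table, then delete them.

    Both statements run in one transaction. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO events_archive "
            "SELECT * FROM events WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM events WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount

# Hypothetical schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER, body TEXT)"
)
conn.execute(
    "CREATE TABLE events_archive "
    "(id INTEGER PRIMARY KEY, created_at INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 100, "old"), (2, 200, "old"), (3, 900, "new")],
)
conn.commit()
```

In a larger environment the archive might be a separate database or an export file rather than a sibling table. The same copy-then-delete ordering applies.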
