Retrofitting Table Level Delete/Archive Strategies - Short Term Storage Models

by Bill Wunder

 

When I began this series of articles I expected to deliver a one-two punch and then move on to another aspect of SQL Server. It quickly became obvious that this topic would require more discussion than I had anticipated. In the first segment, Unbounded Table Growth by Design, we explored the all too common data modeling shortcut that leaves an application with unbounded growth patterns in rapidly growing tables, and we examined how this practice becomes more debilitating the more successful an application is: efforts to remove old, no longer needed rows from a table interfere with, and perhaps even break, the application's ability to use that table. Then in the second segment, Classifications of Logging Tables, we defined some loose categories that are useful starting points for rescuing good applications from such unbounded growth limitations. In that discussion we acknowledged that a few applications actually do need to support unlimited table growth (recall we labeled this requirement the History of the World) and that others are completely miscast in the role of an ever growing table when in fact they are merely temporary data containers that can easily and regularly be decapitated (with some irreverence we called these tables Wholly Unnecessary). That was supposed to leave the majority of ever growing tables for this current segment, where we would look over some working demonstrations of retrofit partitioning strategies to help return an otherwise successful application to a well behaved and easily maintainable state. Well, now that I've had some time to ponder the real breadth of the remaining topic, I believe I'm going to have to cut this last piece of the pie into yet smaller slices in order to give the remaining things to be said fair play.

 

Stick with me and I'll present working examples based on solutions I have used, moving along the requirements (and complexity) continuum: from the simple partitioning strategy needed for data that is slightly more than Wholly Unnecessary but has only limited value after a relatively short period of time (remember we dubbed this the Saturday Night Live container), to the table that is rarely updated but might be read for a considerable period of time (our Middle of the Road data), and onward into the realm of the table where rows are regularly read and updated for an extended period but must be held to some relatively fixed size for performance and/or storage reasons (or, as we named it, a Life of the Party table).

 

As a general practice it is useful to approach any partitioning retrofit with a common methodology. Whether working on the simplest model or the most stringent, we want a conversion script that will morph the single table model into the desired partitioned model, a rollback script capable of returning the converted data model to a single table model, and a rollover script that can flip-flop the partitions as the table grows and the data ages. The conversion scripts can be stored procedures or simple run-once scripts. The rollover script, I strongly suggest, should be a well documented stored procedure so that the dependencies involved in the partitioned table design remain visible to all developers. In these examples I will follow the convention of creating a stored procedure for each operation, if for no other reason than to make it easier for you to deploy the examples into a test environment as you follow the discussion. In addition, each script will create the common database used for all examples if it is not found on the SQL Server.

 

For all scripts we will make good use of the INFORMATION_SCHEMA views. If you have additional questions concerning these views be sure to consult Books Online.

 

For all of the examples we will use the same relatively simple table structure, with only the name changed to keep it obvious which tables belong with which example. We want enough columns to demonstrate the necessary techniques, but probably not as many as you may have to deal with in a real-world retrofit of an unbounded table growth design. I think it will be immeasurably useful if you can find a test or development SQL Server 2000 to run the scripts as we move along, though I will also present enough result sets along the way so that you can see what is happening if for some reason you cannot run the demonstration.
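
To make the walkthrough concrete, here is a minimal sketch of the kind of table I have in mind. The SNL_EventLog name and its column list are my own illustration for this article, not the structure defined by the downloadable script.

-- Hypothetical logging table used only for the sketches in this article.
-- The actual demonstration script defines its own (similar) structure.
CREATE TABLE dbo.SNL_EventLog
    (EventId     int IDENTITY(1,1) NOT NULL,
     EventTime   datetime     NOT NULL DEFAULT (getdate()),
     EventType   varchar(30)  NOT NULL,
     EventDetail varchar(500) NULL,
     CONSTRAINT PK_SNL_EventLog PRIMARY KEY CLUSTERED (EventId))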

 

Please open the Saturday Night Live script and refer to the script as we walk through it in the rest of this article.

 

The first order of business is to create a database using a definition that could place the system catalog, the data, the indexes, and the transaction log file each on a separate device; for the test, however, I'll be placing all three filegroups and the log in the same location. When you set out to build a partitioning retrofit you may be working with a production system where the devices are on different physical hardware, but in early development (aka unit testing) you may find yourself working on SQL Server 2000 Developer Edition installed on the single un-partitioned drive of your workstation. I strongly suggest this approach of making everything look the same from inside SQL Server in terms of where indexes, data, and primary keys must be placed, to assure adequate conversion scripting detail.

 

Looking at the create database command for a moment, notice the use of the full path of the [PRIMARY] file group, as stored in sysdatabases, to derive the filename parameter for the other files needed. Feel free to modify the script as needed; just make sure you have a DATA filegroup and an INDEXES filegroup in addition to the PRIMARY filegroup or the examples will fail.
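
As a rough sketch of what such a create database command might look like - the file names, the sizes, and the use of master's primary file to find a data directory are my assumptions here, not the actual script's values:

-- Sketch only: trim the file name off the path of master's primary file in
-- sysdatabases to get a data directory, then build the demo database with the
-- required DATA and INDEXES filegroups. Names and sizes are illustrative.
DECLARE @path nvarchar(260), @sql nvarchar(4000)

SELECT @path = LEFT(RTRIM(filename),
                    LEN(RTRIM(filename)) - CHARINDEX('\', REVERSE(RTRIM(filename))))
FROM   master..sysdatabases
WHERE  name = 'master'

SET @sql = 'CREATE DATABASE LocalPartitionDemo ON PRIMARY '
    + '(NAME = LPD_sys,  FILENAME = ''' + @path + '\LPD_sys.mdf'',  SIZE = 10MB), '
    + 'FILEGROUP [DATA] (NAME = LPD_data, FILENAME = ''' + @path + '\LPD_data.ndf'', SIZE = 50MB), '
    + 'FILEGROUP [INDEXES] (NAME = LPD_idx, FILENAME = ''' + @path + '\LPD_idx.ndf'', SIZE = 25MB) '
    + 'LOG ON (NAME = LPD_log, FILENAME = ''' + @path + '\LPD_log.ldf'', SIZE = 25MB)'

EXEC (@sql)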

 

Next, two roles are added in the newly created LocalPartitionDemo database. One role will serve to document permissions granted to the application, the other those granted to a data analyst. As the complexity of the scheme increases we'll notice that the interest of - and therefore the permissions granted to - the application will include a wider range of the partitioned data set. Conversely, the data analyst role will help us justify keeping data "for internal use only" that the application has no direct requirement to use. A well partitioned data set - one with a check constraint on each table to help the optimizer with DML actions - is not necessary if our only need for the older part of the partitioned set is one-off select queries. Later, in the Life of the Party case, a well partitioned data set is necessary because the application must update all rows to a final state before they can be aged out of the kept set. Users are not added to the roles in the example because we do not need to manipulate users in the example scripts. In the real world there would be users in each role.
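
A sketch of what that role setup might look like in SQL Server 2000 terms; the role names and the grants against the hypothetical SNL_EventLog objects are my own, and the actual script uses its own names and permissions:

-- Sketch: document permissions via roles (SQL Server 2000 style). Role names
-- are assumptions; grants reference the hypothetical objects built later.
USE LocalPartitionDemo

EXEC sp_addrole 'SNLApplication'   -- what the application itself may touch
EXEC sp_addrole 'SNLDataAnalyst'   -- one-off, read-only access to older data

-- Once the view and the A/B tables exist (see the conversion sketch below):
-- GRANT SELECT, INSERT ON dbo.SNL_EventLog  TO SNLApplication
-- GRANT SELECT         ON dbo.SNL_EventLogA TO SNLDataAnalyst
-- GRANT SELECT         ON dbo.SNL_EventLogB TO SNLDataAnalyst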

 

At this point an exact copy of the existing table is created in order to begin developing and testing the conversion and rollover processes of the Saturday Night Live test case. Recall that the objective in this model is to keep only a short, recent frame of the data: perhaps an hour, a day, a week, or whatever the application requirements dictate. We keep this frame of data so that we can provide row level granularity to the application for that prescribed window, and then complete an aggregation or archival operation on the aged data - just after it is rolled over to the unused table of the pair - before it becomes permanently uninteresting at full detail granularity. Such a table might be used to log orders or requests or calls for the day or week, and then at the end of the day or week the events are moved into a historical table, possibly at some level of aggregation. For the rest of the current discussion let's make the interval a day for ease of discussion. The historical table services all queries concerning data older than that day.

In the Saturday Night Live model there may be a daily time slot when the application is least busy, ideally suited to archiving and purging the in-use table. A typical delete operation can cause extended unavailability or unresponsiveness of the application due to blocking contention on the logging table. To help this situation, we create two copies of the log table: copy A and copy B. Then, in that same low-usage time slot, a quick swap of those tables is executed. The application is protected from having to know which table is currently in use by replacing the original table with a view having the same name as that original table and using A and B suffixes on the two underlying tables in the new design. It is an important aspect of data that fits the Saturday Night Live model that the data will never need to be modified (insert, update, or delete) by the application once it moves outside of the established "current" interval. Important, because this eliminates the need to create an updatable partitioned view to support such operations. As we will see later in the discussions of the Middle of the Road and Life of the Party models, maintaining an updatable view can require considerably more time and complexity during each switch between the A and B table as the one to receive new rows. For this short term model, all we need to do is verify that yesterday's table - the one not currently in use - has been successfully archived, truncate it, and alter the view so that the freshly truncated table becomes the one underneath the application's in-use view. The impact to the application should last only a second or two, and the rollover script is nice and easy.

 

The conversion script in the Saturday Night Live script, as it will in all the examples, makes the existing table the "A" version of the view-shrouded pair and then creates a clone of that table for the "B" version. Notice the use of the exec() statement to create the views, since the create view statement must be in its own batch. You could also use sp_executesql.
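
In terms of the hypothetical SNL_EventLog table from earlier, the conversion boils down to something like the following sketch; the real script also carries over keys, indexes, filegroup placement, and permissions:

-- Sketch of the conversion: the existing table becomes the "A" copy, a
-- structural clone becomes the "B" copy, and a view takes over the old name.
EXEC sp_rename 'dbo.SNL_EventLog', 'SNL_EventLogA'

-- Empty structural clone for the "B" copy. A real conversion script would
-- recreate the full definition, keys, and indexes rather than SELECT INTO.
SELECT * INTO dbo.SNL_EventLogB FROM dbo.SNL_EventLogA WHERE 1 = 0

-- CREATE VIEW must be the only statement in its batch, hence the exec().
EXEC ('CREATE VIEW dbo.SNL_EventLog AS SELECT * FROM dbo.SNL_EventLogA')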

 

 

The rollback script needs to figure out which of the two twin tables is currently in use and move it out from underneath the view. Notice how easily this is accomplished with a quick check of the INFORMATION_SCHEMA views.
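
Using the hypothetical names again, the check and the swap back might look something like this sketch:

-- Sketch of the rollback: find the table currently under the view, drop the
-- view and the unused twin, and give the in-use table back its original name.
DECLARE @current sysname, @other sysname, @oldname nvarchar(300)

SELECT @current = TABLE_NAME
FROM   INFORMATION_SCHEMA.VIEW_TABLE_USAGE
WHERE  VIEW_NAME = 'SNL_EventLog'

SELECT @other = CASE @current WHEN 'SNL_EventLogA' THEN 'SNL_EventLogB'
                              ELSE 'SNL_EventLogA' END

EXEC ('DROP VIEW dbo.SNL_EventLog')
EXEC ('DROP TABLE dbo.' + @other)

SET @oldname = 'dbo.' + @current
EXEC sp_rename @oldname, 'SNL_EventLog'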

The rollover script determines the name of the current table and of the other - not currently in use - table, then truncates the other table and alters the view or views appropriately as an atomic operation. This procedure could also perform or initiate the aggregation or archival activity against the newly defined other table. If you are able to take care of the remaining business with this data in this procedure, there is little reason to define any of the views except the one the application will use, which carries the original table's name. Notice that here we do use the sp_executesql command rather than an exec() function, since this procedure will be used regularly and will likely be run by an automated task. I chose not to type the extra few lines in the conversion script and rollback script because those scripts will likely be needed only one time and I'll be there to see the Query Analyzer output - really just a little laziness on my part, for sure. It's also interesting that the dynamic command string runs in a transaction, so the "holdlock, tablock" hints have the effect of stopping other writes but allowing reads while the alter view happens.
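
A minimal sketch of such a rollover procedure, again against the hypothetical SNL_EventLog objects and assuming the not-in-use table has already been archived, might look like this:

-- Sketch of a rollover procedure. Assumes the "other" table was archived
-- earlier; object names are the hypothetical ones used throughout.
CREATE PROCEDURE dbo.SNL_EventLog_Rollover
AS
BEGIN
    DECLARE @current sysname, @other sysname, @sql nvarchar(4000)

    SELECT @current = TABLE_NAME
    FROM   INFORMATION_SCHEMA.VIEW_TABLE_USAGE
    WHERE  VIEW_NAME = 'SNL_EventLog'

    SELECT @other = CASE @current WHEN 'SNL_EventLogA' THEN 'SNL_EventLogB'
                                  ELSE 'SNL_EventLogA' END

    BEGIN TRANSACTION

    -- Shared table lock held to the end of the transaction: writers are
    -- blocked for the duration of the switch, readers are not.
    SET @sql = N'SELECT TOP 1 1 FROM dbo.' + @current + N' WITH (TABLOCK, HOLDLOCK)'
    EXEC sp_executesql @sql

    -- Empty the table about to become current.
    SET @sql = N'TRUNCATE TABLE dbo.' + @other
    EXEC sp_executesql @sql

    -- ALTER VIEW must be alone in its batch; sp_executesql gives it one.
    SET @sql = N'ALTER VIEW dbo.SNL_EventLog AS SELECT * FROM dbo.' + @other
    EXEC sp_executesql @sql

    COMMIT TRANSACTION
END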

That provides all the components we need to create and manage this partitioning method. You may wish to use the SQL Agent to initiate the nightly rollover or you may want the application to be involved. This is going to be driven by how much impact the rollover blocking has on the application. Most likely in the Saturday Night Live scenario it won't make much difference one way or the other because the blocking period of the rollover should last no more than a second or two in most cases.

Finally, there is a unit test script to verify all the scripts and perform a rollover as well as a rollback. The results of the test script are included at the end of the Saturday Night Live script.

If you follow the test script through the output you can see the structure and names of the views and tables as well as the behavior of the data in this partitioned design. Once you get a real conversion to the point of successfully executing a similar test plan in the unit testing environment, you'll be ready to apply the conversion and a few rollovers in the integrated testing/development environment to verify that the conversion and rollover processing plays well with your application(s).

For those tables where you are removing a large chunk of rows on a regular basis and experiencing unbearable contention between the application and the row removal, this process can become a dear friend.

That brings us to the next level of partitioning. First let's take a nice long break and give everyone a chance to play with this example.

Hope to see you in a few days and we'll look over a somewhat more complex approach to locally partitioned retrofits.

Bill