Writing Reusable Migration and Change Scripts

by Bill Wunder

We know what we are, but know not what we may be. - William Shakespeare

There seems to be as many ways to turn ideas into applications as there are people turning ideas into computer applications. I believe it's a safe bet that not a single idea became an application in one fell swoop. Instead, ideas become applications through some sort of a process. One of the challenges that all people involved in application development hold in common is how to introduce and manage change.

Some development processes - particularly with a compiled code base where the lifecycle includes robust regression testing - lend themselves well to change management and others - most notably scripted technologies such as T-SQL - do not. Perhaps as SQL Server 2005's CLR integration gains acceptance an improved level of maintainability will emerge for SQL Server object creation and modification. This is certainly nothing that has heretofore been defined as a benefit of CLR integration, still the need to have a compiled binary makes it easier to force application developers to use the development process. With SQL Server 2000 and before it been all to easy to crack open the Enterprise manager UI, tweak a stored procedure or modify a view to behave as desired and save the change with not trace of who made the change, what they changed or when it was done.

About the only thing that can be done to control the process today is to restrict access to the production server. I don't know how it works in your shop, but for me there is little chance of adequately restricting server access to prevent undocumented changes to the production systems at the please of any of an uncomfortably large number of folks involved in the development process. In my shop management sees such access as a necessary requirement to our home grown version of rapid development. Rarely a day goes by when there is not evidence that the rapid development has devolved to a build and fix approach even though the heavy risks and high costs of such practices are well evidenced equally as frequently. Instead of continuing to fight the loosing battle and try to hold the high ground I've chosen instead to try to encourage practices that make it easier and safer to stay in the development process model.

For instance, we now have a browser based promotion tool that can pull something out of SourceSafe and move it through the lifecycle environments and in to production in well under a minute. And we have implemented custom T-SQL templates that prevent a table from being dropped if it already exists and includes the permissions for all objects within the script for that object and even created a special class of SourceSafe project know as Conversion Scripts.

Conversion Scripts are intended to serve the purpose of making changes to table structure and perform necessary data manipulations that are an integral part of our rapidly changing rapid development setting - particularly in build and fix mode but can and do server a purpose in in more disciplined iterative development cycles. In order to make Conversion Scripts effective it has become evident to almost everyone that there are many scenarios where the conversion script might need to be re-applied to a server. Often only part of the script can be successfully re-applied and other parts cause corruption or raise an error that prevents the rest of the script from being able to execute when re-applied to a server. To resolve these problems most of the developers have become more and more proficient at a handful of techniques that support non-destructive reuse of the conversion scripts.

At it's base form script re-use in this context requires that the current status of an attribute or object is evaluated against the condition that the conversion script is attempting to create before the script attempts to create a change. Most of use are familiar with the check for existence of an object in sysobjects and then dropping it if it exists that is typical of system generated stored procedure scripts. Almost all other database objects can be checked using the system tables in a similar fashion.

This same approach can be used to see if indexes exist in sysindexes and if the column list and order is as expected in sysindexkeys. As an example consider a script that implements a clustered index on the ProductId column of the NorthWind.dbo.[Order Detail] table. To create this new clustered index we want to first change the existing primary key to be non-clustered and then add the new index. We only want this script to do anything if the a clustered primary key exists on OrderId and ProductId. (Note that while it be less than useful to create this index if other indexes exist on the table that result in possible confusion to the query optimizer, such indexes will not cause the script the fail so we will not try to prevent such a conflict. Trying to guess what might happen in the future is difficult. For our shop the general guidance is simply to allow the the script to be successfully reapplied to the existing structure or after a partially successful previous execution of the script. The goal of script reuse is to allow repeated execution of a script if necessary not to protect the system from poor development strategies in general.) The script to get this done might look like this:

  alter table dbo.[Order Details]
  drop constraint PK_Order_Details

  alter table dbo.[Order Details]
  add constraint PKN_Order_Details__OrderId__ProductID
  primary key nonclustered (OrderId, ProductId)

  create clustered index IXC_Order_Details__ProductId    
  on dbo.[Order Details](ProductId)

The re-usable script to achieve this might look like the following.

use Northwind
if exists (select *
           from sysobjects o
           join dbo.sysindexes i
           on o.id = i.id
           join dbo.sysindexkeys k1
           on i.id = k1.id
           and i.indid = k1.indid
           join dbo.sysindexkeys k2
           on i.id = k2.id
           and i.indid = k2.indid
           join dbo.syscolumns c1
           on k1.id = c1.id 
           and k1.colid = c1.colid
           join dbo.syscolumns c2
           on k2.id = c2.id 
           and k2.colid = c2.colid
           where o.name = 'Order Details'
           and i.name = 'PK_Order_Details'
           and i.indid = 1 -- clustered
           and (select count(*) from dbo.sysindexkeys where id = i.id and indid = i.indid) = 2 -- columns count           and c1.name = ('OrderId') and k1.keyno = 1 -- first index column
           and c2.name = ('ProductId') and k2.keyno = 2) -- second index column

 begin
  alter table dbo.[Order Details]
  drop constraint PK_Order_Details

  alter table dbo.[Order Details]
  add constraint PKN_Order_Details__OrderId__ProductID
  primary key nonclustered (OrderId, ProductId) on [INDEXES]

end

if not exists (select *
               from sysobjects o
               join dbo.sysindexes i
               on o.id = i.id
               join dbo.sysindexkeys k
               on i.id = k.id
               and i.indid = k.indid
               join dbo.syscolumns c
               on k.id = c.id 
               and k.colid = c.colid
               where o.name = 'Order Details'
               and i.name = 'IXC_Order_Details__ProductId'
               and i.indid  = 1 -- clustered
               and (select count(*) from dbo.sysindexkeys where id = i.id and indid = i.indid) = 1 -- column count               and c.name = ('ProductId') and k.keyno = 1) -- index column
 create clustered index IXC_Order_Details__ProductId    
 on dbo.[Order Details](ProductId) on [PRIMARY]

Now when the script is re-executed on a server it will not try to redo the changes and fail. This is critical for our web based automated promotion tool. While references to system tables is less than a good idea in your application code, in a conversion script usage such at we have here, using system tables can be a fast and effective way to write safe and valid code. The problem is there as much more code to type and test in this script than in the the script necessary to produce the base changes. There are a number of T-SQL extensions that can be used to make the task of adding the pre-checking to eliminate errors easier. The most powerful are the meta data function listed below with links to the MSDN based Books Online pages.

COL_LENGTH

fn_listextendedproperty

COL_NAME

FULLTEXTCATALOGPROPERTY

COLUMNPROPERTY

FULLTEXTSERVICEPROPERTY

DATABASEPROPERTY

INDEX_COL

DATABASEPROPERTYEX

INDEXKEY_PROPERTY

DB_ID

INDEXPROPERTY

DB_NAME

OBJECT_ID

FILE_ID

OBJECT_NAME

FILE_NAME

OBJECTPROPERTY

FILEGROUP_ID

@@PROCID

FILEGROUP_NAME

SQL_VARIANT_PROPERTY

FILEGROUPPROPERTY

TYPEPROPERTY

FILEPROPERTY

Using the metadata functions the conversion script can be rewritten as:

use Northwind

if OBJECTPROPERTY(OBJECT_ID('PK_Order_Details','PK'),'CnstIsClustKey') = 1
and INDEX_COL ('dbo.Order Details', 1 , 1 ) = 'OrderId'
and INDEX_COL ('dbo.Order Details', 1 , 2 ) = 'ProductId'
and INDEX_COL ('dbo.Order Details', 1 , 3 ) is null
 begin

  alter table dbo.[Order Details]
  drop constraint PK_Order_Details

  alter table dbo.[Order Details]
  add constraint PKN_Order_Details__OrderId__ProductID
  primary key nonclustered (OrderId, ProductId) on [INDEXES]

end

if OBJECTPROPERTY(OBJECT_ID('Order Details','U'),'TableHasClustIndex') = 0
 create clustered index IXC_Order_Details__ProductId    
 on dbo.[Order Details](ProductId) on [PRIMARY]

Of course the one additional thing I want every developer to do is always build a rollback script at the time they create a conversion script. Short of a rollback script it is important to always think about how a change could be rolled back and include a narrative in the conversion script. That doesn't provide a tested rollback/recovery path but at least doesn't leave the first pass at a recovery attempt until the heat is on. The rollback script is preferable:

use Northwind

if INDEXPROPERTY(OBJECT_ID('Order Details','U'),'IXC_Order_Details__ProductId','IsClustered') is not null
 drop index dbo.[Order Details].IXC_Order_Details__ProductId

if OBJECTPROPERTY(OBJECT_ID('PKN_Order_Details__OrderId__ProductID','PK'),'CnstIsClustKey') = 0
and INDEX_COL ('dbo.Order Details', INDEXPROPERTY(OBJECT_ID('Order Details','U'),'PKN_Order_Details__OrderId__ProductID','IndexId') , 1 ) = 'OrderId'
and INDEX_COL ('dbo.Order Details', INDEXPROPERTY(OBJECT_ID('Order Details','U'),'PKN_Order_Details__OrderId__ProductID','IndexId') , 2 ) = 'ProductId'
and INDEX_COL ('dbo.Order Details', INDEXPROPERTY(OBJECT_ID('Order Details','U'),'PKN_Order_Details__OrderId__ProductID','IndexId') , 3 ) is null
 begin

  alter table dbo.[Order Details]
  drop constraint PKN_Order_Details__OrderId__ProductID

  alter table dbo.[Order Details]
  add constraint PK_Order_Details
  primary key clustered (OrderId, ProductId) on [PRIMARY]

end

Using the meta data function T-SQL extensions and the system tables as a standard practice in all scripts will enable developers to quickly acquire skill and agility in making all scripts re-usable and non-destructive. The benefit to the organization will be safer promotions between lifecycle environments and a greatly reduced risk of something happening that no one meant to happen.

Now the hard part is figuring out an effective way to get your developers to make this small change so that your change management system will work more effectively. Good Luck!

Bill

COL_LENGTH	fn_listextendedproperty
COL_NAME	FULLTEXTCATALOGPROPERTY
COLUMNPROPERTY	FULLTEXTSERVICEPROPERTY
DATABASEPROPERTY	INDEX_COL
DATABASEPROPERTYEX	INDEXKEY_PROPERTY
DB_ID	INDEXPROPERTY
DB_NAME	OBJECT_ID
FILE_ID	OBJECT_NAME
FILE_NAME	OBJECTPROPERTY
FILEGROUP_ID	@@PROCID
FILEGROUP_NAME	SQL_VARIANT_PROPERTY
FILEGROUPPROPERTY	TYPEPROPERTY
FILEPROPERTY