SQL Darwinism: On SQL Server Baselines, Metrics Collection and Trend Analysis
By Bill Wunder

Ernst Mayr will be 100 this year. Professor of Zoology Emeritus at Harvard University, Mayr is sometimes referred to as the Dean of Evolution and has long been a recognized expert in evolutionary biology. Mayr completed an interesting dissection of Darwin's theories from The Origin of Species in his 1991 book, One Long Argument. In this book the Professor masterfully describes the components of Darwin's evolution. With all due respect to Professor Mayr's scientific knowledge, I'm no biologist, so I can't really speak to the nuances of his scientific divergence from Darwin - though in my understanding the differences are significant to the highly trained mind. I do find it particularly insightful to borrow the good Professor's analysis of Darwin's basic premise as a way to understand and explain the change inherent in database applications. With my apologies for any misrepresentations or obfuscations, I have taken the liberty of doing a little word substitution in Mayr's itemized account of Darwin's evolutionary theory as it might describe a typical database application on this, the threshold of Mayr's second century. I have also added a few software technology keywords to help you conjure up an image of how the aspects of Mayr's critique of Darwinism offer plausible analytical value in the world of database applications:

  1. Evolution as such. This is the theory that the data application is not constant or recently created nor perpetually cycling, but rather is steadily changing, and that data and structure are transformed in time. keywords: software lifecycle, iterative development, service packs, patches, bugs, scope creep
  2. Common descent. This is the theory that every set descended from a common object, and that all groups of objects, including methods, constants, and even relational tables, ultimately go back to a single binary origin. keywords: the SQL Server 2000 model database, data modeling, reusability, OOP, the wheel (obscure), hierarchical, relational, star schema
  3. Multiplication of species. This theory explains the origin of the enormous data diversity. It postulates that data multiply, either by splitting into daughter sets or by "budding", that is, by the establishment of application specific isolated founder populations that evolve into new sets. keywords: project management, normalization, refactoring, backup/restore, CREATE DATABASE, EDI, XML
  4. Gradualism. According to this theory, evolutionary change takes place through the gradual change of set populations and not by the sudden (saltational) production of new individuals that represent a new type. keywords: insert, update, delete, legacy components, backward compatibility, change control, versioning
  5. Natural selection. According to this theory, evolutionary change comes about through the abundant production of attribute variation in every generation. The relatively few individuals who survive, owing to a particularly well-adapted combination of inheritable characters, give rise to the next generation. keywords: delete/archive strategy, outsourcing, Internet bust, dBase, MVS, Microsoft, Oracle, MySQL
    Hopefully you found the ease with which biological evolution and database technology can be similarly described palatable and at least a little amusing. At another level, I believe it is also interesting to explore the advantages Mayr had at his disposal in critiquing Darwin's postulations. From the time Charles Darwin sailed around South America to the time Ernst Mayr spent his scholarly career exploring the validity of Darwin's pioneering work, a tremendous amount of data had accumulated. Observations and essays to compare and contrast with the original ideas are now plentiful and represent a huge spectrum of empirical thought. Sure, Darwin had some guidance, and others before him had offered bits and pieces of information he could build upon, but Mayr had much more. He had a clearly defined starting point for the definition of biological evolution and a rather extended trend of scientific thought upon which to build a clearer, more accurate description of it. Simply put, Mayr had a baseline - a collection of data points from which to clearly state how well the original theory performed in the moment and even to predict how the theory might change in the future - and, most importantly, he followed a sound methodology to reach sound conclusions.

    Likewise, in order to gauge the performance of the original specifications for a data application - and we all know how theoretical those original specifications can be - one needs first to establish baselines. In too many cases the baseline is left anecdotal, with the result that proactive and predictive analysis is pre-empted by crisis and the path forward is left open to less than well-thought-out adjustments to the original theory. Just consider how far down a narrow path Darwinism progressed before it received the analysis and adjustments necessary to extract the facts from the fictions - to wit, Herbert Spencer's Social Darwinism hogwash. Not until late in the 20th century were adequate corrections advanced for those cogent errors of the late 19th century. So it is with the database components of an application: a baseline coupled with continual analysis and adjustment is key to assuring the continued suitability of the hardware and software components to the application.

    To put that a different way, there are really two general approaches to determining when to upgrade or re-architect a SQL Server. I'm not saying that there are only two ways to upgrade. Rather, the point I want to make - and will expound upon at some length in my next article - is that the decision to upgrade or fundamentally alter an application database server is either a measured response in a system's evolution or a reaction to a crisis or assumption at a critical point. A SQL Server critical point occurs when hardware or design limitations result in unsatisfactory performance or unacceptable availability. Perhaps total CPU hovers between 80 and 100% independent of any single runaway process, or disk queuing on your storage system results in general customer dissatisfaction with application response time, or you run out of tricks to prevent your log device from filling and regularly interrupting production processing, or you flat run out of disk space for your data set or processor bandwidth for the application load. Conversely, the evolution of a database application is generally gradual and to a large extent predictable. It is common to see web hits trend up or down, or the count of new customers increase by n per month. The challenge - from the database capacity and performance planning perspective - is how to equate such high-level application growth rates with viable database server load expectations. When that is done well, a SQL Server critical point can be avoided, and the factual data can be leveraged into an efficient and effective upgrade path rather than a slide into SQL Darwinism.
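    As a rough sketch of what such measurement might look like, the following T-SQL samples a few counters from the SQL Server 2000 sysperfinfo system table into a permanent baseline table. The table name, column names, and the particular counters chosen are illustrative assumptions, not a prescription - pick the counters that matter to your application:

```sql
-- Hypothetical baseline table; names are illustrative assumptions.
CREATE TABLE BaselineCounters (
    SampleTime    datetime      NOT NULL DEFAULT GETDATE(),
    ObjectName    nvarchar(128) NOT NULL,
    CounterName   nvarchar(128) NOT NULL,
    InstanceName  nvarchar(128) NULL,
    CounterValue  bigint        NOT NULL
)
GO

-- Run on a schedule (e.g. a SQL Agent job) to accumulate trend data.
INSERT BaselineCounters (ObjectName, CounterName, InstanceName, CounterValue)
SELECT RTRIM(object_name), RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name IN ('Batch Requests/sec',
                       'User Connections',
                       'Page life expectancy')
```

    Scheduled at a steady interval, a table like this gives you exactly the kind of data-point collection that separates a measured response from a crisis reaction.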

    Allow me to facetiously illustrate where such SQL Darwinism might take an application. First, in keeping with Spencer's imperative that it is our moral duty to allow the weak to fail undeterred, let's say that SQL Darwinism holds that an application be slated for a new platform "when it outgrows" SQL Server (bet you never heard that one before). Then it would follow that only ANSI SQL be used in that application, in spite of the well-known truth that ANSI SQL does not assure an easy upgrade path to or from any existing database product. And it could even be that a follower of this SQL Darwinism would resist remedial changes when a growing application demonstrates barely adequate performance on an earlier and/or aging version of a database product, OS, or hardware platform - since implicit in this Darwinism is the expectation that the application will outgrow the platform regardless. So finally the day comes when a poor indexing scheme causes the SQL Server to no longer keep up with the application's needs. Fortunately, and predictably in the grandest design of SQL Darwinism, there is almost always an unscrupulous entity - let's call ours "N Ron Consultants" - waiting in the wings to move the struggling application to a new platform for a hefty fee and, if the Darwinist is at all lucky, quietly slip in the necessary indexing corrections as they rewrite the ANSI SQL for the new platform.

    Maybe that little story is a bit far-fetched, maybe not. What is most definitely not far-fetched is that it's not that difficult to establish baselines and watch the trends on a database server as part of a methodical approach to keeping a successful application successful. In my next article I'll explain how you can use the SQL Server 2000 sysperfinfo system table and the global statistics available within SQL Server to easily and automatically establish baselines on your servers, and to continuously and automatically track changes from those baselines, to help you avoid the pitfalls of SQL Darwinism - even if your application is successful.
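    As a taste of what's coming, here is a minimal sketch of reading one such global statistic from sysperfinfo. One wrinkle worth knowing up front: many of the per-second counters in sysperfinfo store cumulative values since the server started, so computing an actual rate takes two samples a known interval apart. The 10-second interval below is an arbitrary assumption for illustration:

```sql
-- 'Batch Requests/sec' in sysperfinfo is cumulative, so a rate
-- requires two samples taken a known interval apart.
DECLARE @first bigint, @second bigint

SELECT @first = cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name = 'Batch Requests/sec'

WAITFOR DELAY '00:00:10'   -- sample interval: 10 seconds (illustrative)

SELECT @second = cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name = 'Batch Requests/sec'

SELECT (@second - @first) / 10 AS BatchRequestsPerSec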