Chapter 3. Lesson 3: Planning: Estimating Software Size

Table of Contents
Readings
Program 3A: Class/Method LOC counter
Report R3: Defect Analysis
Summary

Basic estimation of software size using the PROBE (Proxy-Based Estimation) process

Read chapter 5 of the textbook

Write program 3A using PSP0.1

Write report R3 (Defect Analysis Report)

Readings

After concentrating on information collection, the PSP now begins to move toward analysis and prediction. Chapter 5 of the text deals primarily with size estimation; after covering several popular methods (Wideband Delphi, fuzzy logic, standard component, and function point), Humphrey settles on what he terms "Proxy-Based Estimation" (PROBE). By maintaining a database of components and their sizes from previous projects, the software developer can estimate what components the new program will need, and use the sizes of similar old components as proxies for the sizes of the new ones.
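To make the idea concrete, here's a minimal sketch of the lookup; the component names, categories, and sizes are invented for illustration, not taken from any real project:

    # Hypothetical size database from previous projects: each finished
    # component's name, category, and measured size in LOC.
    history = [
        ("ConfigReader", "data", 62),
        ("ReportWriter", "display", 145),
        ("RuleEngine", "logic", 310),
    ]

    # Planned components for the new program, each matched by eye to the
    # past component it most resembles.
    plan = {
        "OptionParser": "ConfigReader",   # feels about as big as the config reader
        "SummaryView": "ReportWriter",
        "Scheduler": "RuleEngine",
    }

    sizes = {name: loc for name, _, loc in history}
    estimate = sum(sizes[proxy] for proxy in plan.values())
    print(f"Estimated total size: ~{estimate} LOC")

The real method is considerably more refined than this, as the rest of the chapter makes clear, but the core move is the same: let measured past work stand in for guessed future work.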

Humphrey uses the analogy of home construction to explain PROBE, and it's a decent analogy. By knowing, for example, that a large kitchen generally costs a certain amount per square foot and a certain amount for particular features, the estimator can estimate the cost of the kitchen in a new house, using past data as a baseline.

Humphrey holds that these proxies should meet certain criteria, namely that they should:

  1. Relate closely to development effort

  2. Be automatically countable

  3. Be easy to visualize at the beginning of the project

  4. Be customizable to the special needs of using organizations

  5. Be sensitive to implementation variations that impact development cost or effort

After discussing many types of proxies (including objects, screens, files, scripts, and document chapters), Humphrey settles on "objects" as proxies for software size estimation, and LOC as a measure of size. I find this a curious use of the term "object", as almost all object-oriented texts I've read use the term "class" to denote the abstract entity that Humphrey refers to, and "object" to denote the more concrete instantiation of a class. In this study, I will use the term "class" to denote these proxies instead.

It's worth noting that the concept of proxy-based estimation lends itself to many other fields; I can easily see a computer artist, for example, using past 3D models and appropriate metrics (vertex count, texture size, etc.) as a basis for predicting the size of future models.

To use classes as proxies, Humphrey divides class data into categories (control, data, display, logic, formatting, etc.) and size ranges for each; this is somewhat analogous to the builder from our earlier analogy dividing rooms into categories (bathroom, kitchen, study...) and size ranges as well.

And here Humphrey does something curious, and it remains to be seen how well this works: he normalizes a class's LOC count by its number of methods. Rather than saying that a very large logic class runs about 1000 LOC, one might say that a very large logic class runs about 90.27 LOC/method.
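To see how the normalization and the size ranges work together, here's a sketch that derives a relative-size table from historical class data; the classes and counts are invented, and I'm assuming the log-normal treatment of LOC/method that I understand the text to use, so treat the bucket boundaries as illustrative rather than authoritative:

    import math
    from collections import defaultdict

    # Hypothetical history: (category, total LOC, method count) for finished classes.
    history = [
        ("logic", 120, 8), ("logic", 300, 12), ("logic", 85, 10),
        ("data", 40, 6), ("data", 75, 9), ("data", 55, 11),
    ]

    # Normalize each class to LOC/method, grouped by category.
    per_method = defaultdict(list)
    for category, loc, methods in history:
        per_method[category].append(loc / methods)

    # Take the average of the log of LOC/method as the "medium" midpoint,
    # and step one standard deviation up or down for each size bucket.
    labels = ["very small", "small", "medium", "large", "very large"]
    for category, sizes in per_method.items():
        logs = [math.log(s) for s in sizes]
        mean = sum(logs) / len(logs)
        sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (len(logs) - 1))
        ranges = {label: round(math.exp(mean + k * sd), 1)
                  for label, k in zip(labels, (-2, -1, 0, 1, 2))}
        print(category, ranges)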

I have a few problems with this. First, it conflates the concepts of size and complexity a bit. A poor design might place all the functionality of a given class into a single method, making it a very "large" (LOC/method) class; a far better design would factor out relevant functionality into separate methods, making it a very "small" class.

Second, I no longer really understand what is meant by "size": assuming a perfect refactoring of a class (such that no extra LOC is added, but one large method is broken into several smaller ones), total LOC has not changed, but the class has, according to this classification, gotten smaller.

Third, as a personal measure I try to refactor mercilessly; as mentioned in the coding standards above, I prefer methods to be very small, topping out at around 35-40 LOC. This may seem like overmodularization, but I find it leaves things in small, understandable chunks. While this approach makes for very digestible code, it may well produce very similar LOC/method counts across classes.

On the other hand, it's better than no method at all, and it will be worth a look in future chapters, once I have enough data to begin prediction. The use of linear regression to relate estimates to actual sizes, and of the normal distribution to put a prediction interval around an estimate, gives PROBE a fair amount of statistical validity, which is leaps and bounds ahead of anything I've tried so far.
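The arithmetic behind that is simple enough to sketch out now, even though I don't yet have real data to feed it. The estimate/actual pairs below are invented, and I've used a plain normal z-value for a rough 70 percent interval; with only a handful of data points the t-distribution would give a wider, more honest interval:

    import math

    # Hypothetical paired data from past programs:
    # x = estimated proxy size (LOC), y = actual size (LOC).
    x = [90.0, 150.0, 230.0, 310.0, 420.0, 510.0, 640.0, 780.0]
    y = [110.0, 160.0, 260.0, 300.0, 470.0, 540.0, 610.0, 830.0]

    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares regression parameters relating estimates to actuals.
    beta1 = ((sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar)
             / (sum(xi ** 2 for xi in x) - n * x_bar ** 2))
    beta0 = y_bar - beta1 * x_bar

    # Project a new estimate through the regression line.
    x_new = 400.0
    y_pred = beta0 + beta1 * x_new

    # Standard deviation of the residuals around the regression line.
    sigma = math.sqrt(sum((yi - (beta0 + beta1 * xi)) ** 2
                          for xi, yi in zip(x, y)) / (n - 2))

    # Approximate 70% prediction interval using a normal z-value (~1.04).
    half_range = 1.04 * sigma * math.sqrt(
        1 + 1 / n + (x_new - x_bar) ** 2 / sum((xi - x_bar) ** 2 for xi in x))
    print(f"Predicted size: {y_pred:.0f} LOC "
          f"({y_pred - half_range:.0f} to {y_pred + half_range:.0f} LOC)")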