**Table of Contents**

- Readings
- Program 5A: Integrator
- Summary

- Extend the size estimation practices established in the last two lessons toward estimation of resource and schedule usage.
- Read chapter 6 of the textbook.
- Write program 5A using PSP1.1.

This chapter, on estimating development time, is the one I rather wish
I'd read before the *last* assignment-- it relates
the linear regression process to prediction and estimation, and while I'd
figured much of the information out by the end of lesson 4, it would have
been nice to have it earlier.

In a nutshell, the estimation part of the PSP tries to relate a set of historical estimated data (past estimates) to a set of actual data (past actual results), then uses mathematics to make a prediction for a new estimate. The chapter focuses on development time, so we'll use that as an example, but the processes can be used to relate any two quantities which might be correlated (and further lessons will evidently give us tools to determine that correlation). Humphrey gives three scenarios for estimating project time:

If you don't have enough historical data (estimates and results) to make a prediction for size, you take a known quantity (historical productivity in LOC/hour) and use it to estimate the shortest and longest likely times. Using an example from the text: if you had written two programs and had time and size data, you might have the following information:

**Table 5-1. Example: figuring estimated time from productivity data**

| | LOC | Hours | LOC/Hour |
| --- | --- | --- | --- |
| | 172 | 7.6 | 22.63 |
| | 242 | 15.3 | 15.82 |
| Total: | 414 | 22.9 | 18.07 |

Using the average (18.07 LOC/hr), you might guess that your development time for a 156 LOC program would be 8.63 hours; using your lowest and highest productivities, you could guess a maximum time of 9.86 hours and a minimum of 6.9 hours. You now have a most likely schedule plus estimated upper and lower bounds.
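The arithmetic above is simple enough to script. A minimal sketch using the numbers from Table 5-1 (the function and variable names are my own, not from the text):

```python
def time_estimates(history, new_loc):
    """Estimate development time from historical (loc, hours) pairs.

    Returns (most_likely, longest, shortest) in hours: average
    productivity gives the likely case, and the lowest/highest
    historical productivities give the bounds.
    """
    total_loc = sum(loc for loc, _ in history)
    total_hours = sum(hours for _, hours in history)
    avg_productivity = total_loc / total_hours        # LOC/hour
    rates = [loc / hours for loc, hours in history]
    return (new_loc / avg_productivity,   # most likely
            new_loc / min(rates),         # slowest rate -> longest time
            new_loc / max(rates))         # fastest rate -> shortest time

likely, longest, shortest = time_estimates([(172, 7.6), (242, 15.3)], 156)
# likely ~ 8.63 hours, longest ~ 9.86, shortest ~ 6.9
```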

If you have at least three sets of data, however, Humphrey advocates
the use of statistics, particularly the fairly simple (if arduous by hand)
linear regression calculation. Essentially, this takes pairs of numbers (estimated
LOC and actual development hours, etc) as X-Y coordinates, finds the best-fit
line for the data, and uses that line to extrapolate a schedule. This is a good
deal more accurate, because rather than running on productivity, it gives you
a statistical prediction of estimated-size-to-schedule based on historical data;
the first method applied actual productivity numbers to an estimated size, ignoring
the possibility of errors in your size estimate. The linear regression method
assumes that if you do have errors, they will be consistent, and it incorporates
that history into the prediction. A likely "envelope" (prediction interval) around
your most likely schedule is derived using the *t*-distribution
and some further statistics. The entire process is *much*
more math-intensive than the simple productivity calculation, but once automated
would be just as easy and much more accurate. It can also be used to relate
many types of correlated variables-- such as comparisons between Eiffel and C++
source code size and development time.
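The regression fit itself is only a few lines once automated. Here's a sketch of the standard least-squares calculation applied to estimated-LOC vs. actual-hours pairs; the data below is invented for illustration, not from the text:

```python
def linear_regression(xs, ys):
    """Least-squares fit: returns (beta0, beta1) for y = beta0 + beta1*x."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    beta1 = ((sum(x * y for x, y in zip(xs, ys)) - n * x_avg * y_avg)
             / (sum(x * x for x in xs) - n * x_avg ** 2))
    beta0 = y_avg - beta1 * x_avg
    return beta0, beta1

# Hypothetical history: estimated LOC vs. actual development hours.
est_loc = [130, 650, 99, 150]
hours   = [7.6, 32.0, 6.1, 9.2]
b0, b1 = linear_regression(est_loc, hours)

# Schedule prediction for a new program estimated at 156 LOC.
predicted_hours = b0 + b1 * 156
```

Because the fit runs estimated size against actual time, any consistent bias in past size estimates is absorbed into the slope, which is exactly the property the chapter relies on.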

The linear regression calculation is fairly simple, but very time-consuming
if done by hand (a great deal of summation, etc). A tool such as program 4A could
be helpful indeed (in fact, it takes a great deal of restraint not to enhance it
for other parameters *now* instead of waiting for the proper
assignments!).

The rest of the chapter involves more mathematical concepts: how to combine resource estimates to get both an estimated result and an estimated prediction interval (in which we discover that combining multiple estimates gives a proportionally smaller prediction interval than a single estimate would), and how to create a large estimate out of many smaller estimates. Humphrey also introduces multiple regression, a process which allows one to estimate the relative contributions of different variables to a single outcome (here, the relative contributions of the work on new, reused, and modified code [Humphrey95]). The math for this looks formidable and, of course, suitable for automation.
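The claim about combining estimates follows from how independent errors add: standard deviations combine as the square root of the sum of squares rather than linearly, so the combined interval is smaller *relative to the total* than any single task's interval is relative to that task. A toy illustration (the numbers are invented, and the intervals here are one standard deviation rather than a full *t*-based interval):

```python
import math

# Three task estimates, each 10 hours with a +/-3 hour spread (1 sigma).
estimates = [10.0, 10.0, 10.0]
sigmas = [3.0, 3.0, 3.0]

total = sum(estimates)                                   # 30 hours
combined_sigma = math.sqrt(sum(s * s for s in sigmas))   # ~5.2, not 9

single_relative = sigmas[0] / estimates[0]    # 30% spread on one task
combined_relative = combined_sigma / total    # ~17% spread on the total
```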

Given a time estimate, the remainder of the chapter is devoted to
creating a schedule from it: identifying possible working
hours, allocating project hours to work hours, creating the schedule, etc.
Humphrey introduces the concept of *earned value tracking*
to track the progress of a project; essentially, EV tracking
assigns a value to each step in a project based on its estimated work time
as a percentage of the estimated total work time. Adding tasks, then, reduces the
estimated value of tasks already accounted for, but this does produce a
fairly decent way to measure progress (a more traditional approach, the use of
miniature milestones, accomplishes much the same thing without the added concept
of value; any one milestone is often the same as any other, without a sense
of additional value or progress for difficult or time-consuming tasks).
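Earned value as described reduces to a percentage-of-total-hours calculation. A minimal sketch (the task names and hour figures are my own, not from the text):

```python
def planned_values(tasks):
    """Assign each task a planned value: its estimated hours as a
    percentage of the total estimated project hours."""
    total = sum(hours for _, hours in tasks)
    return {name: 100.0 * hours / total for name, hours in tasks}

tasks = [("plan", 1.0), ("design", 2.0), ("code", 4.0),
         ("test", 2.0), ("postmortem", 1.0)]
pv = planned_values(tasks)
# "code" is worth 40% of the project; completing a task earns its value.
done = ["plan", "design"]
progress = sum(pv[t] for t in done)   # 30.0 percent earned so far
```

Note that adding a new task to `tasks` grows the total and shrinks every existing task's share, which is the effect described above.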