The Personal Software Process: an Independent Study
Chapter 12. Lesson 12: Design Verification

Report R5: Final Report

Overview: Produce a report on what you have learned from doing the exercises in this book. The assignment is to provide you with a thorough understanding of your current software development performance and your highest-priority areas for improvement. Update the process you used to develop the midterm report and use this updated process to produce this final report.
Task 1-- Update the process you developed for the midterm report. Submit an updated copy of the process and note the changes you made and why. Note particularly if you used PIPs and what changes you made as a result.
Task 2-- Plan and enact the process defined in task 1 to do the work defined in task 3. Use the planning form to record the planned time for this work and track and record the actual time you spend. Submit the planned and actual data on the process together with the final report.
Task 3-- Analyze the data for the programs you have developed with the PSP. Spreadsheet analyses are suggested and graphical summaries and presentations are encouraged. At a minimum, produce the following:
Analyze your size-estimating accuracy and determine the degree to which your estimates were within the 70 percent and 90 percent statistical prediction intervals. Also show how your size-estimating accuracy evolved during the assignments.
Analyze your time-estimating accuracy and determine the degree to which your estimates were within the 70-percent and 90-percent statistical prediction intervals. Also show how your time-estimating accuracy evolved during the assignments.
Analyze the types of defects you injected in design and in coding. Include a Pareto analysis of these defect types.
Determine your trends for defects per KLOC found in design review, code review, compile, and test. Also, show your trends for total defects per KLOC throughout this course.
Analyze your defect-removal rates for design reviews, code reviews, compile, and test and show the defect-removal leverage for design reviews, code reviews, and compile versus unit test. In those cases in which you had no test defects, use the average unit test defect-removal rate for the programs developed to date.
Produce an analysis of yield versus LOC reviewed per hour in your code reviews.
Produce an analysis of design-review yield versus LOC reviewed per hour. Note that design-review yield is calculated as follows: Yield(DR) = 100 * (defects removed in design review)/(defects removed in design review + defects escaping design review).
Produce an analysis of the yield versus the A/FR ratio for programs 7 through 10.
Produce a brief write-up describing your highest-priority areas for personal process improvement and why. Briefly summarize your current performance, your desired future performance, and your improvement goals. Describe how and roughly when you intend to meet these goals.
Result: Submit the required analyses and a brief written report describing your findings and conclusions. Use graphs wherever possible. Note particularly how you will use these results to manage and improve your software and other work in the future.

--[Humphrey95]
Note: since I did not calculate 90% prediction intervals for any programs, I'll only be showing data for 70% intervals. In the event that a program did not have any prediction interval calculated, I will use no prediction interval. Since I do not plan on reporting data in this fashion again (and since I am adding a report for comparison of C++ to Eiffel), I will not take data on resources used for this report.

Process

To create the script for R5, I'll use the script from R4, modified heavily to include the new elements from R5. I'll also try to compare my performance to that of other PSP students, using data provided by the Software Engineering Institute. Needless to say, this script is much more "freeform" than the previous script, because I'm doing a bit of research with GnuPlot as well, and it is tedious to spell out every step when much of the work is repetitious (build a file to generate a graph for LOC error vs program number; it should look like this... now do the same thing for time error vs program number...).

Table 12-1. Report R5 Development Script

Purpose: To guide the reporting of PSP data

Entry Criteria

Requirements statement
Time and defect logs in .ppl format
Project summary logs for all programs, listing LOC estimates and actual LOC.
Tools for analysis and graphing (evalpplog and gnuplot)
For gnuplot, gnuplot scripts for all appropriate graphs.

Planning

Verify you have the data necessary to fulfil the requirements
Make an estimate of the time necessary to produce the required report
Complete the time recording log

Data Collation

Create brief tabular forms (by hand if necessary) to summarize the following data:
- Size estimates: program # vs new/changed LOC estimate, actual new/changed LOC, 70% LPI, 70% UPI, 90% LPI, 90% UPI, % error
- Time estimates: program # vs time estimate, actual time, 70% LPI, 70% UPI, 90% LPI, 90% UPI, % error
- Defect classification: type of defect vs % injected in design, % injected in code
- Defects/KLOC: program # vs defects/kloc in design review, code review, compile, test, total.
- Defect removal rates: program# vs defect removal rates (defects/hr) in design review, code review, compile, test, and DRL rates for DR/test, CR/test, and compile/test
- Yield: program# vs yield and loc reviewed/hr
- Yield: program# vs design review yield, loc reviewed/hr
- Yield vs A/FR: program# vs yield, A/FR
- Loc production: program# vs LOC/hr production
Enter data from forms into files size_estimation.dat, time_estimation.dat, defects_kloc.dat, defect_removal.dat, yield.dat, dr_yield.dat, and loc_hr.dat; enter the data from the form in space-delimited format, using "#" to begin comment rows, i.e.
    #size_estimation.dat: size estimates in the following format:
    # prog# my_estimate probe_estimate actual 70_lpi 70_upi 90_lpi 90_upi %error
    1 45 50 100 50 150 30 170 100
Use gnuplot functions to calculate error when available.
Create gnuplot generation files for each desired graph; label them appropriately (such as generate_size_estimation_graphs.gnu).

Data analysis

Create makefile targets to generate the necessary graphs.
Use evalpplog to gather and collate time and defect log statistics. The evalpplog tool can collate data from multiple logs. If your logs are all named "cpp*ppl", and are located somewhere under the current directory tree, the command line
    find . -name "cpp*ppl" | xargs evalpplog
will analyze all the logs. Use this data to form new targets and graph generation scripts for defect data.
Generate the graphs by executing the make targets.

Report generation

Using the graphs from the data analysis phase and the historical data, write an analysis of each area and how it has evolved.

Postmortem

Complete a report summary with actual time data

Exit criteria

Completed report summary with actual and estimated time data.
Completed time logs.
Reports for all areas identified in the requirements.
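
As a rough illustration of the Data Collation step in the script above, the following Python sketch reads the space-delimited format with "#" comment rows and recomputes the percent error. The column names come from the header comment in the sample; the parser, the function names, and the choice to measure error relative to the PROBE estimate are assumptions (the last one is at least consistent with the sample row, where an estimate of 50 and an actual of 100 give an error of 100).

```python
from io import StringIO

# Column names follow the header comment in the sample file.
# The parser itself is an assumption, not part of the original script.
COLUMNS = ["prog", "my_estimate", "probe_estimate", "actual",
           "lpi_70", "upi_70", "lpi_90", "upi_90", "pct_error"]


def read_dat(stream):
    """Parse space-delimited rows, skipping blank lines and '#' comment rows."""
    rows = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [float(field) for field in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def percent_error(estimate, actual):
    """Estimating error relative to the estimate; positive means an underestimate."""
    return 100.0 * (actual - estimate) / estimate


SAMPLE = """\
#size_estimation.dat: size estimates in the following format:
# prog# my_estimate probe_estimate actual 70_lpi 70_upi 90_lpi 90_upi %error
1 45 50 100 50 150 30 170 100
"""

rows = read_dat(StringIO(SAMPLE))
for row in rows:
    error = percent_error(row["probe_estimate"], row["actual"])
    inside_70 = row["lpi_70"] <= row["actual"] <= row["upi_70"]
    print(f"program {row['prog']:.0f}: error {error:.0f}%, within 70% interval: {inside_70}")
```

Reading from an open file instead of a StringIO works the same way, since read_dat only iterates over lines.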

Planning

This is tricky. I'm doing research here (learning to play with GnuPlot), so any estimates I might have are shaky at best. I'll take a very rough guess indeed that this will take about 1.5 hours to collate the data and make graphs. I may rue this.

Report

Size and Time Estimation

PROBE-Estimated size vs actual size for all programs

%error: PROBE-Estimated size

%error: Rough-estimation of size

Size estimates by program number

As one can see from the above graphs, my size estimation could use a great deal of work. The graph of size estimates by program number (which shows some strange figures for the 70% prediction interval, particularly on program 6!) shows a fairly decent correlation between my PROBE predictions (red line) and the actual results (purple line), but the PROBE percent-error graph tells a slightly different story: 100% estimation error on program 7, and variations from over 100% to almost -40%. This is not terrible, but I had hoped for better over the course of the 10 programs. Still, that the estimates have been within about 100% is cause for encouragement, giving me some hope that it can be done. And some figures from students in the book, with estimation errors of up to 300% for program 10, give me cause for hope as well. This can be done, and I plan on getting better at it.

Unfortunately, I'm not sure how to get better except by practice; I do plan on integrating the PSP logging practices into my regular development, aided by pplog-mode.el, which makes such logging significantly easier.

Time Estimation

Time estimates vs actual time taken

Time estimates by program number

Time estimation error

The time estimation story is somewhat more optimistic than the size estimation story, which is heartening; my errors tended to be less than 30%, and while it's hard to see a trend of increasing accuracy as time went on, I can at least see more of a correlation between my estimated and actual time.
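
The assignment asks how often the actual values fell inside the prediction intervals. A minimal sketch of that count, for either size or time, might look like the following. The records are hypothetical, not data from this study, and programs without a calculated interval are simply left out of the count, which is one reading of "use no prediction interval".

```python
def interval_hit_rate(records):
    """Return (hits, checked): how many actual values fell inside their interval.

    Each record is (actual, lower, upper). lower/upper are None when no
    prediction interval was calculated for that program; such records are
    left out of the count.
    """
    hits = checked = 0
    for actual, lower, upper in records:
        if lower is None or upper is None:
            continue
        checked += 1
        if lower <= actual <= upper:
            hits += 1
    return hits, checked


# Hypothetical (actual, 70% LPI, 70% UPI) values, not data from this study.
records = [
    (100, 50, 150),
    (210, 90, 180),
    (75, None, None),   # no interval calculated for this program
    (130, 100, 160),
]

hits, checked = interval_hit_rate(records)
print(f"{hits} of {checked} programs inside the 70% interval "
      f"({100.0 * hits / checked:.0f}%)")
```

If the intervals are well calibrated, roughly 70 percent of the checked programs should land inside them; with only ten programs the observed fraction will be noisy.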

Defects

Table 12-2. Defects by number

Type	Count
sy	98
wn	45
wa	34
md	25
ic	23
ct	20
wt	13
iu	13
ma	12
wc	7
we	5
mc	4
id	2
ch	1
is	1
me	1
mi	1

Table 12-3. Defects-- total duration

Type	Total Duration (minutes)
wa	236
md	82
ic	76
sy	52
iu	49
ma	41
wn	40
wt	34
we	31
wc	23
me	20
ct	17
id	2
ch	1
is	1
mc	1
mi	1

This is fairly revealing: using the wrong algorithm, or having problems with the intended algorithm, takes up the lion's share of the defect-fixing time. This is remarkable, and reflects serious problems with design-- or with transcribing the design. I can think of a few ways of fixing this. Round-trip tools, which implement a design as code, might help in some ways, but I think a better focus on design reviews would be more cost-effective and universally useful. Trailing behind wrong-algorithm problems are missing design (same fixes) and interface capability (in some ways, the same fixes, although IC errors also included a few instances of capabilities missing from the standard libraries).

Next down is syntax-- 52 minutes (of about 712 minutes spent just fixing errors) was devoted to syntax errors; add that to the 40 minutes of "wrong name" errors, and we can see a fair amount of time devoted to fixing sloppy typing and simple problems. Improvements in the coding process, as well as stricter code reviews, might help this considerably.

Close in the running is "interface use", which in this case meant I was using a class's interface in the wrong manner, or misunderstanding what it actually does. It's interesting that IU errors counted for 49 minutes worth of C++ defects, but only 0.82 minutes of Eiffel defects, and I place the source of that squarely on the shoulders of the Eiffel automatic documentation facility, where it's possible to make HTML documentation for a class based on in-class comments and the pre/post conditions of the class. By increasing my knowledge of the C++ standard library (or by switching to Eiffel!) I could probably reduce the time spent using interfaces incorrectly.
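
The Pareto view behind this discussion can be reproduced from Table 12-3 with a short Python sketch. The durations are copied from the table; the function name and output layout are illustrative assumptions. Note that the rounded table entries sum to 707 minutes, slightly under the roughly 712 quoted above, presumably because the table shows rounded values.

```python
# Total fix time in minutes per defect type, copied from Table 12-3.
DURATION = {
    "wa": 236, "md": 82, "ic": 76, "sy": 52, "iu": 49, "ma": 41,
    "wn": 40, "wt": 34, "we": 31, "wc": 23, "me": 20, "ct": 17,
    "id": 2, "ch": 1, "is": 1, "mc": 1, "mi": 1,
}


def pareto(totals):
    """Return (type, value, percent, cumulative percent) rows, largest first."""
    grand_total = sum(totals.values())
    rows, running = [], 0
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    for kind, value in ordered:
        running += value
        rows.append((kind, value,
                     100.0 * value / grand_total,
                     100.0 * running / grand_total))
    return rows


pareto_rows = pareto(DURATION)
for kind, value, share, cumulative in pareto_rows:
    print(f"{kind:>2} {value:4d} min {share:5.1f}% {cumulative:6.1f}%")
```

The same function works on the counts in Table 12-2, which makes the contrast visible: syntax defects dominate by count but not by fix time.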

Defects/KLOC removed by phase

I found it interesting that my total defects/KLOC did not really decrease over time, hovering around 150. This fact is depressing, but I did find the definite decrease in testing defects/KLOC to be a very encouraging sign, showing that the addition of design and code reviews may be having a positive effect. Unfortunately, I still see too many defects detected in compile, a trend I'd like to see improved.
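
For reference, defects/KLOC is just the defect count scaled to 1000 new and changed lines. The sketch below uses hypothetical per-phase counts for a single 200-LOC program, chosen only so that the total lands on the 150 figure mentioned above; none of these numbers are from the study's logs.

```python
def defects_per_kloc(defects, new_changed_loc):
    """Defect density normalised to 1000 new and changed lines of code."""
    return 1000.0 * defects / new_changed_loc


# Hypothetical defect counts by removal phase for one 200-LOC program.
removed = {"design review": 3, "code review": 5, "compile": 16, "test": 6}
loc = 200

for phase, count in removed.items():
    print(f"{phase:>13}: {defects_per_kloc(count, loc):6.1f} defects/KLOC")

total = defects_per_kloc(sum(removed.values()), loc)
print(f"{'total':>13}: {total:6.1f} defects/KLOC")
```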

Defect removal rate by phase

Defect removal leverage by program number

The study closed with code reviews improving in effectiveness and design reviews staying about the same with regard to defect removal leverage. I'd like to see the defect removal rates (and thus leverage) improve considerably with my reviews. My review skills are pretty bad right now; constant tweaking of the review checklists for common problems may help.

Something else that may help is a change in my review process. Right now I'm trying to review entire sections of source code at one time, and the review process is stultifyingly dull. I can see why Extreme Programming proponents prefer pair programming (constant review/inspection) as opposed to larger inspections of volumes of source at a time. I'd also like to start using more formal design review processes for methods that look tricky; I think that would have helped quite a bit with program 10a, where my design review did not pick up very many errors. I've already started research on more effective inspections (a quick look around turned up a couple of good books and articles on the subject). I strongly wish to increase my skill in reviews, particularly design reviews. Design skill is difficult to learn and inspect, but I feel it's particularly important in software.
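
The removal rates and leverage figures discussed above follow from simple arithmetic, sketched here in Python. The (defects, minutes) pairs and the average test rate are hypothetical; the fallback mirrors the assignment's rule of using the average unit-test removal rate when a program had no test defects.

```python
def removal_rate(defects, minutes):
    """Defects removed per hour spent in a phase."""
    return 60.0 * defects / minutes


def leverage(phase_rate, test_rate, fallback_test_rate):
    """Defect-removal leverage: a phase's rate relative to the unit-test rate.

    When a program had no test defects (test_rate == 0), the average
    unit-test rate of the programs so far is used instead.
    """
    baseline = test_rate if test_rate > 0 else fallback_test_rate
    return phase_rate / baseline


# Hypothetical (defects, minutes) pairs for one program.
phases = {
    "design review": (2, 30),
    "code review": (4, 40),
    "compile": (12, 20),
    "test": (3, 90),
}
average_test_rate = 2.5  # hypothetical average over earlier programs

rates = {name: removal_rate(d, m) for name, (d, m) in phases.items()}
for name in ("design review", "code review", "compile"):
    drl = leverage(rates[name], rates["test"], average_test_rate)
    print(f"{name:>13}: {rates[name]:5.1f} defects/hr, DRL {drl:4.1f}")
```

A leverage above 1 means an hour in that phase removed more defects than an hour of unit testing did.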

Pre-compile Yield vs LOC reviewed

Design Review Yield vs LOC reviewed

Pre-compile Yield vs A/FR ratio

To be honest, these graphs don't do a whole lot for me, although the information is probably useful. What I do find interesting, particularly in the "yield vs LOC reviewed" graphs, is that inspections seem to get more effective the bigger programs are. That's tremendous news--finding a tool that actually does better as programs get bigger. Very encouraging indeed!
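
Each point on the yield graphs pairs a review speed with a yield. A minimal sketch of that calculation, using the Yield(DR) formula from the assignment, follows; the review records are hypothetical and the function names are illustrative.

```python
def review_yield(removed_in_review, escaped_review):
    """Yield = 100 * removed / (removed + escaped), as a percentage."""
    present = removed_in_review + escaped_review
    if present == 0:
        return None  # yield is undefined when there was nothing to find
    return 100.0 * removed_in_review / present


def loc_per_hour(loc_reviewed, review_minutes):
    """Review speed in lines of code per hour."""
    return 60.0 * loc_reviewed / review_minutes


# Hypothetical: (LOC reviewed, review minutes, removed in review, escaped review)
reviews = [(120, 45, 2, 6), (300, 60, 5, 5)]

for loc, minutes, removed, escaped in reviews:
    speed = loc_per_hour(loc, minutes)
    print(f"{speed:6.1f} LOC/hr -> yield {review_yield(removed, escaped):.0f}%")
```

Escaped defects are only known after later phases have found them, so the yield for a program keeps changing until testing is finished.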

Comparison with PSP students

Defects/KLOC by program

Defects/KLOC by program -- PSP students

This looks like terrible news-- my defects/KLOC trend is high (averaging around 150-160 defects/KLOC), and if anything was getting worse by the end of the class instead of better! I'm trying not to take too much from this, because I don't know how they're getting their numbers, or whether their numbers reflect total detected defects. Another possibility (unlikely) is that I generated similar numbers of defects using less code per program, but I find that difficult to believe. What it does encourage me to do is to try to improve my upstream practices, generating better designs and code the first time around rather than just slopping ahead and trusting inspections, compilation, and testing to find my problems.

Defects/KLOC in Compile by program

Defects/KLOC in Compile by program -- PSP students

Similar troubles here. I should definitely attempt to reduce the number of defects which make it to the compile stage, probably by upping my efforts at inspection, review, and design. Better coding practices will help as well, with an emphasis on prevention rather than detection.

Defects/KLOC in Test by program

Defects/KLOC in Test by program -- PSP students

This story is somewhat more heartening. My data is still not wonderful, but the number of defects making it to test did seem to be decreasing toward the end of the course, and is roughly comparable to that of the PSP students in general. This much is good. I'd like it to be better.

Design/Code review yield by program number

Design/Code review Yield by program number -- PSP students

If anything shows the ineffectiveness of my reviews, it's this-- look at the remarkable difference between my pre-compile yield (averaging around 20-25 percent) and the PSP class average from the SEI-- about 50% once reviews are initiated! Wow-- my performance here is, to me, unacceptable; knowing that about the same number of defects are making it to test, but so few are being caught early, means that I'm placing a lot of reliance on the compiler to catch errors that I should be catching much earlier. I'll definitely be reading up on reviews and making my review process finer-grained (covering a routine at a time, doing more in-depth analysis of a routine, etc). One of the biggest items learned from my study of the PSP has been the worth of reviews and inspections; I really must increase both my knowledge of the inspection process and my ability to produce code and designs which are easily inspected. Given the average class performance of about 50% yield, I think that should be my first "metric-based" goal-- increasing my pre-compile yield to 50%.

Appraisal-to-failure ratio

Appraisal-to-failure ratio -- PSP Students

Here once again, we can see where some of the problem lies-- I'm not spending as much time as the average PSP student on appraisal methods (i.e. reviews). For program 10 (a worst-case example for me) the students were spending almost twice as much time in reviews as they were in test; I was spending almost half as much time in reviews as I was in test! Frankly, this is somewhat embarrassing, showing that (all other things being equal) the students averaged four times the time in reviews that I did! Ouch. I did somewhat better on some of the earlier programs (program 8, for example, where the students' A/FR was similar to mine). In any case, I think I should be spending more time in reviews, a conclusion I'd reached earlier by other means.
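
To make the arithmetic behind that comparison concrete, here is a small Python sketch. It assumes the usual PSP definition of A/FR as design-review plus code-review time divided by compile plus test time; the minutes are hypothetical, picked only to mirror the rough ratios described above (about 0.5 for me, about 2 for the students).

```python
def afr(design_review_min, code_review_min, compile_min, test_min):
    """Appraisal-to-failure ratio: review time over compile-plus-test time."""
    return (design_review_min + code_review_min) / (compile_min + test_min)


# Hypothetical minutes chosen only to mirror the rough ratios quoted above.
mine = afr(design_review_min=15, code_review_min=15, compile_min=10, test_min=50)
students = afr(design_review_min=60, code_review_min=60, compile_min=10, test_min=50)

print(f"mine {mine:.1f}, students {students:.1f}, gap {students / mine:.0f}x")
```

With failure time held equal, a ratio of 2 against a ratio of 0.5 is where the "four times the time in reviews" figure comes from.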

Conclusions

The PSP has taught me a great deal about my software development processes (or lack of same!), merely by virtue of having me measure my processes. I have done so... and found them wanting somewhat.

This is a bit distressing-- but only initially. I now have the tools to focus on improving my processes, and the metrics necessary to verify my improvements. I can think of a number of things to try, but my highest priorities are as follows:

Improve my design representation. My review yields are low. To improve them, I must improve my ability to review designs; one way to do this is to improve my design representation. I can do this partially by adopting some of the processes and representations described by Humphrey, such as logic and state specification templates, but I still dislike those; they are excellent for formally describing behaviour, but poor for understanding the described behaviour. Graphical notations such as UML and BON may help (and a tool which could convert from a design notation into code, and back, would be worthwhile). I can see a way to implement such generation in Dia using BON and Eiffel, and may do so in the future. I do think that the use of contracts (require/ensure assertions) has helped considerably, and will continue to do so.
Increase time spent in reviews. The comparison of my A/FR ratio shows that I am spending too much time in the compile and test phase; I can improve this by either decreasing the time spent in compile/test (either by shortchanging quality-- not acceptable-- or by improving yield so that compile/test takes less time) or by increasing the time spent in reviews (which should increase yield, improving A/FR even further!). I will choose to inspect more, hopefully improving yield in the process.
Increase review yield. This is really the goal. I'm setting an initial target of design/code review yield at 50%, but I'd like to see this increase considerably over time, up to the 75-80 percent range. I can do this by increasing the time spent on review, adding more formal review processes (now that I better understand the processes described in the text, I can use them more effectively) and by doing more effective reviews in general, by increasing my knowledge of inspections and their execution (I have started this process by identifying relevant articles and books on the subject, which will hopefully increase my inspections' effectiveness).

These three items, all of which contribute toward the effectiveness of reviews, represent my initial focus after this study. It's interesting to note that none of the above items focus on estimation, and that's worth an explanation. I'm not focusing on estimation for two reasons. First, in my current development environment, estimation is not as high a priority as quality-- my schedule is extremely flexible. Second, I can't think of anything I can do to increase my estimation accuracy except practice, which will come by the execution of the PSP as time goes on. Therefore, my focus will remain on design, design representation, and design/code inspection.