The Personal Software Process: an Independent Study
Prev	Chapter 4. Lesson 4: Planning: Estimating software Size, continued	Next

Program 4A: Linear Regression Analysis

Using a linked list, write a program to calculate the mean and standard deviation of a set of data

Given Requirements

Requirements:Write a program to calculate the linear regression size-estimating parameters for a set of n programs where historical object LOC and new and changed LOC are available. Linear regression and the required formulas are explained in section A7. Enhance the linked list of program 1A to hold the n data records, where each record holds two real numbers

Testing:Thoroughly test the program. At a minimum, use this program to calculate the beta parameters for three cases [taken from table D8 in the text and from programs 2A, 3A, 4A]... Prepare a report of your tests that includes a table of the planned and actual results from these tests in the format in table D9

--[Humphrey95]

Table 4-1. Test Results Format-- Program 4A

Test	Expected B₀	Expected B₁	Actual B₀	Actual B₁
Table D8: Estimated Class vs Actual New and Changed LOC	-22.55	1.7279
Table D8: Estimated New and Changed LOC vs Actual New and Changed LOC	-23.92	1.4310
Program 2A: Estimated New and Changed LOC vs Actual New and Changed LOC
Program 3A: Estimated New and Changed LOC vs Actual New and Changed LOC
Program 4A: Estimated New and Changed LOC vs Actual New and Changed LOC

Planning

My goodness! PSP1 comes complete with a very elaborate script for calculating linear regression parameters and evolving a complex size and time estimate. It's obvious that program 3a is intended to help, but it comes too late-- I wasted a great deal of time calculating the linear regression parameters by hand, and simply made an educated guess on the time estimate rather than running through the process again. This portion needs tools in a big way. I actually recommend to anyone learning the PSP that they get these tools ahead of time. I commend Humphrey on making tool construction part of the syllabus, but the tools are coming after they are needed at least once. Perhaps this is done to impress the student with the necessity of automation, but it has the unfortunate effect of artificially inflating the time spent in the planning phase (if the planning phase is incredibly long on one assignment because of hand calculations and shortened dramatically on the next due to automation, it will skew things a bit!).

I'm planning on reusing the number_list and simple_input_parser classes from program 1A; through the estimation script (which I think is artificially inflating estimates from programs 2A and 3A, which ended up significantly larger than expected), I'm estimating (through the PROBE process) 143 new LOC, and (through analogy and guessing) about 2 hours of development time.

Development

Design

The design will differ very slightly from that evidenced by the conceptual design used in planning. I will indeed reuse the number_list class from lesson 1, but will delete i/o functionality from it, creating a paired_number_list class, which holds two number_lists. I'm removing the I/O in order to refactor it into a new class, and taking two lists of single numbers as it will work better than adapting the original class to hold pairs of numbers, because the original mean() functionality will be easily encapsulated (and we use the means of each runs in the linear regression calculations).

The i/o will be brought in by reusing the simple_input_parser class from programs 2A and 3A. A third class will mix the two using multiple inheritance (paired_number_list_reader), interpreting an i/o stream as a set of comma-separated value lines in the form of x,y. The main program will require very minor changes (new instantiation, new printing instructions).

Code

Not too bad; this was a fairly straightforward adaptation of the design. I did discover that I'd been doing the beta calculations for my estimate incorrectly; now, of course, I have a tool to automate this for next time.

simple_input_parser.h, simple_input_parser.cpp

The simple_input_parser class was used unchanged from programs 2A and 3A.

number_list

number_list was used almost whole, but with the removal of the i/o features.

/*
*/

#ifndef NUMBER_LIST_H
#define NUMBER_LIST_H

#include <list>
#include <iostream>

//a class which encapsulates a list of double values, adding the features of
//mean and standard deviation
class number_list : public list< double >
{
 public:
  double sum( void ) const;
  double mean( void ) const;
  double standard_deviation( void ) const;
  int entry_count( void ) const;
  void add_entry( double new_entry );
};

#endif

/*
*/

/*
*/

#include "number_list.h"
#include <assert.h>
#include <stdlib.h>
#include <math.h>


double 
number_list::sum( void ) const
{
  double Result = 0;
  for ( list<double>::const_iterator iter = begin();
	iter != end();
	++iter )
    {
      Result += *iter;
    }
  return Result;
}

double 
number_list::mean( void ) const
{
  assert( entry_count() > 0 );
  return sum() / entry_count();
}

double 
number_list::standard_deviation( void ) const
{
  assert( entry_count() > 1 );
  double sum_of_square_differences = 0;
  for ( list<double>::const_iterator iter = begin();
	iter != end();
	++iter )
    {
      const double this_square_difference = *iter - mean();
      sum_of_square_differences += this_square_difference * this_square_difference;
    }
  return sqrt( sum_of_square_differences / ( entry_count() - 1 ));
}

int 
number_list::entry_count( void ) const
{
  return size();
}

void 
number_list::add_entry( double new_entry )
{
  push_back( new_entry );
}

/*
*/

paired_number_list

/*
*/

#ifndef PAIRED_NUMBER_LIST_H
#define PAIRED_NUMBER_LIST_H

#ifndef NUMBER_LIST_H
#include "number_list.h"
#endif

class paired_number_list : public number_list
{
 public:
  //modifiers
  void add_entry( double x, double y );
  void reset( void );
  paired_number_list( void );
  //number of entries
  int entry_count( void ) const;

  double x_sum( void ) const;
  double x_mean( void ) const;
  double x_standard_deviation( void ) const;

  double y_sum( void ) const;
  double y_mean( void ) const;
  double y_standard_deviation( void ) const;

  double beta_1_numerator( void ) const;
  double beta_1_denominator( void ) const;
  double beta_1( void ) const;
  double beta_0( void ) const;
 protected:
  number_list m_xs;
  number_list m_ys;
};


#endif
/*
*/

/*
*/

#include "paired_number_list.h"
#ifndef CONTRACT_H
#include "contract.h"
#endif

void
paired_number_list::reset( void )
{
  m_xs.clear();
  m_ys.clear();
}

paired_number_list::paired_number_list( void ) 
{
  reset();
}


int
paired_number_list::entry_count( void ) const
{
  REQUIRE( m_xs.entry_count() == m_ys.entry_count() );
  return m_xs.entry_count();
}

void
paired_number_list::add_entry( double x, double y )
{
  m_xs.add_entry( x );
  m_ys.add_entry( y );
}

double
paired_number_list::x_sum( void ) const
{
  return m_xs.sum();
}

double
paired_number_list::x_mean( void ) const
{
  return m_xs.mean();
}

double
paired_number_list::x_standard_deviation( void ) const
{
  return m_xs.standard_deviation();
}

double
paired_number_list::y_sum( void ) const
{
  return m_ys.sum();
}

double
paired_number_list::y_mean( void ) const
{
  return m_ys.mean();
}

double
paired_number_list::y_standard_deviation( void ) const
{
  return m_ys.standard_deviation();
}

double
paired_number_list::beta_1_numerator( void ) const
{
  double Result = 0;
  list<double>::const_iterator x_iter;
  list<double>::const_iterator y_iter;

  for( x_iter = m_xs.begin(), y_iter = m_ys.begin();
       (x_iter != m_xs.end()) && (y_iter != m_ys.end());
       ++x_iter, ++y_iter )
    {
      Result += (*x_iter)*(*y_iter);
    }
  Result -= static_cast<double>(entry_count())*x_mean()*y_mean();
  return Result;
}

double
paired_number_list::beta_1_denominator( void ) const
{
  double Result = 0;
  list<double>::const_iterator x_iter;
  list<double>::const_iterator y_iter;
  for( x_iter = m_xs.begin(), y_iter = m_ys.begin();
       (x_iter != m_xs.end()) && (y_iter != m_ys.end());
       ++x_iter, ++y_iter )
    {
      Result += (*x_iter)*(*x_iter);
    }
  Result -=  static_cast<double>(entry_count())*x_mean()*x_mean();
  return Result;
}

double
paired_number_list::beta_1( void ) const
{
  return beta_1_numerator() / beta_1_denominator();
}

double
paired_number_list::beta_0( void ) const
{
  return y_mean() - beta_1() * x_mean();
}

/*
*/

readable_paired_number_list

/*
*/

#ifndef READABLE_PAIRED_NUMBER_LIST_H
#define READABLE_PAIRED_NUMBER_LIST_H

#ifndef PAIRED_NUMBER_LIST_H
#include "paired_number_list.h"
#endif
#ifndef SIMPLE_INPUT_PARSER_H
#include "simple_input_parser.h"
#endif

class readable_paired_number_list : public paired_number_list, public simple_input_parser
{
 public:
  virtual void parse_last_line( void );
  virtual void reset( void );
  void clear_error_flag( void );
  void check_error( bool condition, const std::string& message );
  void report_error( const std::string& message );
  readable_paired_number_list::readable_paired_number_list( void );
 protected:
  int line_number;
  bool error_found;
};

#endif
/*
*/

/*
*/


#include "readable_paired_number_list.h"

void
readable_paired_number_list::parse_last_line( void )
{
  clear_error_flag();
  //split the string around the comma
  std::string::size_type comma_index = last_line().find( ',' );
  check_error( comma_index == last_line().npos, "No comma" );
  std::string x_string = last_line().substr( 0, comma_index );
  std::string y_string = last_line().substr( comma_index + 1, last_line().length() );
  //get values for each double and ensure they're valid
  char* conversion_end = NULL;
  double new_x = strtod( x_string.c_str(), &conversion_end );
  check_error( conversion_end == x_string.data(), "X invalid" );
  double new_y = strtod( y_string.c_str(), &conversion_end );
  check_error( conversion_end == y_string.data(), "Y invalid" );
  //add the entry
  if ( ! error_found ) 
    {
      cout << "added: " << new_x << ", " << new_y << "\n";
      add_entry( new_x, new_y );
    }
}

void
readable_paired_number_list::clear_error_flag( void ) 
{
  error_found = false;
}

void
readable_paired_number_list::reset( void ) 
{
  line_number = 0;
  clear_error_flag();
}

readable_paired_number_list::readable_paired_number_list( void ) :
paired_number_list(),
simple_input_parser()
{
  reset();
}

void
readable_paired_number_list::check_error( bool condition, const std::string& message )
{
  if ( condition ) 
    {
      error_found = true;
      report_error( message );
    }
}

void
readable_paired_number_list::report_error( const std::string& message )
{
  cerr << "input:" << line_number << ":" << message << ":" << last_line() << "\n";
}

/*
*/

main.cpp

/*
*/

#include <fstream>
#include <iostream>
#include "string.h"

#ifndef READABLE_PAIRED_NUMBER_LIST_H
#include "readable_paired_number_list.h"
#endif

istream *
input_stream_from_args( int arg_count, const char** arg_vector )
{
  istream* Result = NULL;
  if ( arg_count == 1 )
    {
      Result = &cin;
    }
  else 
    {
      const char* help_text = "\
PSP exercise 4A: Calculate the linear regression parameters from
a set of comma-separated values, from standard input.
Usage: \n \
\tpsp_1a\n\n";
      cout << help_text;
    }
  return Result;
}

int main( int arg_count, const char** arg_vector )
{
  //get the input stream, or print the help text as appropriate
  istream* input_stream = input_stream_from_args( arg_count, arg_vector );
  if ( input_stream != NULL ) 
    {
      //read the entries from the input stream
      readable_paired_number_list a_list;
      a_list.set_input_stream( input_stream );
      a_list.parse_until_eof();
      cout << a_list.entry_count() << " entries processed.\n";
      //output the mean, as appropriate
      if ( a_list.entry_count() > 0 )
	{
	  cout << "Beta 0: " << a_list.beta_0() << "\n";
	  cout << "Beta 1: " << a_list.beta_1() << "\n";
	}
      else
	{
	  cout << "Too few entries to calculate\n";
	}
    }
}

/*
*/

number_list.e

class NUMBER_LIST

inherit 
   LINKED_LIST[DOUBLE];
   
creation {ANY} 
   make

feature {ANY} 
   
   line_counter: INTEGER;
   
   sum: DOUBLE is 
      local 
         i: INTEGER;
      do  
         from 
            i := lower;
         until 
            not valid_index(i)
         loop 
            Result := Result + item(i);
            i := i + 1;
         end; 
      end -- sum
   
   mean: DOUBLE is 
      require 
         count > 0; 
      do  
         Result := sum / count.to_double;
      end -- mean
   
   standard_deviation: DOUBLE is 
      require 
         count > 1; 
      local 
         i: INTEGER;
         this_term_difference: DOUBLE;
         top_term: DOUBLE;
      do  
         from 
            i := lower;
         until 
            not valid_index(i)
         loop 
            this_term_difference := item(i) - mean;
            top_term := top_term + this_term_difference ^ 2;
            i := i + 1;
         end; 
         Result := (top_term / (count - 1)).sqrt;
      end -- standard_deviation

end -- NUMBER_LIST

paired_number_list.e

class PAIRED_NUMBER_LIST
   
creation
   make

feature { NONE }
   
   xs : NUMBER_LIST
   
   ys : NUMBER_LIST
   
feature {ANY}
   
   add_entry( new_x, new_y : DOUBLE ) is
      do
	 xs.add_last( new_x )
	 ys.add_last( new_y )
	 std_output.put_double( new_x )
	 std_output.put_string( ", " )
	 std_output.put_double( new_y )
	 std_output.put_string( "%N" )
      end
   
   count : INTEGER is
      do
	 Result := xs.count
      end
   
   x_sum : DOUBLE is
      do
	 Result := xs.sum
      end
   
   x_mean : DOUBLE is
      do
	 Result := xs.mean
      end
   
   x_standard_deviation : DOUBLE is
      do
	 Result := xs.standard_deviation
      end
   
   y_sum : DOUBLE is
      do
	 Result := ys.sum
      end
   
   y_mean : DOUBLE is
      do
	 Result := ys.mean
      end
   
   y_standard_deviation : DOUBLE is
      do
	 Result := ys.standard_deviation
      end
   
   entry_count : INTEGER is
      do
	 Result := xs.count
      end
   
   beta_1_numerator : DOUBLE is
      local
	 index : INTEGER
      do
	 Result := 0
	 from
	    index := xs.lower
	 until
	    not ( xs.valid_index( index ) and ys.valid_index( index ) )
	 loop
	    Result := Result + xs.item(index) * ys.item( index )
	    index := index + 1
	 end
	 Result := Result - entry_count.to_double * x_mean * y_mean
      end
   
   beta_1_denominator : DOUBLE is
      local
	 index : INTEGER
      do
	 Result := 0
	 from
	    index := xs.lower
	 until
	    not ( xs.valid_index( index ) and ys.valid_index( index ) )
	 loop
	    Result := Result + xs.item(index) * xs.item( index )
	    index := index + 1
	 end
	 Result := Result - entry_count.to_double * x_mean * x_mean
      end
   
   beta_1 : DOUBLE is
      do
	 Result := beta_1_numerator / beta_1_denominator
      end
   
   beta_0 : DOUBLE is
      do
	 Result := y_mean - beta_1 * x_mean
      end
   
   make is
      do
	 !!xs.make
	 !!ys.make
      end
   
   reset is
      do
	 xs.clear
	 ys.clear
      end
   
invariant
   
   xs.count = ys.count
   
end -- PAIRED_NUMBER_LIST

readable_paired_number_list.e

class READABLE_PAIRED_NUMBER_LIST
   
inherit
   SIMPLE_INPUT_PARSER
      redefine
	 parse_last_line
      end
   
   PAIRED_NUMBER_LIST
      redefine
	 make
      end
   
creation
   make

feature {ANY}
   
   make is
      do
	 Precursor
      end
	 
   parse_last_line is
	 --read comma-separated numbers
      local
	 comma_index : INTEGER
	 x_string : STRING
	 y_string : STRING
	 new_x : INTEGER
	 new_y : INTEGER
      do
	 clear_error_flag
	 comma_index := last_line.index_of( ',' )
	 check_for_error( comma_index = last_line.count + 1, "No comma" )
	 x_string := last_line.substring( 1, comma_index - 1 )
	 y_string := last_line.substring( comma_index + 1, last_line.count )
	 check_for_error( not ( x_string.is_double or x_string.is_integer), "invalid X" )
	 check_for_error( not (y_string.is_double or y_string.is_integer), "invalid Y" )
	 if not error_flag then
	    add_entry( double_from_string( x_string ), double_from_string( y_string ) )
	 end
      end
   
   double_from_string ( string : STRING ) : DOUBLE is
      require
	 string.is_double or string.is_integer
      do
	 if string.is_double then
	    Result := string.to_double
	 elseif string.is_integer then
	    Result := string.to_integer.to_double
	 end
      end
   
   error_flag : BOOLEAN
   
   clear_error_flag is
      do
	 error_flag := false
      end
   
   check_for_error( condition : BOOLEAN; message : STRING ) is
      do
	 if condition then
	    error_flag := true
	    report_error( message )
	 end
      end
   
   report_error( message : STRING ) is
      do
	 std_error.put_string( message + ":" + last_line + "%N" )
      end
   
end

main.e

class MAIN

creation
   make
   
feature { ANY }
   
   make is
      local
	 number_list : READABLE_PAIRED_NUMBER_LIST
      do
	 !!number_list.make
	 number_list.set_input( io )
	 number_list.parse_until_eof
	 --now output results
	 if ( number_list.count > 0 ) then
	    std_output.put_string( "Beta 0: " )
	    std_output.put_double( number_list.beta_0 )
	    std_output.put_string( "%NBeta 1: " )
	    std_output.put_double( number_list.beta_1 )
	    std_output.put_string( "%N" )
	 else
	    std_output.put_string( "Not enough entries for calculations" )   
	 end --if
      end
   
end-- MAIN

Compile

Once again, I learn my lessons on syntax errors and proper use of components. After some time, and some searching through the header files, I finally found the "c_str()" function for C++ standard library strings; I had erroneously been using data(), which is not necessarily terminated. A good book probably would have shown this to me (the header files are not very helpful, and C++ templates are rather difficult to understand).

While I did have some minor problems with the Eiffel standard library, the HTML documentation generated by the short utility has been very readable and extremely useful. While some standard Eiffel library features could be more useful (for example, class STRING comes with a split feature-- but only splits on a set of defined separators, not including the comma), it's very solid and readable.

Test

Few problems here. The few which existed came from either poor use of libraries or my own error misunderstanding the linear regression algorithm.

Table 4-2. Test Results Format-- Program 4A (results essentially identical for Eiffel and C++ versions)

Test	Expected B₀	Expected B₁	Actual B₀	Actual B₁
Table D8: Estimated Class vs Actual New and Changed LOC	-22.55	1.7279	-22.5525	1.72793
Table D8: Estimated New and Changed LOC vs Actual New and Changed LOC	-23.92	1.4310	-23.9239	1.42097
Programs 2A, 3A, 4A: Estimated New and Changed LOC vs Actual New and Changed LOC			850.905	-5.32154

Incidentally, the correct beta values for my own code should have shifted my new/changed estimate from 143 LOC to a ludicrous 569 LOC. I think that lessons 2 and 3 probably warped things quite a bit, and now wish that I was not including them in my data.

Postmortem

PSP1 Project Plan Summary

Table 4-3. Project Plan Summary

Student:	Victor B. Putz	Date:	000106
Program:	Regression Beta Analyzer	Program#	4A
Instructor:	Wells	Language:	C++

Summary LOC/Hour	Plan	Actual	To date
	?	41	41

Program Size	Plan	Actual	To date
Base	28	28
Deleted	0	6
Modified	4	4
Added	139	129
Reused	106	87	87
Total New and Changed	143	129	686
Total LOC	273	236	966
Total new/reused	0	0	0

Time in Phase (min):	Plan	Actual	To Date	To Date%
Planning	7	79	116	14
Design	13	10	79	10
Code	35	41	226	27
Compile	8	9	52	6
Test	47	33	287	35
Postmortem	10	16	66	8
Total	120	188	826	100
Defects Injected		Actual	To Date	To Date %
Plan		0	0	0
Design		6	21	26
Code		9	55	68
Compile		0	3	3
Test		0	3	3
Total development		15	82	100
Defects Removed		Actual	To Date	To Date %
Planning		0	0	0
Design		0	0	0
Code		4	19	23
Compile		9	35	43
Test		2	28	34
Total development		15	82	100
After Development		0	0

Eiffel code/compile/test

Time in Phase (min)	Actual	To Date	To Date %
Code	28	115	43
Compile	7	63	24
Test	15	89	33
Total	50	267	100
Defects Injected	Actual	To Date	To Date %
Design	0	4	8
Code	7	45	90
Compile	0	0	0
Test	1	1	2
Total	8	50	100
Defects Removed	Actual	To Date	To Date %
Code	0	1	2
Compile	4	28	56
Test	4	21	42
Total	8	50	100

Time Recording Log

Table 4-4. Time Recording Log

Student:	Victor B. Putz	Date:	000106
		Program:	4A

Start	Stop	Interruption Time	Delta time	Phase	Comments
000106 17:18:21	000106 18:38:07	0	79	plan
000107 10:12:58	000107 10:23:24	0	10	design
000107 10:29:27	000107 11:10:40	0	41	code
000107 11:12:34	000107 11:21:44	0	9	compile
000107 11:21:50	000107 11:55:36	0	33	test
000107 11:56:10	000107 12:12:38	0	16	postmortem

Table 4-5. Time Recording Log

Student:	Victor Putz	Date:	000107
		Program:	4A

Start	Stop	Interruption Time	Delta time	Phase	Comments
000107 12:17:13	000107 12:44:52	0	27	code
000107 12:44:56	000107 12:51:35	0	6	compile
000107 12:51:48	000107 13:06:41	0	14	test

Defect Reporting Logs

Table 4-6. Defect Recording Log

Student:	Victor B. Putz	Date:	000106
		Program:	4A

Defect found	Type	Reason	Phase Injected	Phase Removed	Fix time	Comments
000107 10:34:11	ic	om	design	code	1	forgot to design a constructor
000107 10:58:03	ic	om	design	code	3	no line counter in new i/o section; added
000107 11:01:22	ic	om	design	code	1	Added error reporting
000107 11:04:12	ic	om	design	code	2	added error flag for more beneficial error reporting
000107 11:12:37	wn	ty	code	compile	0	mistyped "readable_paired_number_list"
000107 11:14:00	sy	ty	code	compile	0	missed semicolon at end of class declaration
000107 11:14:45	wn	ty	code	compile	0	typed a_liste instead of a_list
000107 11:15:24	sy	om	code	compile	0	forgot const on end of method implementation
000107 11:16:11	sy	om	code	compile	0	forgot to #include required file
000107 11:17:18	sy	ig	code	compile	1	could not declare two iterators in for clause; moved declarations out
000107 11:19:43	sy	ty	code	compile	0	forgot parentheses around no-argument feature
000107 11:20:14	wn	om	code	compile	0	Forgot to prepend class name to clear_error method
000107 11:21:17	sy	om	code	compile	0	forgot to declare constructor in class declaration (only added body in cpp file)
000107 11:25:49	wn	ig	design	test	5	was using data() instead of c_str() to get c-readable string from the std::string class.
000107 11:35:00	wa	ig	design	test	15	Incorrectly assumed that the summation symbol applied to the whole equation, not just the first part.

Table 4-7. Defect Recording Log

Student:	Victor Putz	Date:	000107
		Program:	4A

Defect found	Type	Reason	Phase Injected	Phase Removed	Fix time	Comments
000107 12:45:00	ic	om	code	compile	1	forgot to add count feature
000107 12:47:02	sy	om	code	compile	0	forgot to add "Redefine make" clause
000107 12:48:00	wn	ig	code	compile	0	used wipe_out instead of clear to empty linked list
000107 12:50:38	mc	ig	code	compile	0	forgot to call precursor in make
000107 12:53:05	iv	ig	code	test	0	forgot-- Eiffel strings start at 1, not 0
000107 12:54:58	wa	ig	code	test	3	forgot that to_double doesn't work with straight integers; added double_from_string feature
000107 12:58:43	wn	ty	code	test	4	accidentally typed ys instead of xs
000107 13:04:37	sy	ty	test	test	0	test data did not have a terminating newline after the last data entry

Prev	Home	Next
Lesson 4: Planning: Estimating software Size, continued	Up	Summary