Program 7a: Correlation

Using a linked list, write a program to calculate the correlation of two sets of data

Given Requirements

 

Requirements: Write a program to calculate the correlation between two series of numbers and determine the significance of this correlation. The formula for making the correlation calculation is given in section A3. Use the numerical integration function from program 5A to calculate the value of the t distribution and hold the data in a linked list.

Testing: Thoroughly test the program. As one test, use the data in Table D12. Here, the results for the correlation between x and y should be r = 0.9543158, t = 9.0335, with 2*(1-p) = 1.80*10^-5. This is a significance of substantially better than (less than) 0.005. This example is worked out in sections A3.2 and A4.1 ... also use program 7a to analyze the data from your programming exercises to date to determine the correlation between actual new and changed LOC and actual development time, the correlation between estimated new and changed LOC and actual development time, and the significance of these correlations. Prepare and submit a test report that includes these data and uses the format in Table D13.

Table 8-1. D12: Total LOC and development hours for 10 Pascal programs

Item Number (n)   Actual New and Changed LOC (x)   Development Hours (y)
1                    186                              15.0
2                    699                              69.9
3                    132                               6.5
4                    272                              22.4
5                    291                              28.4
6                    331                              65.9
7                    199                              19.4
8                   1890                             198.7
9                    788                              38.8
10                  1601                             138.2
Totals              6389                             603.2

Table 8-2. Test Results Format: Program 7A

Test                                Value     Expected     Actual
Table D12                           r         0.9543
                                    t         9.0335
                                    2*(1-p)   1.80*10^-5
Actual LOC vs Development Time      r         n/a
                                    t         n/a
                                    2*(1-p)   n/a
Estimated LOC vs Development Time   r         n/a
                                    t         n/a
                                    2*(1-p)   n/a
--[Humphrey95] 

Planning

Requirements

Program 7a, like several others, analyzes two series of numbers and their relationship to each other. With that in mind, I see no reason not to recycle much of program 6a (which did linear regression and prediction), adding the correlation and significance calculations and producing output as below:

Historical data read.
Beta-0: -0.351494
Beta-1: 0.0949624
Standard deviation: 19.7304
Correlation r: 0.9543
t: 9.0335
2*(1-p): 1.80e-5

Estimate at x=300
Projected y: 28.1372
t (70 percent): 1.10815
t (90 percent): 1.85955
Range (70 percent): 23.2687; UPI: 51.406; LPI: 4.86851
Range (90 percent): 39.0466; UPI: 67.1838; LPI: -10.9094

The input file format will be the same as for program 6a: pairs of comma-separated doubles, optionally followed by inline comments beginning with two hyphens ("--"). The list of pairs is terminated by the lowercase word "stop"; any single numbers on subsequent lines are treated as x values and generate predictions, as shown above, based on the linear regression parameters.
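For reference, a hypothetical input file in this format (the pairs here are just the first few Table D12 rows, chosen only for illustration):

```
-- historical data: actual new and changed LOC, development hours
186, 15.0
699, 69.9   -- comments may also follow data on the same line
132, 6.5
stop
300         -- single values after "stop" request a prediction at that x
```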

Size estimate

Using program 6a and historical data, the estimated size of new and changed code is 179

Time estimate

Using data for the last three programs (PROBE estimates of new/changed LOC vs actual development time), estimated total development time is 228 minutes.

Development

Design

Using Dia, a freeware diagramming tool, I've made a simplified diagram of the new features for the appropriate classes, supplemented by brief notes, which should give a good idea of what's changing. Not much changes from program 6a, but that's to be expected given the requirements.

Preliminary design diagram, using a form of UML

Preliminary design; some things are missing, and are caught later in the design review. I prefer to keep designs fairly sparse and implementation-independent.

Design Review

First design review-- and it did catch some things! I get the feeling from the design review checklist provided by Humphrey that I'm meant to do a full design, including pseudocode for all methods (with checks in the review checklist like "loops are properly initialized, incremented, and terminated"). I think many of these details are specific to the implementation language and should be left to coding, but that's my personal opinion of the design process (in fact, I'm even against the parentheses the UML template put on my features when diagramming-- that's language-dependent! I may try to switch to BON, if I can figure out how to convince Dia to do different diagram styles).

I did find a few errors, mostly due to missing contracts (which are not really part of the implementation, but I do code them in both C++ and Eiffel). The only real algorithmic problem was a missing square root in the correlation calculation-- a significant omission!

Code

No surprises, although some of the number_list additions (tail, etc) required additional code not included in the design.

Reused code

The simple_input_parser, simpson_integrator, paired_number_list, and single_variable_function classes were reused in full, as were the gamma_function, error_log, is_double_equal, t_distribution, t_distribution_base, t_integral, and whitespace_stripper modules.

number_list

number_list added the head and tail methods, as well as the mapped_to, multiplied_by_list, and append features.


#ifndef NUMBER_LIST_H
#define NUMBER_LIST_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

#include <list>
#include <iostream>

//a class which encapsulates a list of double values, adding the features of
//mean and standard deviation
class number_list:public list < double >
{
  public:double sum (void) const;
  double mean (void) const;
  double standard_deviation (void) const;
  int entry_count (void) const;
  void add_entry (double new_entry);

  double head( void ) const;
  number_list tail( void ) const;
  number_list mapped_to( const single_variable_function& f ) const;
  number_list multiplied_by_list( const number_list& rhs ) const;
  void append( const number_list& rhs );

};

#endif


#include "number_list.h"
#include <assert.h>
#include <stdlib.h>
#include <math.h>

#ifndef CONTRACT_H
#include "contract.h"
#endif

double
number_list::sum (void) const
{
  double
    Result = 0;
  for (list < double >::const_iterator iter = begin ();
       iter != end (); ++iter)
    {
      Result += *iter;
    }
  return Result;
}

double
number_list::mean (void) const
{
  assert (entry_count () > 0);
  return sum () / entry_count ();
}

double
number_list::standard_deviation (void) const
{
  assert (entry_count () > 1);
  double
    sum_of_square_differences = 0;
  for (list < double >::const_iterator iter = begin ();
       iter != end (); ++iter)
    {
      const double
	this_square_difference = *iter - mean ();
      sum_of_square_differences +=
	this_square_difference * this_square_difference;
    }
  return sqrt (sum_of_square_differences / (entry_count () - 1));
}

int
number_list::entry_count (void) const
{
  return size ();
}

void
number_list::add_entry (double new_entry)
{
  push_back (new_entry);
}

double
number_list::head( void ) const
{
  REQUIRE( entry_count() > 0 );
  return *begin();
}

number_list
number_list::tail( void ) const
{
  REQUIRE( entry_count() > 0 );
  number_list Result;
  list < double >::const_iterator iter = begin();
  ++iter;
  Result.insert( Result.begin(), iter, end() );
  return Result;
}

number_list
number_list::mapped_to( const single_variable_function& f ) const
{
  number_list Result;
  for (list < double >::const_iterator iter = begin ();
       iter != end (); ++iter)
    {
      Result.add_entry( f.at(*iter) );
    }
  return Result;
}

number_list
number_list::multiplied_by_list( const number_list& rhs ) const
{
  REQUIRE( entry_count() == rhs.entry_count() );
  number_list Result;
  if ( entry_count() > 0 )
    {
      Result.add_entry( head() * rhs.head() );
      Result.append( tail().multiplied_by_list( rhs.tail() ) );
    }      
  return Result;
}

void
number_list::append( const number_list& rhs )
{
  insert( end(), rhs.begin(), rhs.end() );
}


paired_number_list_predictor

#ifndef PAIRED_NUMBER_LIST_PREDICTOR_H
#define PAIRED_NUMBER_LIST_PREDICTOR_H

#ifndef PAIRED_NUMBER_LIST_H
#include "paired_number_list.h"
#endif
#ifndef T_DISTRIBUTION_H
#include "t_distribution.h"
#endif

class paired_number_list_predictor:public paired_number_list
{
  public:double variance (void) const;
  double standard_deviation (void) const;
  double projected_y (double x) const;
  double prediction_range (double x, double range) const;
  double lower_prediction_interval (double x, double range) const;
  double upper_prediction_interval (double x, double range) const;
  double t (double range) const;

  static double correlation_bottom_term( const number_list& numbers );
  double correlation_top( void ) const;
  double correlation( void ) const;
  double significance_t( void ) const;
  double significance( void ) const;

    protected:t_distribution m_t_distribution;
  double prediction_range_base (void) const;
};

#endif

#include "paired_number_list_predictor.h"

#ifndef CONTRACT_H
#include "contract.h"
#endif
#ifndef SQUARE_H
#include "square.h"
#endif
#ifndef T_INTEGRAL_H
#include "t_integral.h"
#endif

#include <math.h>

double
paired_number_list_predictor::variance (void) const
{
  REQUIRE (entry_count () > 2);
  double
    Result = 0;
  list < double >::const_iterator x_iter;
  list < double >::const_iterator y_iter;

  for (x_iter = m_xs.begin (), y_iter = m_ys.begin ();
       (x_iter != m_xs.end ()) && (y_iter != m_ys.end ()); ++x_iter, ++y_iter)
    {
      Result += pow (*y_iter - beta_0 () - beta_1 () * (*x_iter), 2);
    }
  Result *= 1.0 / (entry_count () - 2.0);
  return Result;
}

double
paired_number_list_predictor::standard_deviation (void) const
{
  return sqrt (variance ());
}

double
paired_number_list_predictor::projected_y (double x) const
{
  return beta_0 () + beta_1 () * x;
}

double
paired_number_list_predictor::t (double range) const
{
  const_cast <
    paired_number_list_predictor *
    >(this)->m_t_distribution.set_n (entry_count () - 2);
  return m_t_distribution.at (range);
}

double
paired_number_list_predictor::prediction_range (double x, double range) const
{
  REQUIRE (entry_count () > 0);
  return t (range) * standard_deviation ()
    * sqrt (1.0 + 1.0 / static_cast < double >(entry_count ())
	    + pow (x - x_mean (), 2) / prediction_range_base ());
}

double
paired_number_list_predictor::lower_prediction_interval (double x,
							 double range) const
{
  return projected_y (x) - prediction_range (x, range);
}

double
paired_number_list_predictor::upper_prediction_interval (double x,
							 double range) const
{
  return projected_y (x) + prediction_range (x, range);
}

double
paired_number_list_predictor::prediction_range_base (void) const
{
  double
    Result = 0;
  for (std::list < double >::const_iterator x_iter = m_xs.begin ();
       x_iter != m_xs.end (); ++x_iter)
    {
      Result += pow ((*x_iter) - x_mean (), 2);
    }
  return Result;
}

double
paired_number_list_predictor::correlation_bottom_term( const number_list& numbers )
{
  REQUIRE( numbers.entry_count() > 0 );
  REQUIRE( numbers.sum() != 0 );
  square a_square;
  double Result = numbers.entry_count() * ( numbers.mapped_to( a_square ).sum() )
    - a_square.at( numbers.sum() );
  return Result;
}

double
paired_number_list_predictor::correlation_top( void ) const
{
  double Result = entry_count() * m_xs.multiplied_by_list( m_ys ).sum() - (x_sum() * y_sum());
  return Result;
}

double
paired_number_list_predictor::correlation( void ) const
{
  REQUIRE( correlation_bottom_term( m_xs ) != 0 );
  REQUIRE( correlation_bottom_term( m_ys ) != 0 );
  double Result = correlation_top() 
    / sqrt( correlation_bottom_term( m_xs ) * correlation_bottom_term( m_ys ) );
  return Result;
}

double
paired_number_list_predictor::significance_t( void ) const
{
  REQUIRE( correlation() != 1.0 );
  REQUIRE( entry_count() >= 2 );
  double Result = ( fabs( correlation() ) * sqrt( entry_count() - 2.0 ) )
    / sqrt( 1 - pow( correlation(), 2 ) );
  return Result;
}

double
paired_number_list_predictor::significance( void ) const
{
  t_integral t;
  t.set_n( entry_count() - 2 );
  const double p = t.at( significance_t() );
  double Result = 2.0 * ( 1.0 - p );
  return Result;
}


predictor_parser

#ifndef PREDICTOR_PARSER_H
#define PREDICTOR_PARSER_H

#ifndef SIMPLE_INPUT_PARSER_H
#include "simple_input_parser.h"
#endif
#ifndef PAIRED_NUMBER_LIST_PREDICTOR_H
#include "paired_number_list_predictor.h"
#endif

class predictor_parser:public simple_input_parser
{
  public:virtual void reset (void);
  virtual std::string transformed_line (const std::string & line) const;
  virtual void parse_last_line (void);
    predictor_parser (void);

    protected:bool found_end_of_historical_data;
  paired_number_list_predictor number_list;

  void parse_last_line_as_historical_data (void);
  void parse_last_line_as_end_of_historical_data (void);
  void parse_last_line_as_prediction (void);
  bool last_line_is_blank (void);
  static const std::string & historical_data_terminator;
  static const std::string & inline_comment_begin;
  bool is_double (const std::string & str);
  double double_from_string (const std::string & str);

    std::string string_stripped_of_whitespace (const std::string & str) const;
    std::string string_stripped_of_comments (const std::string & str) const;
};

#endif

#include "predictor_parser.h"
#ifndef WHITESPACE_STRIPPER_H
#include "whitespace_stripper.h"
#endif
#ifndef ERROR_LOG_H
#include "error_log.h"
#endif
#ifndef CONTRACT_H
#include "contract.h"
#endif

void
predictor_parser::reset (void)
{
  simple_input_parser::reset ();
  found_end_of_historical_data = false;
  number_list.reset ();
}

std::string predictor_parser::transformed_line (const std::string & str) const
{
  return whitespace_stripper::string_stripped_of_whitespace (string_stripped_of_comments (str));
}

std::string
  predictor_parser::string_stripped_of_comments (const std::string & str) const
{
  std::string::size_type comment_index = str.find (inline_comment_begin);
  return str.substr (0, comment_index);
}

void
predictor_parser::parse_last_line (void)
{
  if (last_line_is_blank ())
    {
      return;
    }
  else if (last_line () == historical_data_terminator)
    {
      parse_last_line_as_end_of_historical_data ();
    }
  else
    {
      if (!found_end_of_historical_data)
	{
	  parse_last_line_as_historical_data ();
	}
      else
	{
	  parse_last_line_as_prediction ();
	}
    }
}

bool predictor_parser::last_line_is_blank (void)
{
  if (last_line ().length () == 0)
    {
      return true;
    }
  else
    {
      return false;
    }
}

void
predictor_parser::parse_last_line_as_historical_data (void)
{
  //6 reused, 6 modified?
  error_log
    errlog;
  //split the string around the comma
  const
    std::string::size_type comma_index = last_line ().find (',');
  errlog.check_error (comma_index == last_line ().npos, "No comma");
  std::string x_string = last_line ().substr (0, comma_index);
  std::string y_string =
    last_line ().substr (comma_index + 1, last_line ().length ());
  //get values for each double and ensure they're valid
  errlog.check_error (!is_double (x_string), "X invalid:" + x_string);
  errlog.check_error (!is_double (y_string), "Y invalid:" + y_string);
  if (!errlog.error_flag ())
    {
      double
	new_x = double_from_string (x_string);
      double
	new_y = double_from_string (y_string);
      //add the entry
      cout << "added: " << new_x << ", " << new_y << "\n";
      number_list.add_entry (new_x, new_y);
    }
}

void
predictor_parser::parse_last_line_as_end_of_historical_data (void)
{
  REQUIRE (last_line () == historical_data_terminator);
  cout << "Historical data read.\n"
    << "Beta-0: " << number_list.beta_0 () << "\n"
    << "Beta-1: " << number_list.beta_1 () << "\n"
    << "Standard deviation: " << number_list.standard_deviation () << "\n";
  if ( ( number_list.entry_count() >= 2 ) 
       && ( number_list.x_sum() != 0 )
       && ( number_list.y_sum() != 0 ) )
    {
      cout << "Correlation: " << number_list.correlation() << "\n"
	   << "Significance t: " << number_list.significance_t() << "\n"
	   << "2*(1-p): " << number_list.significance() << "\n\n";
    }
  else 
    {
      cout << "Too few numbers for correlation calc, or sums do not permit correlation calc\n\n";
    }
  found_end_of_historical_data = true;
}

predictor_parser::predictor_parser (void)
{
  reset ();
}

void
predictor_parser::parse_last_line_as_prediction (void)
{
  error_log
    errlog;
  errlog.check_error (!is_double (last_line ()),
		      "Not a double: " + last_line ());
  if (!errlog.error_flag ())
    {
      const double
	x = double_from_string (last_line ());
      cout << "Estimate at x=" << x << "\n"
	<< "Projected y: " << number_list.projected_y (x) << "\n"
	<< "t (70 percent): " << number_list.t (0.7) << "\n"
	<< "t (90 percent): " << number_list.t (0.9) << "\n"
	<< "Range (70 percent): " << number_list.prediction_range (x, 0.7)
	<< "; UPI: " << number_list.upper_prediction_interval (x, 0.7)
	<< "; LPI: " << number_list.lower_prediction_interval (x, 0.7)
	<< "\nRange (90 percent): " << number_list.prediction_range (x, 0.9)
	<< "; UPI: " << number_list.upper_prediction_interval (x, 0.9)
	<< "; LPI: " << number_list.lower_prediction_interval (x, 0.9)
	<< "\n";
    }
}

bool predictor_parser::is_double (const std::string & str)
{
  bool
    Result = true;
  char *
    conversion_end = NULL;
  strtod (str.c_str (), &conversion_end);
  if (conversion_end == str.data ())
    {
      Result = false;
    }
  return Result;
}


double
predictor_parser::double_from_string (const std::string & str)
{
  REQUIRE (is_double (str));
  return strtod (str.c_str (), NULL);
}


const
  std::string & predictor_parser::historical_data_terminator = "stop";

const
  std::string & predictor_parser::inline_comment_begin = "--";


square

#ifndef SQUARE_H
#define SQUARE_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

class square : public single_variable_function
//returns the square of the argument
{
 public:
  virtual double at( double x ) const;
};

#endif

#include "square.h"

#include <math.h>

double
square::at( double x ) const
{
  return pow( x, 2 );
}



main


#include <fstream>
#include <iostream>
#include "string.h"

#ifndef PREDICTOR_PARSER_H
#include "predictor_parser.h"
#endif

istream *
input_stream_from_args (int arg_count, const char **arg_vector)
{
  istream *Result = NULL;
  if (arg_count == 1)
    {
      Result = &cin;
    }
  else
    {
      const char *help_text =
	"PSP exercise 7A: Calculate correlation, significance, and prediction intervals given historical data.\nUsage:\n\tpsp_7a\n\n";
      cout << help_text;
    }
  return Result;
}

int
main (int arg_count, const char **arg_vector)
{
  //get the input stream, or print the help text as appropriate
  istream *input_stream = input_stream_from_args (arg_count, arg_vector);
  if (input_stream != NULL)
    {
      predictor_parser parser;
      parser.set_input_stream (input_stream);
      parser.parse_until_eof ();
    }
}


paired_number_list_predictor.e

 class PAIRED_NUMBER_LIST_PREDICTOR
 --reads a set of paired numbers, does linear regression, predicts results

 inherit 
    PAIRED_NUMBER_LIST
       redefine make
       end; 

 creation {ANY} 
    make

 feature {ANY} 

    variance: DOUBLE is 
       local 
	  i: INTEGER;
       do  
	  Result := 0;
	  from 
	     i := xs.lower;
	  until 
	     not (xs.valid_index(i) and ys.valid_index(i))
	  loop 
	     Result := Result + (ys.item(i) - beta_0 - beta_1 * xs.item(i)) ^ 2;
	     i := i + 1;
	  end; 
	  Result := Result / (entry_count - 2);
       end -- variance

    standard_deviation: DOUBLE is 
       do  
	  Result := variance.sqrt;
       end -- standard_deviation

    projected_y(x: DOUBLE): DOUBLE is 
       --projected value of given x, using linear regression 
       --parameters from xs and ys
       do  
	  Result := beta_0 + beta_1 * x;
       end -- projected_y

    prediction_range_base: DOUBLE is 
       --base of the prediction range, used in prediction_range
       local 
	  i: INTEGER;
       do  
	  Result := 0;
	  from 
	     i := xs.lower;
	  until 
	     not (xs.valid_index(i) and ys.valid_index(i))
	  loop 
	     Result := Result + (xs.item(i) - xs.mean) ^ 2;
	     i := i + 1;
	  end; 
       end -- prediction_range_base

    prediction_range(x, range: DOUBLE): DOUBLE is 
       --prediction range, based on given estimate and % range
       require 
	  entry_count > 0; 
       do  
	  Result := (1.0 + (1.0 / entry_count.to_double) + (((x - xs.mean) ^ 2) / prediction_range_base)).sqrt;
	  Result := t(range) * standard_deviation * Result;
       end -- prediction_range

    lower_prediction_interval(x, range: DOUBLE): DOUBLE is 
       --LPI, from [Humphrey95]
       do  
	  Result := projected_y(x) - prediction_range(x,range);
       end -- lower_prediction_interval

    upper_prediction_interval(x, range: DOUBLE): DOUBLE is 
       --UPI, from [Humphrey95]
       do  
	  Result := projected_y(x) + prediction_range(x,range);
       end -- upper_prediction_interval

    t_distribution: T_DISTRIBUTION;

    make is 
       do  
	  Precursor;
	  !!t_distribution.make;
       end -- make

    t(range: DOUBLE): DOUBLE is 
       --gets the size of the t-distribution at the given alpha range
       do  
	  t_distribution.set_n(entry_count - 2);
	  Result := t_distribution.at(range);
       end -- t

    correlation_bottom_term( numbers: NUMBER_LIST ): DOUBLE is
	  --bottom term of the correlation equation
       require
	  numbers.entry_count > 0
	  numbers.sum /= 0
       local
	  square : SQUARE
       do
	  !!square
	  Result := ( numbers.entry_count * ( numbers.mapped_to( square ).sum ) ) - square.at( numbers.sum )
       end

    correlation_top : DOUBLE is
	  --top term of the correlation equation
       do
	  Result := entry_count * xs.multiplied_by_list( ys ).sum - ( xs.sum * ys.sum )
       end

    correlation : DOUBLE is
	  --correlation (r, not rsquared)
       require
	  correlation_bottom_term( xs ) /= 0
	  correlation_bottom_term( ys ) /= 0
       do
	  Result := correlation_top / ( correlation_bottom_term( ys ) * correlation_bottom_term( xs ) ).sqrt
       end

    significance_t : DOUBLE is
	  --t-portion of the significance (see [Humphrey95])
       require
	  correlation /= 1
	  entry_count >= 2
       do
	  Result := ( ( correlation.abs * ( entry_count - 2 ).sqrt ) / ( 1 - ( correlation ^ 2 ) ).sqrt )
       end

    significance : DOUBLE is
	  --2( 1 - p ); significance of the correlation
       local
	  a_t : T_INTEGRAL
	  p : DOUBLE
       do
	  !!a_t.make
	  a_t.set_n( entry_count - 2 )
	  p := a_t.at( significance_t )
	  Result := 2 * ( 1 - p )
      end
      
end -- class PAIRED_NUMBER_LIST_PREDICTOR

predictor_parser.e

class PREDICTOR_PARSER
--reads a list of number pairs, and performs linear regression analysis

inherit 
   SIMPLE_INPUT_PARSER
      redefine parse_last_line, transformed_line
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   inline_comment_begin: STRING is "--";
   
   string_stripped_of_comment(string: STRING): STRING is 
      --strip the string of any comment
      local 
         comment_index: INTEGER;
      do  
         if string.has_string(inline_comment_begin) then 
            comment_index := string.index_of_string(inline_comment_begin);
            if comment_index = 1 then 
               Result := "";
            else 
               Result := string.substring(1,comment_index - 1);
            end; 
         else 
            Result := string;
         end; 
      end -- string_stripped_of_comment
   
   string_stripped_of_whitespace(string: STRING): STRING is 
      --strip string of whitespace
      do  
         Result := string;
         Result.left_adjust;
         Result.right_adjust;
      end -- string_stripped_of_whitespace
   
   transformed_line(string: STRING): STRING is 
      --strip comments and whitespace from parseable line      
      do  
         Result := string_stripped_of_whitespace(string_stripped_of_comment(string));
      end -- transformed_line
   
   number_list: PAIRED_NUMBER_LIST_PREDICTOR;

feature {ANY} --parsing

   found_end_of_historical_data: BOOLEAN;
   
   reset is 
      --resets the parser and makes it ready to go again
      do  
         found_end_of_historical_data := false;
         number_list.reset;
      end -- reset
   
   make is 
      do  
         !!number_list.make;
         reset;
      end -- make
   
   parse_last_line_as_historical_data is 
      --interpret last_line as a pair of comma-separated values
      local 
         error_log: ERROR_LOG;
         comma_index: INTEGER;
         x_string: STRING;
         y_string: STRING;
         new_x: DOUBLE;
         new_y: DOUBLE;
      do  
         !!error_log.make;
         comma_index := last_line.index_of(',');
         error_log.check_for_error(comma_index = last_line.count + 1,"No comma:" + last_line);
         x_string := last_line.substring(1,comma_index - 1);
         y_string := last_line.substring(comma_index + 1,last_line.count);
         error_log.check_for_error(not (x_string.is_double or x_string.is_integer),"invalid X:" + last_line);
         error_log.check_for_error(not (y_string.is_double or y_string.is_integer),"invalid Y:" + last_line);
         if not error_log.error_flag then 
            new_x := double_from_string(x_string);
            new_y := double_from_string(y_string);
            number_list.add_entry(new_x,new_y);
            std_output.put_string("added: ");
            std_output.put_double(new_x);
            std_output.put_string(", ");
            std_output.put_double(new_y);
            std_output.put_new_line;
         end; 
      end -- parse_last_line_as_historical_data
   
   double_from_string(string: STRING): DOUBLE is 
      require 
         string.is_double or string.is_integer; 
      do  
         if string.is_double then 
            Result := string.to_double;
         elseif string.is_integer then 
            Result := string.to_integer.to_double;
         end; 
      end -- double_from_string
   
   historical_data_terminator: STRING is "stop";
   
   parse_last_line_as_end_of_historical_data is 
      --interpret last line as the end of historical data
      require 
         last_line.compare(historical_data_terminator) = 0; 
      do  
         found_end_of_historical_data := true;
         std_output.put_string("Historical data read.%NBeta-0: ");
         std_output.put_double(number_list.beta_0);
         std_output.put_string("%NBeta-1: ");
         std_output.put_double(number_list.beta_1);
         std_output.put_string("%NStandard Deviation: ");
         std_output.put_double(number_list.standard_deviation);
	 std_output.put_string("%NCorrelation: ");
	 std_output.put_double(number_list.correlation);
	 std_output.put_string("%NSignificance t: ");
	 std_output.put_double(number_list.significance_t);
	 std_output.put_string("%N2*(1-p):");
	 std_output.put_double(number_list.significance);
         std_output.put_string("%N%N");
      end -- parse_last_line_as_end_of_historical_data
   
   parse_last_line_as_prediction is 
      --interpret last line as a single x, for a predictive y
      local 
         error_log: ERROR_LOG;
         x: DOUBLE;
      do  
         !!error_log.make;
         error_log.check_for_error(not (last_line.is_double or last_line.is_integer),"Not a double : " + last_line);
         if not error_log.error_flag then 
            x := double_from_string(last_line);
            std_output.put_string("Estimate at x=");
            std_output.put_double(x);
            std_output.put_string("%NProjected y: ");
            std_output.put_double(number_list.projected_y(x));
            std_output.put_string("%Nt (70 percent): ");
            std_output.put_double(number_list.t(0.7));
            std_output.put_string("%Nt (90 percent): ");
            std_output.put_double(number_list.t(0.9));
            std_output.put_string("%NRange (70 percent): ");
            std_output.put_double(number_list.prediction_range(x,0.7));
            std_output.put_string("; UPI: ");
            std_output.put_double(number_list.upper_prediction_interval(x,0.7));
            std_output.put_string("; LPI: ");
            std_output.put_double(number_list.lower_prediction_interval(x,0.7));
            std_output.put_string("%NRange (90 percent): ");
            std_output.put_double(number_list.prediction_range(x,0.9));
            std_output.put_string("; UPI: ");
            std_output.put_double(number_list.upper_prediction_interval(x,0.9));
            std_output.put_string("; LPI: ");
            std_output.put_double(number_list.lower_prediction_interval(x,0.9));
            std_output.put_new_line;
         end; 
      end -- parse_last_line_as_prediction
   
   parse_last_line is 
      --parse the last line according to state
      do  
         if not last_line.empty then 
            if last_line.compare(historical_data_terminator) = 0 then 
               parse_last_line_as_end_of_historical_data;
            else 
               if found_end_of_historical_data then 
                  parse_last_line_as_prediction;
               else 
                  parse_last_line_as_historical_data;
               end; 
            end; 
         end; 
      end -- parse_last_line

end -- class PREDICTOR_PARSER

square.e

class SQUARE
   
   inherit
      SINGLE_VARIABLE_FUNCTION
	 redefine
	    at
	 end
feature {ANY}
   
   at( x: DOUBLE ) : DOUBLE is
      do
	 Result := x ^ 2;
      end

end
	 

main.e

class MAIN

creation {ANY} 
   make

feature {ANY} 
   
   make is 
      local 
         parser: PREDICTOR_PARSER;
         gamma: GAMMA_FUNCTION;
      do  
         !!parser.make;
         parser.set_input(io);
         parser.parse_until_eof;
      end -- make

end -- MAIN

Code Review

Mostly minor defects caught: I occasionally forgot to return values, and had a header/implementation parity problem, but nothing significant.

Compile

Annoyingly enough, my design review missed some obvious fumbles-- missing #includes, wrong return types, etc. I'll be curious to see how the postmortem turns out in terms of yield, because I don't feel like this was terribly effective, but we'll see how it goes. In any case, it's my first attempt at reviews, and they did pick up several defects before compile, so I feel somewhat better.

Test

Perhaps inspections have some benefits after all! Only one error in test: a plus sign should have been a minus. And that's all. Not bad!

Table 8-3. Test Results Format: Program 7A

Test                                Value     Expected     Actual -- C++    Actual -- Eiffel
Table D12                           r         0.9543       0.954316         0.954316
                                    t         9.0335       9.03351          9.033510
                                    2*(1-p)   1.80*10^-5   1.80318*10^-5    0.000018
Actual LOC vs Development Time      r         n/a          0.890682         0.890682
                                    t         n/a          3.39334          3.393338
                                    2*(1-p)   n/a          0.0426699        0.042670
Estimated LOC vs Development Time   r         n/a          0.976646         0.976646
                                    t         n/a          4.54559          4.545593
                                    2*(1-p)   n/a          0.137856         0.137856

Among other things, this shows that my development time is indeed related to my estimates. According to [Humphrey95] p. 513, an r^2 value of about 0.8 (as is the case with actual LOC vs. time: 0.890682^2 is about 0.79) indicates "a strong correlation. The relationship is adequate for planning purposes." An r^2 value of 0.95 (as for estimated LOC vs. actual development time: 0.976646^2 is about 0.95) indicates a relationship which is "predictive and you can use it with high confidence."

This sort of news sounds peculiar (there's a closer relationship with my predicted LOC and time than there is with actual LOC and time?), but the difference in significances is telling (there's only about a 4% chance that the actual-LOC-to-time correlation would happen by chance, but around a 14% chance that the estimated-LOC-to-time correlation would happen by chance).

Postmortem

If I believe the numbers, the design and code reviews were indeed helpful, the design review being 2.5 times as effective as test, and the code review 3.3 times as effective, in terms of defects removed per hour spent in the activity. Of course, some judgement should be used, as the compilation step was 8.5 times as effective-- but the errors found were automatically caught, and not particularly significant.

PSP2 Project Plan Summary

Table 8-4. Project Plan Summary

Student: Victor B. Putz | Date: 000121
Program: Correlation | Program #: 7A
Instructor: Wells | Language: C++

Summary | Plan | Actual | To Date
LOC/Hour | 49 | 26 | 46
Planned time | 228 | | 484
Actual time | | 198 | 513
CPI (cost/performance index) | | | 0.94
% reused | 77 | 87 | 41
Test defects/KLOC | ? | 11 | 34
Total defects/KLOC | ? | 232.55 | 137
Yield (defects before test/total defects) | ? | 95 | 75

Program Size | Plan | Actual | To Date
Base | 227 | 227 |
Deleted | 0 | 0 |
Modified | 0 | 0 |
Added | 68 | 86 |
Reused | 365 | 365 | 988
Total New and Changed | 179 | 86 | 1175
Total LOC | 771 | 678 | 2390
Total New/Reused | 0 | 0 | 0
Upper Prediction Interval (70%) | 281 | |
Lower Prediction Interval (70%) | 77 | |

Time in Phase (min) | Plan | Actual | To Date | To Date %
Planning | 41 | 56 | 297 | 20
Design | 23 | 38 | 168 | 11
Design Review | ? | 16 | 16 | 1
Code | 62 | 40 | 391 | 26
Code Review | ? | 12 | 12 | 1
Compile | 16 | 7 | 96 | 6
Test | 70 | 10 | 430 | 28
Postmortem | 16 | 19 | 111 | 7
Total | 228 | 198 | 1521 | 100
Total Time UPI (70%) | 280 | | |
Total Time LPI (70%) | 176 | | |

Defects Injected | Plan | Actual | To Date | To Date %
Planning | | 0 | 0 | 0
Design | 7 | 9 | 51 | 32
Design Review | ? | 0 | 0 | 0
Code | 15 | 11 | 104 | 65
Code Review | ? | 0 | 0 | 0
Compile | 1 | 0 | 3 | 2
Test | 1 | 0 | 3 | 2
Total Development | 23 | 20 | 161 | 100

Defects Removed | Plan | Actual | To Date | To Date %
Planning | | 0 | 0 | 0
Design | | 0 | 0 | 0
Design Review | ? | 4 | 4 | 3
Code | 5 | 5 | 36 | 22
Code Review | ? | 4 | 4 | 3
Compile | 12 | 6 | 77 | 48
Test | 6 | 1 | 40 | 23
Total Development | 23 | 20 | 161 | 100
After Development | | 0 | 0 | 0

Defect Removal Efficiency | Plan | Actual | To Date
Defects/Hour - Design Review | ? | 15 | 15
Defects/Hour - Code Review | ? | 20 | 20
Defects/Hour - Compile | ? | 51.4 | 51.4
Defects/Hour - Test | ? | 6 | 6
DRL (design review/test) | ? | 2.5 | 2.5
DRL (code review/test) | ? | 3.3 | 3.3
DRL (compile/test) | ? | 8.5 | 8.5
Eiffel code/compile/test

Time in Phase (min) | Actual | To Date | To Date %
Code | 25 | 242 | 50
Code Review | 10 | 10 | 2
Compile | 6 | 112 | 23
Test | 3 | 120 | 25
Total | 44 | 484 | 100

Defects Injected | Actual | To Date | To Date %
Design | 0 | 4 | 4
Code | 11 | 97 | 95
Compile | 0 | 0 | 0
Test | 0 | 1 | 1
Total | 11 | 102 | 100

Defects Removed | Actual | To Date | To Date %
Code | 0 | 1 | 1
Code Review | 5 | 5 | 5
Compile | 5 | 66 | 65
Test | 1 | 30 | 29
Total | 11 | 102 | 100

Defect Removal Efficiency | Actual | To Date
Defects/Hour - Code Review | 30 | 30
Defects/Hour - Compile | 50 | 50
Defects/Hour - Test | 20 | 20
DRL (code review/test) | 1.5 | 2.5
DRL (compile/test) | 2.5 | 2.5

Time Recording Log

Table 8-5. Time Recording Log

Student: Victor Putz | Date: 000121
Program: 7a

Start | Stop | Interruption (min) | Delta (min) | Phase | Comments
000121 08:55:21 | 000121 10:05:58 | 14 | 56 | plan |
000121 10:26:13 | 000121 11:04:10 | 0 | 37 | design |
000121 11:13:47 | 000121 11:29:20 | 0 | 15 | design review |
000121 11:33:08 | 000121 12:13:44 | 0 | 40 | code |
000121 12:14:56 | 000121 12:26:44 | 0 | 11 | code review |
000121 12:29:30 | 000121 12:36:38 | 0 | 7 | compile |
000121 12:38:01 | 000121 12:48:00 | 0 | 9 | test |
000121 12:58:53 | 000121 13:18:04 | 0 | 19 | postmortem |

Table 8-6. Time Recording Log

Student: | Date: 000123
Program:

Start | Stop | Interruption (min) | Delta (min) | Phase | Comments
000123 12:55:47 | 000123 13:20:32 | 0 | 24 | code |
000123 13:20:33 | 000123 13:30:47 | 0 | 10 | code review |
000123 13:30:55 | 000123 13:36:31 | 0 | 5 | compile |
000123 13:36:57 | 000123 13:39:29 | 0 | 2 | test |

Defect Reporting Logs

Table 8-7. Defect Recording Log

Student: Victor Putz | Date: 000121
Program: 7a

Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time (min) | Comments
000121 11:14:48 | ct | ig | design | design review | 0 | Missing contract for correlation_bottom_term; requires n > 0, numbers.sum != 0
000121 11:16:30 | ct | ig | design | design review | 1 | Missed requirement for bottom terms to not be equal to zero
000121 11:18:00 | mc | om | design | design review | 1 | missed square root call in correlation
000121 11:20:00 | ct | ig | design | design review | 1 | missed requirement for correlation != 1 in significance_t, and entry_count >= 2
000121 11:47:21 | wt | om | design | code | 0 | correlation_bottom_term should be static
000121 11:52:05 | we | om | design | code | 0 | was setting n to entry_count - 1; should be entry_count - 2
000121 11:57:45 | ct | om | design | code | 3 | forgot contract in head()
000121 12:01:20 | md | om | design | code | 2 | forgot copy operator (useful in some algorithms)
000121 12:12:00 | md | om | design | code | 1 |
000121 12:15:31 | mc | om | code | code review | 0 | forgot to return the head value in head()!
000121 12:16:54 | wa | om | code | code review | 0 | strange loop logic in mapped_to
000121 12:18:08 | ma | om | code | code review | 0 | forgot to return Result in mapped_to, multiplied_by_list
000121 12:26:00 | sy | om | code | code review | 0 | forgot to declare at as const
000121 12:30:32 | sy | om | code | compile | 0 | Darn it, header/implementation parity! correlation_bottom_term static/const parity troubles
000121 12:32:13 | wt | cm | code | compile | 0 | wrong argument type in correlation_bottom_term
000121 12:33:57 | sy | om | code | compile | 0 | Gr... forgot an #include for T_INTEGRAL
000121 12:34:48 | sy | om | code | compile | 0 | forgot parentheses on no-parameter feature call
000121 12:35:34 | sy | om | code | compile | 0 | forgot to #include contract.h
000121 12:36:07 | wt | cm | code | compile | 0 | wrong return type (missed &) on multiplied_by_list
000121 12:39:28 | wa | cm | code | test | 6 | + sign should have been - in correlation_bottom_term

Table 8-8. Defect Recording Log

Student: | Date: 000123
Program:

Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time (min) | Comments
000123 13:24:40 | ma | cm | code | code review | 0 | Forgot to increment loop counter
000123 13:24:58 | ct | om | code | code review | 0 | forgot contracts
000123 13:25:21 | ct | om | code | code review | 0 | forgot correlation contract
000123 13:26:06 | ct | om | code | code review | 0 | forgot significance contract
000123 13:28:10 | ma | cm | code | code review | 0 | forgot to initialize t in significance
000123 13:31:12 | sy | cm | code | compile | 0 | used = instead of := for assignment
000123 13:31:45 | sy | cm | code | compile | 0 | used := instead of = for comparison
000123 13:32:30 | wn | om | code | compile | 0 | name clash between local and feature
000123 13:33:33 | sy | ig | code | compile | 1 | had to change visibility of xs, ys in paired_number_list
000123 13:35:41 | ma | om | code | compile | 0 | forgot to initialize square in correlation_bottom_term
000123 13:37:12 | wn | ig | code | test | 1 | problems with append-- using features from Current instead of rhs for indices, not handling empty rhs.