Program 6A: Predictions

Enhance program 4A to calculate the linear regression parameters and the prediction interval

Given Requirements

 

Requirements: Write a program to calculate an LOC estimate and the 90 percent and 70 percent prediction intervals for this estimate... Use program 5A to calculate the value of the t distribution and use a linked list for the data. You may enhance program 4A to develop this program. Note that to calculate the value of t, you integrate from 0 to a trial value of t. You find the correct value by successively adjusting the trial value of t up or down until the p value is within an acceptable error of 0.85 (for a 70 percent prediction interval) or 0.95 (for a 90 percent prediction interval).

Testing: Thoroughly test the program. As one test, use the data for estimated object LOC and actual new and changed LOC in table D8 and the beta-0 and beta-1 values found from testing program 4A. Also assume an estimated object LOC value of 386. Under these conditions, the estimated LOC values and the parameter values obtained should be as follows:

Table 6-2. Test Results Format -- Program 6a:

Test | Parameter | Expected Value | Actual Value
Table D8 | Beta0 | -22.55 |
 | Beta1 | 1.7279 |
 | UPI( 70% ) | 874 |
 | LPI( 70% ) | 414 |
 | UPI( 90% ) | 1030 |
 | LPI( 90% ) | 258 |
Program 6a, using historical data for programs 2a through 6a | Estimated new and changed LOC | N/A |
 | UPI( 70% ) | N/A |
 | LPI( 70% ) | N/A |
 | UPI( 90% ) | N/A |
 | LPI( 90% ) | N/A |
 | Actual new/changed LOC | N/A |
 
--[Humphrey95] 

Planning

Requirements

Program 6A leaves a bit more room for requirements interpretation than some of the others, so I'll spell out my interpretation as clearly as possible.

First, input. I'll keep my current habit of reading from standard input; it lets me reuse a great deal of code (particularly my hard-working, if extremely simplistic, simple_input_parser). In this case, though, the program will read pairs of numbers from standard input, and a line containing the string "stop" will mark the end of the historical data. Because it would be convenient to include comments in the text files (thus allowing me to preserve a sort of "database" of historical data, annotated with meaning), two dashes (--) will mark the beginning of an inline comment, and blank lines will be ignored.

The pairs represent the historical data, from which program 6A can calculate several parameters. Once the historical data has been read, the numbers that follow are the new "x" values from which we are to predict "y" values; reading "x" values continues until EOF. Values derived from the historical data (beta values, etc.) will be output as soon as the historical data ends; the others (prediction intervals, etc.) will be output after each "x" value. A small sample input appears at the end of this section. An example of the output after reading the data from table D8 in [Humphrey95] would be:

Historical data read:
Beta-0: -22.55
Beta-1: 1.7279
Standard deviation: 197.8956

Estimate at x= 386
Projected y : 644.429
t( 70 percent ): 1.108
t( 90 percent ): 1.860
Range( 70 percent): 229.9715; UPI: 874.401; LPI: 414.4579
Range( 90 percent): 386.0533; UPI: 1030.483; LPI: 258.3761

By showing these mid-calculation values for t, range, and so on, I will hopefully simplify testing while keeping the program versatile and easy to use.
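
And for reference, an input file in the format described above might look like the following (the pairs are placeholder values, not the actual table D8 data; 386 is the estimated object LOC from the test description):

-- historical data: estimated object LOC, actual new and changed LOC
130, 186
350, 590
220, 371
stop        -- end of historical data
386         -- estimated object LOC values to project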

Size Estimate

Once again, my historical estimated-LOC-to-actual-LOC data is so whacked that according to the PROBE script it was unusable (even after throwing out programs 2a and 3a). While this is somewhat discouraging, it's a powerful motivator to pay more attention to my development practices and improve my ability to do the work.

Using the historical data as an average (as per the PROBE script), I've come up with an estimate of 234 new/changed LOC for program 6a. This seems a bit high to me, but we'll see how it goes.

Resource/Time Estimate

My time data is getting a little more stable, and I was able to use linear regression to predict a total project time of about 256 minutes based on PROBE's estimate for size.

Development

Design

Our single_variable_function and simpson_integrator objects do some more work this round, as does our old standby, the simple_input_parser. We'll create several new single_variable_function subclasses: gamma_function, t_distribution_base, and of course t_distribution. The last is a little peculiar: the t-distribution depends not only on x but also on the "degrees of freedom" involved. I will get around this by adding a "degrees of freedom" feature to the t_distribution class, effectively currying the original two-variable function into a single-variable function; a tiny sketch of the idea follows.
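
Here's that currying idea in miniature (illustrative only; the names are made up, and the real classes appear in the code section below):

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

//illustration only: a two-variable function f(x, n) -- here simply x / n --
//becomes a single_variable_function once n is frozen as object state.
//t_distribution (in the code below) plays the same trick with set_n().
class curried_example:public single_variable_function
{
  public:virtual double at (double x) const
  {
    return x / n;
  }
  void set_n (double new_n)
  {
    n = new_n;
  }
    curried_example (void)
  {
    n = 1;
  }

    protected:double n;
};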

The only other new class with substance is the predictor_parser class, which transforms each input line by deleting everything from the inline-comment identifier ("--") to the end of the line and then stripping whitespace. It then handles lines based on its state: before the end of the historical data, each pair is added to a paired_number_list_predictor, a child of the venerable paired_number_list from lesson 4a. At the end of the historical data (signified by the word "stop" on a line by itself), the class prints the data statistics (regression parameters, standard deviation, etc.). Any further numbers are used as "x" values for prediction, and each results in a prediction based on the regression parameters and the t-distribution. Blank lines are ignored.
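
For the record, the quantities the predictor class computes are the standard linear-regression prediction-interval formulas (written here in LaTeX notation; they correspond directly to the variance, projected_y, and prediction_range features in the code below):

\hat{y}_k = \beta_0 + \beta_1 x_k

\sigma = \sqrt{ \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 }

\mathrm{Range}(p) = t(p, n-2)\,\sigma\,\sqrt{ 1 + \frac{1}{n} + \frac{(x_k - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} }

\mathrm{UPI} = \hat{y}_k + \mathrm{Range}(p) \qquad \mathrm{LPI} = \hat{y}_k - \mathrm{Range}(p)

where t(p, n-2) is the t value found by the successive-approximation search described in the requirements.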

Code

No large surprises here, although while constructing some items I was prompted to spin off some elements into separate reusable files/classes (like the extremely lightweight but somewhat useful "error_log" class).
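
To give a feel for how the spun-off error_log gets used, here is a trivial, hypothetical illustration (not code from the parser itself):

#include <string>

#include "error_log.h"

//hypothetical illustration: bail out early if the caller hands us
//nothing to work with
void
handle_input (const std::string & input)
{
  error_log log;
  log.check_error (input.empty (), "no input given");
  if (!log.error_flag ())
    {
      //...carry on and parse the input...
    }
}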

Reused code

The simple_input_parser, simpson_integrator, number_list, paired_number_list, and single_variable_function classes were reused in full.

error_log

/*
*/

#ifndef ERROR_LOG_H
#define ERROR_LOG_H

#include <string>

class error_log
{
  public:void check_error (bool should_be_false, const std::string & message);
  void log_error (const std::string & message);
  void clear_error_flag (void);
  void set_error_flag (void);
  bool error_flag (void) const;
    error_log (void);

    protected:bool m_error_flag;
};

#endif

/*
*/
/*
*/

#include "error_log.h"

#include <iostream>
#include <string>

void
error_log::check_error (bool should_be_false, const std::string & message)
{
  if (should_be_false)
    {
      m_error_flag = true;
      log_error (message);
    }
}

void
error_log::log_error (const std::string & message)
{
  std::cerr << "Error: " << message << "\n";
}

void
error_log::clear_error_flag (void)
{
  m_error_flag = false;
}

void
error_log::set_error_flag (void)
{
  m_error_flag = true;
}

bool error_log::error_flag (void) const
{
  return m_error_flag;
}

error_log::error_log (void)
{
  clear_error_flag ();
}

/*
*/

gamma_function

#ifndef GAMMA_FUNCTION_H
#define GAMMA_FUNCTION_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

class gamma_function:public single_variable_function
{
  public:virtual double at (double x) const;
};

#endif
/*
*/

#include "gamma_function.h"

#ifndef IS_DOUBLE_EQUAL_H
#include "is_double_equal.h"
#endif

#include <math.h>

//Gamma(x) = (x - 1) * Gamma(x - 1), with base cases Gamma(1) = 1 and
//Gamma(1/2) = sqrt(pi).  The recursion only terminates for positive
//integer and half-integer arguments, which is all the t-distribution
//multiplier ever asks for.
double
gamma_function::at (double x) const
{
  double
    Result = 0;
  if (is_double_equal (x, 1.0))
    {
      Result = 1;
    }
  else if (is_double_equal (x, 0.5))
    {
      Result = sqrt (M_PI);
    }
  else
    {
      Result = (x - 1.0) * at (x - 1.0);
    }
  return Result;
}

/*
*/

is_double_equal

#ifndef IS_DOUBLE_EQUAL_H
#define IS_DOUBLE_EQUAL_H

bool is_double_equal (const double lhs, const double rhs);

void set_is_double_equal_margin (const double new_margin);

#endif
/*
*/

#include "is_double_equal.h"

#include <math.h>

static double double_equal_margin = 0.00001;

bool is_double_equal (const double lhs, const double rhs)
{
  if (fabs (lhs - rhs) < double_equal_margin)
    {
      return true;
    }
  else
    {
      return false;
    }
}

void
set_is_double_equal_margin (const double new_margin)
{
  double_equal_margin = new_margin;
}

/*
*/

paired_number_list_predictor

#ifndef PAIRED_NUMBER_LIST_PREDICTOR_H
#define PAIRED_NUMBER_LIST_PREDICTOR_H

#ifndef PAIRED_NUMBER_LIST_H
#include "paired_number_list.h"
#endif
#ifndef T_DISTRIBUTION_H
#include "t_distribution.h"
#endif

class paired_number_list_predictor:public paired_number_list
{
  public:double variance (void) const;
  double standard_deviation (void) const;
  double projected_y (double x) const;
  double prediction_range (double x, double range) const;
  double lower_prediction_interval (double x, double range) const;
  double upper_prediction_interval (double x, double range) const;
  double t (double range) const;

    protected:t_distribution m_t_distribution;
  double prediction_range_base (void) const;
};

#endif
/*
*/

#include "paired_number_list_predictor.h"

#ifndef CONTRACT_H
#include "contract.h"
#endif

#include <math.h>

double
paired_number_list_predictor::variance (void) const
{
  REQUIRE (entry_count () > 2);
  double
    Result = 0;
  std::list < double >::const_iterator x_iter;
  std::list < double >::const_iterator y_iter;

  for (x_iter = m_xs.begin (), y_iter = m_ys.begin ();
       (x_iter != m_xs.end ()) && (y_iter != m_ys.end ()); ++x_iter, ++y_iter)
    {
      Result += pow (*y_iter - beta_0 () - beta_1 () * (*x_iter), 2);
    }
  Result *= 1.0 / (entry_count () - 2.0);
  return Result;
}

double
paired_number_list_predictor::standard_deviation (void) const
{
  return sqrt (variance ());
}

double
paired_number_list_predictor::projected_y (double x) const
{
  return beta_0 () + beta_1 () * x;
}

double
paired_number_list_predictor::t (double range) const
{
  //set_n mutates the cached t_distribution, so cast away const here;
  //logically this is still a query
  const_cast <
    paired_number_list_predictor *
    >(this)->m_t_distribution.set_n (entry_count () - 2);
  return m_t_distribution.at (range);
}

double
paired_number_list_predictor::prediction_range (double x, double range) const
{
  REQUIRE (entry_count () > 0);
  return t (range) * standard_deviation ()
    * sqrt (1.0 + 1.0 / static_cast < double >(entry_count ())
	    + pow (x - x_mean (), 2) / prediction_range_base ());
}

double
paired_number_list_predictor::lower_prediction_interval (double x,
							 double range) const
{
  return projected_y (x) - prediction_range (x, range);
}

double
paired_number_list_predictor::upper_prediction_interval (double x,
							 double range) const
{
  return projected_y (x) + prediction_range (x, range);
}

double
paired_number_list_predictor::prediction_range_base (void) const
{
  double
    Result = 0;
  for (std::list < double >::const_iterator x_iter = m_xs.begin ();
       x_iter != m_xs.end (); ++x_iter)
    {
      Result += pow ((*x_iter) - x_mean (), 2);
    }
  return Result;
}

/*
*/

t_distribution

#ifndef T_DISTRIBUTION_H
#define T_DISTRIBUTION_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

class t_distribution:public single_variable_function
{
  public:virtual double at (double x) const;
  void set_n (double new_n);
    t_distribution (void);

    protected:double n;
  double next_guess (double arg, double last_result, double target) const;
};

#endif
#include "t_distribution.h"
#ifndef T_INTEGRAL_H
#include "t_integral.h"
#endif

void
t_distribution::set_n (double new_n)
{
  n = new_n;
}

double
t_distribution::at (double x) const
{
  //x is the desired two-sided interval (e.g. 0.7 or 0.9); convert it to
  //the one-sided cumulative probability we need to hit (0.85 or 0.95)
  x = 0.5 + x / 2;
  t_integral
    integral;
  integral.set_n (n);
  double
    last_error = 0;
  const double
    error_margin = 0.0000000001;
  double
    integral_arg = 0;
  double
    this_result = 0.5;
  double
    last_result = 0;
  bool
    has_tried_once = false;
  while (!has_tried_once || (last_error > error_margin))
    {
      last_result = this_result;
      this_result = integral.at (integral_arg);
      last_error = fabs (this_result - x);
      //compute the next trial argument from the current result and the
      //target cumulative probability
      integral_arg = next_guess (integral_arg, last_result, x);
      has_tried_once = true;
    }
  return integral_arg;
}

t_distribution::t_distribution (void)
{
  set_n (0);
}

double
t_distribution::next_guess (double arg, double last_result, double target) const
{
  double
    Result = 0;
  Result = arg + (target - last_result);
  return Result;
}

t_distribution_base

#ifndef T_DISTRIBUTION_BASE_H
#define T_DISTRIBUTION_BASE_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif

class t_distribution_base:public single_variable_function
{
  public:virtual double at (double x) const;
  void set_n (double new_n);
    t_distribution_base (void);
    protected:double n;
};

#endif
/*
*/

#include "t_distribution_base.h"
#ifndef CONTRACT_H
#include "contract.h"
#endif
#include <math.h>


//the unnormalized t density kernel: (1 + x^2/n)^(-(n + 1)/2)
double
t_distribution_base::at (double x) const
{
  REQUIRE (n > 0);
  return pow (1 + x * x / n, -(n + 1) / 2);
}

void
t_distribution_base::set_n (double new_n)
{
  n = new_n;
}

t_distribution_base::t_distribution_base (void):
n (0)
{
}


/*
*/

t_integral

#ifndef T_INTEGRAL_H
#define T_INTEGRAL_H

#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif
#ifndef GAMMA_FUNCTION_H
#include "gamma_function.h"
#endif
#ifndef SIMPSON_INTEGRATOR_H
#include "simpson_integrator.h"
#endif
#ifndef T_DISTRIBUTION_BASE_H
#include "t_distribution_base.h"
#endif

class t_integral:public single_variable_function
{
  public:virtual double at (double x) const;
  void set_n (double new_n);
    t_integral (void);

    protected:simpson_integrator homer;
  gamma_function gamma;
  t_distribution_base base;
  double n;
  double multiplier (void) const;
};

#endif
/*
*/

#include "t_integral.h"
#ifndef CONTRACT_H
#include "contract.h"
#endif

#include <math.h>


//normalizing constant of the t density:
//Gamma((n + 1)/2) / (sqrt(n * pi) * Gamma(n/2))
double
t_integral::multiplier (void) const
{
  return gamma.at ((n + 1) / 2) / (sqrt (n * M_PI) * gamma.at (n / 2));
}

void
t_integral::set_n (double new_n)
{
  n = new_n;
  base.set_n (new_n);
}

t_integral::t_integral (void)
{
  set_n (0);
}

double
t_integral::at (double x) const
{
  REQUIRE (n > 0);
  double
    Result = 0.5;
  if (x < 0)
    {
      Result = 0.5 - multiplier () * homer.integral (base, 0, -x);
    }
  else if (x > 0)
    {
      Result = 0.5 + multiplier () * homer.integral (base, 0, x);
    }
  return Result;
}

/*
*/

main.cpp

/*
*/

#include <iostream>

#ifndef PREDICTOR_PARSER_H
#include "predictor_parser.h"
#endif

std::istream *
input_stream_from_args (int arg_count, const char **arg_vector)
{
  std::istream *Result = NULL;
  if (arg_count == 1)
    {
      Result = &std::cin;
    }
  else
    {
      const char *help_text =
	"PSP exercise 6A: Calculate a prediction and interval given historical data.\nUsage:\n\tpsp_6a\n\n";
      std::cout << help_text;
    }
  return Result;
}

int
main (int arg_count, const char **arg_vector)
{
  //get the input stream, or print the help text as appropriate
  std::istream *input_stream = input_stream_from_args (arg_count, arg_vector);
  if (input_stream != NULL)
    {
      predictor_parser parser;
      parser.set_input_stream (input_stream);
      parser.parse_until_eof ();
    }
  return 0;
}

/*
*/
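
A quick way to exercise the C++ build from the shell (the executable name psp_6a and the data file name are placeholders for whatever the makefile actually produces):

./psp_6a < table_d8.txt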

error_log.e

class ERROR_LOG
--very rudimentary error reporting and error flag tracking.

creation {ANY} 
   make

feature {ANY} 
   
   check_for_error(should_be_false: BOOLEAN; message: STRING) is 
      do  
         if should_be_false then 
            log_error(message);
         end; 
      end -- check_for_error
   
   log_error(message: STRING) is 
      do  
         std_error.put_string(message + "%N");
         set_error_flag;
      end -- log_error
   
   clear_error_flag is 
      do  
         error_flag := false;
      end -- clear_error_flag
   
   set_error_flag is 
      do  
         error_flag := true;
      end -- set_error_flag
   
   error_flag: BOOLEAN;
   
   make is 
      do  
         clear_error_flag;
      end -- make

end -- class ERROR_LOG

gamma_function.e

class GAMMA_FUNCTION
--gamma function, one of the bases of the t-distribution

inherit 
   SINGLE_VARIABLE_FUNCTION
      redefine at
      end; 
   
feature {ANY} 
   
   error_margin: DOUBLE is 0.000001;
   
   acceptably_equal(lhs, rhs: DOUBLE): BOOLEAN is 
      --is lhs within error_margin of rhs?
      do  
         Result := false;
         if lhs.in_range(rhs - error_margin,rhs + error_margin) then 
            Result := true;
         end; 
      end -- acceptably_equal
   
   at(x: DOUBLE): DOUBLE is 
      --gamma function
      do  
         if acceptably_equal(x,1.0) then 
            Result := 1;
         elseif acceptably_equal(x,0.5) then 
            Result := Pi.sqrt;
         else 
            Result := (x - 1.0) * at(x - 1.0);
         end; 
      end -- at

end -- class GAMMA_FUNCTION

main.e

class MAIN

creation {ANY} 
   make

feature {ANY} 
   
   make is 
      local 
         parser: PREDICTOR_PARSER;
      do  
         !!parser.make;
         parser.set_input(io);
         parser.parse_until_eof;
      end -- make

end -- MAIN

paired_number_list_predictor.e

class PAIRED_NUMBER_LIST_PREDICTOR
--reads a set of paired numbers, does linear regression, predicts results

inherit 
   PAIRED_NUMBER_LIST
      redefine make
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   variance: DOUBLE is 
      local 
         i: INTEGER;
      do  
         Result := 0;
         from 
            i := xs.lower;
         until 
            not (xs.valid_index(i) and ys.valid_index(i))
         loop 
            Result := Result + (ys.item(i) - beta_0 - beta_1 * xs.item(i)) ^ 2;
            i := i + 1;
         end; 
         Result := Result / (entry_count - 2);
      end -- variance
   
   standard_deviation: DOUBLE is 
      do  
         Result := variance.sqrt;
      end -- standard_deviation
   
   projected_y(x: DOUBLE): DOUBLE is 
      --projected value of given x, using linear regression 
      --parameters from xs and ys
      do  
         Result := beta_0 + beta_1 * x;
      end -- projected_y
   
   prediction_range_base: DOUBLE is 
      --base of the prediction range, used in prediction_range
      local 
         i: INTEGER;
      do  
         Result := 0;
         from 
            i := xs.lower;
         until 
            not (xs.valid_index(i) and ys.valid_index(i))
         loop 
            Result := Result + (xs.item(i) - xs.mean) ^ 2;
            i := i + 1;
         end; 
      end -- prediction_range_base
   
   prediction_range(x, range: DOUBLE): DOUBLE is 
      --prediction range, based on given estimate and % range
      require 
         entry_count > 0; 
      do  
         Result := (1.0 + (1.0 / entry_count.to_double) + (((x - xs.mean) ^ 2) / prediction_range_base)).sqrt;
         Result := t(range) * standard_deviation * Result;
      end -- prediction_range
   
   lower_prediction_interval(x, range: DOUBLE): DOUBLE is 
      --LPI, from [Humphrey95]
      do  
         Result := projected_y(x) - prediction_range(x,range);
      end -- lower_prediction_interval
   
   upper_prediction_interval(x, range: DOUBLE): DOUBLE is 
      --UPI, from [Humphrey95]
      do  
         Result := projected_y(x) + prediction_range(x,range);
      end -- upper_prediction_interval
   
   t_distribution: T_DISTRIBUTION;
   
   make is 
      do  
         Precursor;
         !!t_distribution.make;
      end -- make
   
   t(range: DOUBLE): DOUBLE is 
      --gets the size of the t-distribution at the given alpha range
      do  
         t_distribution.set_n(entry_count - 2);
         Result := t_distribution.at(range);
      end -- t

end -- class PAIRED_NUMBER_LIST_PREDICTOR

predictor_parser.e

class PREDICTOR_PARSER
--reads a list of number pairs, and performs linear regression analysis

inherit 
   SIMPLE_INPUT_PARSER
      redefine parse_last_line, transformed_line
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   inline_comment_begin: STRING is "--";
   
   string_stripped_of_comment(string: STRING): STRING is 
      --strip the string of any comment
      local 
         comment_index: INTEGER;
      do  
         if string.has_string(inline_comment_begin) then 
            comment_index := string.index_of_string(inline_comment_begin);
            if comment_index = 1 then 
               Result := "";
            else 
               Result := string.substring(1,comment_index - 1);
            end; 
         else 
            Result := string;
         end; 
      end -- string_stripped_of_comment
   
   string_stripped_of_whitespace(string: STRING): STRING is 
      --strip string of whitespace
      do  
         Result := string;
         Result.left_adjust;
         Result.right_adjust;
      end -- string_stripped_of_whitespace
   
   transformed_line(string: STRING): STRING is 
      --strip comments and whitespace from parseable line      
      do  
         Result := string_stripped_of_whitespace(string_stripped_of_comment(string));
      end -- transformed_line
   
   number_list: PAIRED_NUMBER_LIST_PREDICTOR;

feature {ANY} --parsing

   found_end_of_historical_data: BOOLEAN;
   
   reset is 
      --resets the parser and makes it ready to go again
      do  
         found_end_of_historical_data := false;
         number_list.reset;
      end -- reset
   
   make is 
      do  
         !!number_list.make;
         reset;
      end -- make
   
   parse_last_line_as_historical_data is 
      --interpret last_line as a pair of comma-separated values
      local 
         error_log: ERROR_LOG;
         comma_index: INTEGER;
         x_string: STRING;
         y_string: STRING;
         new_x: DOUBLE;
         new_y: DOUBLE;
      do  
         !!error_log.make;
         comma_index := last_line.index_of(',');
         error_log.check_for_error(comma_index = last_line.count + 1,"No comma:" + last_line);
         x_string := last_line.substring(1,comma_index - 1);
         y_string := last_line.substring(comma_index + 1,last_line.count);
         error_log.check_for_error(not (x_string.is_double or x_string.is_integer),"invalid X:" + last_line);
         error_log.check_for_error(not (y_string.is_double or y_string.is_integer),"invalid Y:" + last_line);
         if not error_log.error_flag then 
            new_x := double_from_string(x_string);
            new_y := double_from_string(y_string);
            number_list.add_entry(new_x,new_y);
            std_output.put_string("added: ");
            std_output.put_double(new_x);
            std_output.put_string(", ");
            std_output.put_double(new_y);
            std_output.put_new_line;
         end; 
      end -- parse_last_line_as_historical_data
   
   double_from_string(string: STRING): DOUBLE is 
      require 
         string.is_double or string.is_integer; 
      do  
         if string.is_double then 
            Result := string.to_double;
         elseif string.is_integer then 
            Result := string.to_integer.to_double;
         end; 
      end -- double_from_string
   
   historical_data_terminator: STRING is "stop";
   
   parse_last_line_as_end_of_historical_data is 
      --interpret last line as the end of historical data
      require 
         last_line.compare(historical_data_terminator) = 0; 
      do  
         found_end_of_historical_data := true;
         std_output.put_string("Historical data read.%NBeta-0: ");
         std_output.put_double(number_list.beta_0);
         std_output.put_string("%NBeta-1: ");
         std_output.put_double(number_list.beta_1);
         std_output.put_string("%NStandard Deviation: ");
         std_output.put_double(number_list.standard_deviation);
         std_output.put_string("%N%N");
      end -- parse_last_line_as_end_of_historical_data
   
   parse_last_line_as_prediction is 
      --interpret last line as a single x, for a predictive y
      local 
         error_log: ERROR_LOG;
         x: DOUBLE;
      do  
         !!error_log.make;
         error_log.check_for_error(not (last_line.is_double or last_line.is_integer),"Not a double : " + last_line);
         if not error_log.error_flag then 
            x := double_from_string(last_line);
            std_output.put_string("Estimate at x=");
            std_output.put_double(x);
            std_output.put_string("%NProjected y: ");
            std_output.put_double(number_list.projected_y(x));
            std_output.put_string("%Nt (70 percent): ");
            std_output.put_double(number_list.t(0.7));
            std_output.put_string("%Nt (90 percent): ");
            std_output.put_double(number_list.t(0.9));
            std_output.put_string("%NRange (70 percent): ");
            std_output.put_double(number_list.prediction_range(x,0.7));
            std_output.put_string("; UPI: ");
            std_output.put_double(number_list.upper_prediction_interval(x,0.7));
            std_output.put_string("; LPI: ");
            std_output.put_double(number_list.lower_prediction_interval(x,0.7));
            std_output.put_string("%NRange (90 percent): ");
            std_output.put_double(number_list.prediction_range(x,0.9));
            std_output.put_string("; UPI: ");
            std_output.put_double(number_list.upper_prediction_interval(x,0.9));
            std_output.put_string("; LPI: ");
            std_output.put_double(number_list.lower_prediction_interval(x,0.9));
            std_output.put_new_line;
         end; 
      end -- parse_last_line_as_prediction
   
   parse_last_line is 
      --parse the last line according to state
      do  
         if not last_line.empty then 
            if last_line.compare(historical_data_terminator) = 0 then 
               parse_last_line_as_end_of_historical_data;
            else 
               if found_end_of_historical_data then 
                  parse_last_line_as_prediction;
               else 
                  parse_last_line_as_historical_data;
               end; 
            end; 
         end; 
      end -- parse_last_line

end -- class PREDICTOR_PARSER

t_distribution.e

class T_DISTRIBUTION
--the t-distribution, used to find prediction ranges, etc.

inherit 
   SINGLE_VARIABLE_FUNCTION
      redefine at
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   n: INTEGER;
   
   set_n(new_n: INTEGER) is 
      do  
         n := new_n;
      end -- set_n
   
   make is 
      do  
         set_n(0);
      end -- make
   
   next_guess(arg, last_result, target: DOUBLE): DOUBLE is 
      do  
         Result := arg + (target - last_result);
      end -- next_guess
   
   error_margin: DOUBLE is 0.0000000001;
   
   at(x: DOUBLE): DOUBLE is 
      local 
         t_integral: T_INTEGRAL;
         last_error: DOUBLE;
         this_result: DOUBLE;
         last_result: DOUBLE;
         has_tried_once: BOOLEAN;
         target: DOUBLE;
      do  
         --convert the range to a two-sided distribution so we get 
         --what we're expecting
         target := 0.5 + x / 2;
         !!t_integral.make;
         t_integral.set_n(n);
         last_error := 0;
         from 
            Result := 0;
            has_tried_once := false;
         until 
            has_tried_once and error_margin > last_error
         loop 
            last_result := this_result;
            this_result := t_integral.at(Result);
            last_error := (this_result - target).abs;
            Result := next_guess(Result,last_result,target);
            has_tried_once := true;
         end; 
      end -- at

end -- class T_DISTRIBUTION

t_distribution_base.e

class T_DISTRIBUTION_BASE
--simple function as the base of the t-distribution integral

inherit 
   SINGLE_VARIABLE_FUNCTION
      redefine at
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   n: INTEGER;
      --degrees of freedom
   
   set_n(new_n: INTEGER) is 
      do  
         n := new_n;
      end -- set_n
   
   make is 
      do  
         set_n(0);
      end -- make
   
   at(x: DOUBLE): DOUBLE is 
      require 
         n > 0; 
      do  
         Result := ((1.0 + x * x / n.to_double) ^ - (n + 1)).sqrt;
      end -- at

end -- class T_DISTRIBUTION_BASE

t_integral.e

class T_INTEGRAL
--integral of the t-distribution up to a given point; NOT what you 
--use to make predictions; use T_DISTRIBUTION instead

inherit 
   SINGLE_VARIABLE_FUNCTION
      redefine at
      end; 
   
creation {ANY} 
   make

feature {ANY} 
   
   n: INTEGER;
   
   set_n(new_n: INTEGER) is 
      do  
         n := new_n;
         t_distribution_base.set_n(new_n);
      end -- set_n
   
   make is 
      do  
         !!gamma;
         !!homer.make;
         !!t_distribution_base.make;
         set_n(0);
      end -- make
   
   multiplier: DOUBLE is 
      do  
         Result := gamma.at((n + 1) / 2) / ((n * Pi).sqrt * gamma.at(n / 2));
      end -- multiplier
   
   gamma: GAMMA_FUNCTION;
   
   homer: SIMPSON_INTEGRATOR;
   
   t_distribution_base: T_DISTRIBUTION_BASE;
   
   at(x: DOUBLE): DOUBLE is 
      do  
         Result := 0.5;
         if x < 0 then 
            Result := 0.5 - multiplier * homer.integral(t_distribution_base,0,- x);
         else 
            Result := 0.5 + multiplier * homer.integral(t_distribution_base,0,x);
         end; 
      end -- at

end -- class T_INTEGRAL

Compile

My continual problems abound: using the wrong language's assignment style, loop increments and instantiation in Eiffel, and header/source parity in C++. The C++ standard library continues to surprise me, which is not a good thing: during the course of this program, previous programs "stopped working" and no longer recognized EOF in their data files. The "solution" was to read and put back a character after each call to istream::getline(), which changed nothing except that EOF was now detected. Strange.

Test

Despite the obvious disclaimer in the problem statement ("Note that to calculate the value of t, you integrate from 0 to a trial value of t. You find the correct value by successively adjusting the trial value of t up or down until the p value is within an acceptable error..." [Humphrey95]), I didn't understand what this meant until the program appeared horribly broken. Figuring this out took a great deal of time and resulted in a huge "design defect" in the test phase.
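
Once the procedure made sense, a quick sanity check against a standard t table is possible with the classes above (a sketch; the expected values are the ones from the example output in the planning section, which correspond to 8 degrees of freedom):

#include <iostream>

#include "t_distribution.h"

int
main (void)
{
  t_distribution t_dist;
  t_dist.set_n (8);
  //expect roughly 1.108 and 1.860, the standard table values for
  //8 degrees of freedom at 70 and 90 percent two-sided intervals
  std::cout << "t( 70 percent ): " << t_dist.at (0.7) << "\n";
  std::cout << "t( 90 percent ): " << t_dist.at (0.9) << "\n";
  return 0;
}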

Table 6-3. Test Results Format -- Program 6a:

Test | Parameter | Expected Value | Actual Value (C++) | Actual Value (Eiffel)
Table D8 | Beta0 | -22.55 | -22.5525 | -22.552533
 | Beta1 | 1.7279 | 1.72793 | 1.727932
 | UPI( 70% ) | 874 | 874.431 | 874.430261
 | LPI( 70% ) | 414 | 414.428 | 414.428507
 | UPI( 90% ) | 1030 | 1030.39 | 1030.387465
 | LPI( 90% ) | 258 | 258.47 | 258.471303
Program 6a, using historical data for programs 2a through 6a | Estimated new and changed LOC | | |
 | UPI( 70% ) | N/A | 400.809 | 400.809121
 | LPI( 70% ) | N/A | 166.415 | 166.414689
 | UPI( 90% ) | N/A | 504.297 | 504.297205
 | LPI( 90% ) | N/A | 62.9266 | 92.926605
 | Actual new/changed LOC | N/A | 299 | 299

Postmortem

PSP1.1 Project Plan Summary

Table 6-4. Project Plan Summary

Student: Victor B. Putz | Date: 000111
Program: Statistic Predictor | Program #: 6A
Instructor: Wells | Language: C++

Summary | Plan | Actual | To date
LOC/Hour | 48 | 56 | 49
Planned time | 256 | | 256
Actual time | | 315 | 315
CPI (cost/performance index) | | | 0.813
% reused | 44 | 44 | 23

Program Size | Plan | Actual | To date
Base | 17 | 17 |
Deleted | 2 | 2 |
Modified | 3 | 2 |
Added | 231 | 297 |
Reused | 233 | 254 | 396
Total New and Changed | 234 | 299 | 1089
Total LOC | 442 | 568 | 1712
Total new/reused | 0 | 0 | 0

Time in Phase (min) | Plan | Actual | To Date | To Date %
Planning | 46 | 58 | 241 | 18
Design | 26 | 27 | 130 | 10
Code | 70 | 83 | 351 | 27
Compile | 15 | 24 | 89 | 7
Test | 79 | 108 | 420 | 31
Postmortem | 20 | 15 | 92 | 7
Total | 256 | 315 | 1323 | 100

Defects Injected | Actual | To Date | To Date %
Plan | 0 | 0 | 0
Design | 17 | 42 | 30
Code | 23 | 93 | 66
Compile | 0 | 3 | 2.5
Test | 0 | 3 | 2.5
Total development | 40 | 141 | 100

Defects Removed | Actual | To Date | To Date %
Planning | 0 | 0 | 0
Design | 0 | 0 | 0
Code | 9 | 31 | 22
Compile | 22 | 71 | 50
Test | 9 | 39 | 28
Total development | 40 | 141 | 100
After Development | 0 | 0 |

Eiffel code/compile/test

Time in Phase (min) | Actual | To Date | To Date %
Code | 71 | 217 | 49
Compile | 30 | 106 | 24
Test | 22 | 117 | 27
Total | 123 | 440 | 100

Defects Injected | Actual | To Date | To Date %
Design | 0 | 4 | 4
Code | 28 | 86 | 95
Compile | 0 | 0 | 0
Test | 0 | 1 | 1
Total | 28 | 91 | 100

Defects Removed | Actual | To Date | To Date %
Code | 0 | 1 | 1
Compile | 22 | 61 | 67
Test | 6 | 29 | 32
Total | 28 | 91 | 100

Time Recording Log

Table 6-5. Time Recording Log

Student: Victor B. Putz | Date: 000109
Program: 6A

Start | Stop | Interruption Time | Delta time | Phase | Comments
000109 15:09:59 | 000109 16:36:01 | 27 | 59 | plan |
000109 16:36:04 | 000109 17:07:13 | 4 | 27 | design |
000110 08:50:12 | 000110 10:13:35 | 0 | 83 | code |
000110 10:18:57 | 000110 10:42:55 | 0 | 23 | compile |
000110 10:47:38 | 000110 12:36:42 | 1 | 108 | test |
000111 11:55:19 | 000111 12:10:28 | 0 | 15 | postmortem |

Table 6-6. Time Recording Log

Student: Victor B. Putz | Date: 000111
Program: 6a

Start | Stop | Interruption Time | Delta time | Phase | Comments
000111 08:53:31 | 000111 10:04:22 | 0 | 70 | code |
000111 10:04:25 | 000111 10:34:02 | 0 | 29 | compile |
000111 10:34:27 | 000111 10:56:03 | 0 | 21 | test |

Defect Reporting Logs

Table 6-7. Defect Recording Log

Student: Victor B. Putz | Date: 000109
Program: 6A

Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time | Comments
000110 08:55:43 | md | ig | design | code | 2 | created is_double_equal to handle margins for double comparison (since pure equality is problematical)
000110 09:02:48 | md | ig | design | code | 2 | forgot to add n to the class and equation
000110 09:18:45 | md | ig | design | code | 3 | added variance feature to simplify standard deviation
000110 09:33:09 | md | ig | design | code | 1 | added double_from_string member
000110 09:36:23 | md | cm | design | code | 3 | broke string_stripped_of_comments into its own method; broke string_stripped_of_whitespace into its own class
000110 09:47:51 | md | cm | design | code | 1 | broke error_log into its own class
000110 09:52:50 | md | ig | design | code | 2 | beefed up error_log to include flag member, etc
000110 10:00:18 | md | ig | design | code | 1 | added last_line_is_blank member
000110 10:05:40 | ic | ig | design | code | 1 | moved t() into its own feature instead of mucking about with t_integral, etc.
000110 10:21:19 | sy | ty | code | compile | 0 | erroneously typed compiler directive
000110 10:22:32 | sy | ty | code | compile | 0 | forgot to #include necessary header
000110 10:23:12 | sy | ty | code | compile | 0 | forgot to make inherited feature "at" const
000110 10:23:59 | sy | ty | code | compile | 0 | forgot to #include string in header
000110 10:24:44 | wn | cm | code | compile | 1 | misnamed a feature
000110 10:26:18 | sy | cm | code | compile | 0 | used const double&; meant to use const double
000110 10:29:12 | sy | om | code | compile | 0 | forgot-- must declare pairs of iterators prior to for-loop
000110 10:30:17 | wn | cm | code | compile | 0 | used t_integral instead of m_t_integral as feature name
000110 10:31:51 | sy | ig | code | compile | 0 | const cast games...
000110 10:33:16 | wn | cm | code | compile | 0 | wrong name; typed "n" instead of entry_count
000110 10:33:56 | sy | om | code | compile | 0 | forgot to add declaration to for-loop
000110 10:34:44 | sy | ty | code | compile | 0 | forgot to add parentheses to zero-argument method
000110 10:35:21 | sy | ty | code | compile | 0 | accidentally added semicolon after parentheses in implementation
000110 10:36:07 | sy | om | code | compile | 0 | forgot to #include "contract.h"
000110 10:36:38 | sy | om | code | compile | 0 | forgot to qualify methods with object (beta-0, beta-1, etc)
000110 10:37:40 | sy | ty | code | compile | 0 | forgot to add error message to check_error call
000110 10:38:25 | wn | cm | code | compile | 0 | misnamed argument in is_double
000110 10:38:58 | sy | ty | code | compile | 0 | accidentally added semicolon after initializer in constructor
000110 10:39:38 | sy | om | code | compile | 0 | forgot to #include math.h
000110 10:40:31 | sy | om | code | compile | 0 | const games (base...)
000110 10:41:20 | sy | ty | code | compile | 0 | used eiffel-style assignment; typo in constructor as well
000110 10:42:06 | sy | ty | code | compile | 0 | const games...
000110 10:47:41 | md | ig | design | test | 1 | forgot to add constructor
000110 10:49:25 | wc | cm | design | test | 3 | some confusion with whether or not the condition in "check_error" should be true or false to trigger the log
000110 10:54:14 | we | ig | design | test | 1 | forgot that stating "1/(entry_count() - 2 )" used ints instead of doubles.
000110 10:57:48 | wc | ig | design | test | 2 | another true/false issue with check_error
000110 11:00:22 | wt | cm | code | test | 9 | Aw, geez... accidentally declared return type "result" as bool, with disastrous results
000110 11:11:39 | wa | ig | design | test | 18 | missed where the multiplier went in the t integral-- was applying it to the whole integral (after the 0.5 business)
000110 11:32:23 | wa | ig | design | test | 47 | holy cow-- missed entirely how to do the t-distribution; now have the correct one in place.
000110 12:22:54 | wa | ig | design | test | 2 | expedited the "next guess" algorithm
000110 12:26:46 | we | ig | design | test | 8 | bloody c++... was silently doing int arithmetic in the range calculation instead of converting to double

Table 6-8. Defect Recording Log

Student: Victor B. Putz | Date: 000111
Program: 6a

Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time | Comments
000111 10:05:05 | wn | ig | code | compile | 0 | used "equals" instead of "compare" to compare strings
000111 10:06:03 | sy | cm | code | compile | 0 | used () at end of no-argument feature call
000111 10:06:41 | sy | ty | code | compile | 0 | typed beta-1 instead of beta_1
000111 10:07:25 | sy | ty | code | compile | 0 | used comma instead of semicolon as argument separator
000111 10:07:47 | sy | ty | code | compile | 0 | compiler didn't like a semicolon; odd
000111 10:08:33 | sy | ty | code | compile | 0 | used c_style = for assignment
000111 10:09:05 | sy | ig | code | compile | 0 | can't declare constants in local header
000111 10:10:43 | sy | ig | code | compile | 0 | tried to change a formal argument; not allowed in Eiffel
000111 10:11:10 | sy | ty | code | compile | 0 | forgot "then" after if
000111 10:11:33 | sy | ty | code | compile | 0 | typed "homer_integral" instead of homer.integral
000111 10:12:00 | sy | ty | code | compile | 0 | typed x.mean instead of xs.mean
000111 10:12:36 | wn | cm | code | compile | 0 | used range instead of prediction_range
000111 10:13:55 | sy | ty | code | compile | 0 | used _ instead of . for feature call
000111 10:14:31 | sy | ty | code | compile | 0 | forgot to qualify number_list feature call with an object
000111 10:14:55 | sy | ty | code | compile | 0 | forgot to qualify error_log feature call with object
000111 10:15:22 | ic | om | code | compile | 0 | forgot to add double_from_string feature (reused from readable_paired_number_list)
000111 10:16:56 | wt | om | code | compile | 0 | mistakenly declared some variables as integer, not double
000111 10:17:38 | wn | cm | code | compile | 0 | used check_error instead of check_for_error
000111 10:19:48 | mc | om | code | compile | 0 | forgot to add call to predictor_parser.make
000111 10:20:38 | mc | om | code | compile | 2 | was not instantiating some members
000111 10:23:19 | mc | om | code | compile | 1 | forgot several instantiations
000111 10:24:48 | wa | ig | code | compile | 8 | Eiffel does not support arbitrary exponents, only integers-- so I had to change the alg to use a sqrt
000111 10:35:10 | wa | ig | code | test | 3 | forgot to check for cases in which the line begins with the comment...?
000111 10:39:39 | wa | cm | code | test | 1 | forgot to increment loop index
000111 10:41:53 | ma | cm | code | test | 1 | forgot to update found_end_of_historical_data in parse_last_line_as_end_of_historical_data
000111 10:43:25 | mc | cm | code | test | 0 | forgot to instantiate error_log
000111 10:44:14 | wn | cm | code | test | 0 | tried to use to_double instead of double_from_string
000111 10:45:48 | wa | ty | code | test | 9 | missed a minus sign in a calculation. Dang.