Program 1A

Using a linked list, write a program to calculate the mean and standard deviation of a set of data.

Given Requirements

From [Humphrey95]:

Write a program to calculate the mean and standard deviation of a series of n real numbers. The mean is the average of the numbers. The formula for the standard deviation is given in [Humphrey95], and is described on page 508 of the text.

Use a linked list to hold the n numbers for the calculations.

Program 1A Testing: Thoroughly test the program. At least three of the tests should use the data in each of the three columns on the right of the following table:

Table 1-5. Sample data for program 1A: Pascal Object Size and Development Hours

Program NumberObject LOCNew and Changed LOCDevelopment Hours
116018615.0
259169969.9
31141326.5
422927222.4
523029128.4
627033165.9
712819919.4
816571890198.7
962478838.8
1015031601138.2
Standard Dev572.03625.6362.26

Planning

Program Requirements

The basic problem description does not give us a great deal of direction, which means we're free to work as we see fit. With that in mind, I'll iron out a few requirements ambiguities.

The problem statement does not specify how we get numbers into the program; with that in mind, I'll decide that the program by default accepts a list of ascii-text real numbers from standard input, terminated by an EOF. This will allow both interactive input for testing as well as redirection via standard unix pipes, etc. We will also allow the input to be retrieved from a file using the -f flag on the command line, i.e. psp_1a -f numbers.txt.

The problem statement does not specify what to do in the case of invalid input; since we will be using the program primarily for automated i/o on files, we will treat invalid input as unrecoverable; the program should display an error message with the offending line number and filename, if applicable. Since I am developing under Linux with EMACS, the error message format will be in the standard emacs-style error output of file:line:message.

Estimate Resources

Without any other guidance than a guess, I'll estimate 1.5 hours (90 minutes) for this program.

Development

Design

The basic design is fairly simple, with a single class (number_list), which inherits from the STL template class list< double >. After subclassing list< double >, we add two features for reading the list from an input stream (read_from_stream and read_entry_from_stream), one feature (add_entry) for simply adding a number to the list (to encapsulate some of the STL functionality), and several read-only constant attributes: sum (the sum of all elements), mean, standard_deviation, and entry_count (to count the number of entries, again just encapsulating some STL functionality)

Code/Compile/Test: C++

Code

The code was straightforward, although it went through several iterations in the test phase before turning into anything useful. The completed code is below:

number_list.h
/*
*/

#ifndef NUMBER_LIST_H
#define NUMBER_LIST_H

#include <list>
#include <iostream>

//a class which encapsulates a list of double values, adding the features of
//mean and standard deviation
class number_list : public list< double >
{
 public:
  void read_from_stream( istream& input );
  void read_entry_from_stream( istream& input, int line_number );
  double sum( void ) const;
  double mean( void ) const;
  double standard_deviation( void ) const;
  int entry_count( void ) const;
  void add_entry( double new_entry );
};

#endif

/*
*/
number_list.cpp
/*
*/

#include "number_list.h"
#include <assert.h>
#include <stdlib.h>
#include <math.h>

void 
number_list::read_from_stream( istream& input )
{
  int line_number = 1;
  while ( !(input.eof()) )
    {
      read_entry_from_stream( input, line_number );
    }
}

void 
number_list::read_entry_from_stream( istream& input, int line_number )
{
  double next_entry;
  const int input_buffer_size = 255;
  char input_buffer[ input_buffer_size ];
  input.getline( input_buffer, input_buffer_size );
  //now convert to floating point using strtod
  char* conversion_end = NULL;
  next_entry = strtod( input_buffer, &conversion_end );
  if ( conversion_end != input_buffer )
    {
      add_entry( next_entry );
    }
  else if ( !(input.eof()) )
    {
      cerr << "(unknown file):" << line_number << ":Error in conversion to double\n";
    }
}

double 
number_list::sum( void ) const
{
  double Result = 0;
  for ( list<double>::const_iterator iter = begin();
	iter != end();
	++iter )
    {
      Result += *iter;
    }
  return Result;
}

double 
number_list::mean( void ) const
{
  assert( entry_count() > 0 );
  return sum() / entry_count();
}

double 
number_list::standard_deviation( void ) const
{
  assert( entry_count() > 1 );
  double sum_of_square_differences = 0;
  for ( list<double>::const_iterator iter = begin();
	iter != end();
	++iter )
    {
      const double this_square_difference = *iter - mean();
      sum_of_square_differences += this_square_difference * this_square_difference;
    }
  return sqrt( sum_of_square_differences / ( entry_count() - 1 ));
}

int 
number_list::entry_count( void ) const
{
  return size();
}

void 
number_list::add_entry( double new_entry )
{
  push_back( new_entry );
}

/*
*/
main.cpp
/*
*/

#include <fstream>
#include <iostream>
#include "string.h"

#ifndef NUMBER_LIST_H
#include "number_list.h"
#endif

istream *
input_stream_from_args( int arg_count, const char** arg_vector )
{
  istream* Result = NULL;
  if ( arg_count == 1 )
    {
      Result = &cin;
    }
  else 
    {
      const char* help_text = "\
PSP exercise 1A: Get the mean and standard deviation of a list\n
of ASCII real numbers, from either standard input or a text\n
file.\n\n\
Usage: \n \
\tpsp_1a\n\n";
      cout << help_text;
    }
  return Result;
}

int main( int arg_count, const char** arg_vector )
{
  //get the input stream, or print the help text as appropriate
  istream* input_stream = input_stream_from_args( arg_count, arg_vector );
  if ( input_stream != NULL ) 
    {
      //read the entries from the input stream
      number_list a_list;
      a_list.read_from_stream( *input_stream );
      cout << a_list.entry_count() << " numbers processed.\n";
      //output the mean, as appropriate
      if ( a_list.entry_count() > 0 )
	{
	  cout << "Mean: " << a_list.mean() << "\n";
	}
      else
	{
	  cout << "Too few entries to calculate mean\n";
	}
      //output the standard deviation, as appropriate
      if ( a_list.entry_count() > 1 )
	{
	  cout << "Standard Deviation: " << a_list.standard_deviation() << "\n";
	}
      else
	{
	  cout << "Too few entries to calculate standard deviation.\n";
	}
    }
}

/*
*/

Compile

The compilation did reveal a few errors, mostly mistyped names and developer omissions (forgetting to include needed header files, etc). Otherwise, no significant problems here.

Test

The design and code sections for this went so easily that I confess I was disappointed by the test results. I made several simple mistakes, but these mistakes formed the bulk of my time spent developing what should have been a very simple program. Some defects were minor and easy to fix-- for example, the number_list::read_entry_from_stream feature added an entry even for an invalid line, even processing the EOF signal as a number. More significantly, the original design called for a naive use of the C++ iostream capability, in other words, usage like this:

while ( !a_stream.eof() )
   {
      a_stream >> new_entry;
   }

The only problem with this is that apparently it will never generate an EOF signal at all, a fact of which I wasn't aware. I converted that section to use getline, which worked acceptably.

The most significant problem, though, resulted in some deviations from the original design spec: I was unable to use the standard C++ library stream classes to open a file from disk! In fact, even creating the input stream object resulted in a segment fault interrupt:

ifstream* a_stream = new ifstream()

In fairness to myself, it has been some time since I've used the standard C++ stream library, but this behaviour did not mesh with my previous experience. A brief experiment with statically allocated streams had similar results when I attempted to check for EOF conditions.

I'll pursue this in the mean time, to check whether or not my unfamiliarity with the library resulted in these problems or if my compiler is simply acting peculiar. C++ is not always well supported in Linux, and I have had problems with their standard library implementation in the past. For the present, though, I will stick with standard input.

The program was tested with three files which contained the sample data from [Humphrey95], and one sample file with errors to check the error-reporting capability.

Code/Compile/Test: Eiffel

Code

number_list.e
class NUMBER_LIST

inherit 
   LINKED_LIST[DOUBLE];
   
creation {ANY} 
   make

feature {ANY} 
   
   line_counter: INTEGER;
   
   read_from_stream(input: INPUT_STREAM) is 
      do  
         line_counter := 0;
         from 
         until 
            input.end_of_input
         loop 
            read_entry_from_stream(input);
         end; 
      end -- read_from_stream
   
   report_error(error_type: STRING) is 
      do  
         std_error.put_string("(unknown file):");
         std_error.put_integer(line_counter);
         std_error.put_string(error_type);
         std_error.put_new_line;
      end -- report_error
   
   read_entry_from_stream(input: INPUT_STREAM) is 
      do  
         input.read_line;
         if not input.end_of_input then 
            if input.last_string.is_double then 
               add_last(input.last_string.to_double);
            elseif input.last_string.is_integer then 
               add_last(input.last_string.to_integer.to_double);
            else 
               report_error("Not a double");
            end; 
         end; 
      end -- read_entry_from_stream
   
   sum: DOUBLE is 
      local 
         i: INTEGER;
      do  
         from 
            i := lower;
         until 
            not valid_index(i)
         loop 
            Result := Result + item(i);
            i := i + 1;
         end; 
      end -- sum
   
   mean: DOUBLE is 
      require 
         count > 0; 
      do  
         Result := sum / count.to_double;
      end -- mean
   
   standard_deviation: DOUBLE is 
      require 
         count > 1; 
      local 
         i: INTEGER;
         this_term_difference: DOUBLE;
         top_term: DOUBLE;
      do  
         from 
            i := lower;
         until 
            not valid_index(i)
         loop 
            this_term_difference := item(i) - mean;
            top_term := top_term + this_term_difference ^ 2;
            i := i + 1;
         end; 
         Result := (top_term / (count - 1)).sqrt;
      end -- standard_deviation

end -- NUMBER_LIST
main.e
class MAIN

creation
   make
   
feature { ANY }
   
   make is
      local
	 number_list : NUMBER_LIST
      do
	 !!number_list.make
	 number_list.read_from_stream( io )
	 --now output results
	 if ( number_list.count > 0 ) then
	    std_output.put_string( "Mean: " )
	    std_output.put_double( number_list.mean )
	 else
	    std_output.put_string( "Not enough entries for mean" )   
	 end --if
	 std_output.put_new_line
	 if ( number_list.count > 1 ) then
	    std_output.put_string( "Standard Deviation: " )
	    std_output.put_double( number_list.standard_deviation )
	 else
	    std_output.put_string( "Not enough entries for standard deviation" )   
	 end --if
	 std_output.put_new_line
      end
   
end-- MAIN

Postmortem

A fairly simple tallying of time spent in phases and defects injected and removed. The forms used are listed in the following sections

PSP0 Project Plan Summary

Table 1-6. Project Plan Summary

Student:Victor B. PutzDate:991213
Program:Standard DeviationProgram#1A
Instructor:WellsLanguage:C++
Time in Phase (min):PlanActualTo DateTo Date%
Planning 1414100
Design 1717100
Code 2424100
Compile 55100
Test 5151100
Postmortem 33100
Total90114114100
Defects Injected ActualTo DateTo Date %
Design 00100
Code 88100
Compile 11100
Test 33100
Total development 1212100
Defects Removed ActualTo DateTo Date %
Planning 00100
Design 00100
Code 00100
Compile 77100
Test 55100
Total development 1212100
After Development 1212 
Eiffel code/compile/test
Time in Phase (min)ActualTo DateTo Date %
Code1545100
Compile1030100
Test825100
Total3333100
Defects InjectedActualTo DateTo Date %
Code66100
Compile000
Test000
Total66100
Defects RemovedActualTo DateTo Date %
Code000
Compile3350
Test3350
Total66100

Time Recording Log

Table 1-7. Time Recording Log

Student:Victor B. PutzDate:991219
Instructor:WellsProgram#1A
DateStartStopInterruption timeDelta timePhaseComments
99121315581612 14plan 
 16121622 10designSetting up environment (automake etc)
 16231630 7designClass design, etc.
 16301654 24code  
 1655170115compile1 interruption (phone call)
 17021753 51testproblems with "new ifstream" construct
 18281831 3postmortemdoes not include sgml writeups

Defect Reporting Log

Table 1-8. Defect Reporting Log

Student:Victor B. PutzDate:991219
Instructor:WellsProgram#1A
DateNumberTypeInjectRemoveFix timeFix defectDescription
991219120codecompile1 Mistyped "input_stream" instead of "istream"
9912192210codecompile1 forgot to include "fstream.h"
991219320codecompile1 attempted non-const operation on const reference
991219420codecompile1 attempted use of ^ operator on doubles
991219520codecompile1 mistyped input_stream instead of istream (again)
991219620compilecompile1 ? (forgot to put a description on the form)
991219750codecompile1 changed return type of main() unacceptably
991219850testtest36 Extreme difficulty dealing with ifstream-- eventually terminated
991219950testtest1 Processed EOF as a number (misuse of atof)
9912191080codetest1 forgot to take sqrt of partial result in standard deviation
9912191150codetest3 forgot to add error message in number_list::read_entry_from_stream
9912191220testtest111Forgot to update call to number_list::read_entry_from_stream in number_list::read_from_stream after adding a parameter.
Eiffel
991220120codecompile1 Forgot "then" on if-then block
991220220codecompile1 Numerous-- used "=" instead of ":=" for assignment
991220320codecompile3 couldn't figure out how to redefine make and call precursor
991220480codetest1 wasn't allowing for integer-style (no decimal) entries in data
991220580codetest1 wasn't incrementing indices in loops
991220680codetest1  Erroneously typed "-" instead of "/" for standard dev calc.