Assignment title: Information
CSI6110 Software Development Processes
CSI6110 Assignment Page 1 of 7 Mike Johnstone – 10/10/2016
ASSIGNMENT 2 – INDIVIDUAL AND GROUP ASSESSMENT
Assignment value: 35%
Due: Friday, 28th of October 2016, at 11:59pm, WST.
Related learning outcomes from the unit outline:
• Apply techniques to estimate software project costs, duration and effort.
• Identify software capability determination and evaluation techniques.
• Reflect on the need for measurement and software metric formulation and
evaluation
Background: The Personal Software Process (PSP), as espoused by Watts
Humphrey, attempts to focus on improving individual performance in the upper levels
of the CMM. The PSP recommends establishing metrics to measure aspects of the
software process and to test personal competency by writing programs measured
with these metrics.
You are required to create two small programs in Python. Similar to Assignment#1,
as part of the exercise you need to estimate how long you think it will take you to
write the program (after reading the brief). You must then keep accurate records of
two things: 1) how long it takes you to write the program; and 2) any defects (bugs) that
you find and fix as part of the development.
For each of tasks #1 and #2, make your best estimate of 1) Program size (i.e. how
many lines of code you think you will write); and 2) Time to complete (i.e. the time it
will take you to write the program in hours or days, including time for defect
detection/removal). Note: It is essential that the estimates are calculated before you
start writing your programs.
Task#1 (individual-5%): Write a program to count (in LOC) the total program size of a
Python program and print out a count of the LOC. Consider whether a comment
counts as a "line". Justify your answer in your submission.
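One possible starting point for Task#1 is sketched below. It is only an illustration under stated assumptions: the function name count_loc and the count_comments switch are hypothetical, blank lines are never counted, and docstrings or multi-line strings are simply treated as code. Whether comment-only lines should count is left as a parameter because the brief asks you to justify that choice yourself.

```python
def count_loc(source, count_comments=False):
    """Count lines of code in a string of Python source text.

    Assumptions of this sketch (not requirements from the brief):
    - blank lines are never counted;
    - lines holding only a '#' comment count only if count_comments is True;
    - docstrings and multi-line strings are treated as ordinary code.
    """
    loc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#") and not count_comments:
            continue  # skip comment-only lines unless asked to count them
        loc += 1
    return loc


# Hypothetical four-line sample: one comment, one blank line, two statements.
sample = "# demo\nimport math\n\nprint(math.pi)  # trailing comment\n"
print(count_loc(sample))
print(count_loc(sample, count_comments=True))
```

To count a file, read its text first (for example with open(path).read()) and pass the result to the function.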
Task#2 (individual-15%): Read Appendix#1 for details on the programming task, the
tests required and some background on the techniques used (if you are not familiar
with them).
Task#3 (group-10%): In your group (assigned by your lecturer), pool your values for
time to complete (in person-days, for example) and lines of code for tasks #1 and #2.
Many predictive models, such as COCOMO, are of the general form:
Person-Months = a*KLOC^b
where "a" and "b" are defined constants and "*" and "^" are the usual multiplication
and exponentiation operators (in your model, you would replace person-months with
person-days or hours and KLOC with LOC as your programs are small).
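As a small illustration of the general form above, the sketch below evaluates effort = a * size^b. The function name estimate_effort and the constants passed to it are hypothetical placeholders chosen only to show the arithmetic; they are not calibrated COCOMO values.

```python
def estimate_effort(size, a, b):
    """Evaluate the general power-law model: effort = a * size ** b."""
    return a * size ** b


# Hypothetical constants, for illustration only (not calibrated values).
# With b = 1 the model is simply proportional to size.
print(estimate_effort(100, a=2.0, b=1.0))
# With b > 1 effort grows faster than size.
print(estimate_effort(100, a=2.0, b=1.1))
```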
When using this type of model, two questions arise. First, where do these constants
come from and second, could this model be used as-is or would it need to be
calibrated for each organisation?
The constants come from fitting a curve (the above equation) to known data about
projects (their KLOC and PM results). Calibration is essential as the type of projects
that were used to derive the original values for the modes of COCOMO are not
necessarily in the same domain as the software development firm which wishes to
use the model.
A common method used to fit curves to data is least-squares regression. For a linear
model, the equation is often expressed as: y = β1*x + β0, where "β1" is the slope of
the line and "β0" is the point where the line crosses the y-axis (called the y-intercept).
Calculate constants for a predictive model, based on your pooled data. The
COCOMO equation must first be transformed into a linear equation by taking
logarithms of both sides: Log Person-days = Log a + b * Log LOC. In terms of the
linear model above, y = Log Person-days, x = Log LOC, β1 = b and β0 = Log a.
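The transformation can be sketched as follows. This is a minimal illustration, assuming natural logarithms and a hypothetical helper named fit_power_law; the data in the example are invented so that they lie exactly on the curve effort = 2 * size^0.5, which makes it easy to see that the fit recovers the constants. Your group's pooled data would replace them.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size ** b by least squares on the log-transformed data.

    Returns the pair (a, b). Natural logarithms are an assumption of this
    sketch; any base works as long as it is used consistently.
    """
    xs = [math.log(s) for s in sizes]    # x = Log LOC
    ys = [math.log(e) for e in efforts]  # y = Log Person-days
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * x_avg * y_avg) / (sum_xx - n * x_avg ** 2)  # slope
    log_a = y_avg - b * x_avg                                     # intercept
    return math.exp(log_a), b


# Invented data lying exactly on effort = 2 * size ** 0.5.
sizes = [10, 100, 1000]
efforts = [2 * s ** 0.5 for s in sizes]
a, b = fit_power_law(sizes, efforts)
print(round(a, 6), round(b, 6))
```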
Task#4 (individual-5%): Having determined a model that predicts time from LOC, use
your actual value for LOC from Assignment#1 to estimate the time to complete
Assignment#1. Is this estimate similar to the actual time taken in Assignment#1?
Discuss the difference or similarity.
Task#5: When complete submit your task deliverables, programs, any snapshots
and time sheets via BlackBoard (MyECU).
If you are not sure about any of the requirements for this assignment, please check
with your lecturer.
Important: By providing a submission, you are declaring that the submission is entirely your
own work, except where reference is made to the work of others (using the ECU referencing
standard, available at: http://www.ecu.edu.au/library/pdf/refguide.pdf). If this is found not to be
correct you will be subject to penalties ranging from loss of marks to exclusion from the
University.
Appendix 1
Note: This information is copyright © Carnegie Mellon University. Used with permission.
Program 2 requirements
Write a program to:
• calculate the linear regression parameters β0 and β1 and the correlation
coefficients rx,y and r^2 for a set of n pairs of data,
• given an estimate xk, calculate an improved prediction yk, where
yk = β0 + β1xk
Table 1 contains historical estimated and actual data for 10 programs. For
program 11, the developer has estimated a proxy size of 386 LOC.
Thoroughly test the program. At a minimum, run the following four test cases.
• Test 1: Calculate the regression parameters and correlation coefficients
between estimated proxy size and actual added and modified size in Table 1.
Calculate plan added and modified size given an estimated proxy size of xk =
386.
• Test 2: Calculate the regression parameters and correlation coefficients
between estimated proxy size and actual development time in Table 1.
Calculate time estimate given an estimated proxy size of xk = 386.
• Test 3: Calculate the regression parameters and correlation coefficients
between plan added and modified size and actual added and modified size in
Table 1. Calculate plan added and modified size given an estimated proxy
size of xk = 386.
• Test 4: Calculate the regression parameters and correlation coefficients
between plan added and modified size and actual development time in Table
1. Calculate time estimate given an estimated proxy size of xk = 386.
Expected results are provided in Table 2.
Program   Estimated    Plan Added and   Actual Added and   Actual Development
Number    Proxy Size   Modified Size    Modified Size      Hours
1         130          163              186                15.0
2         650          765              699                69.9
3         99           141              132                6.5
4         150          166              272                22.4
5         128          137              291                28.4
6         302          355              331                65.9
7         95           136              199                19.4
8         945          1206             1890               198.7
9         368          433              788                38.8
10        961          1130             1601               138.2
Table 1
Expected results
         Expected Values                               Actual Values
Test     β0       β1         rx,y     r^2      yk         β0    β1    rx,y    r^2    yk
Test 1   -22.55   1.7279     0.9545   0.9111   644.429
Test 2   -4.039   0.1681     0.9333   0.8711   60.858
Test 3   -23.92   1.43097    0.9631   0.9276   528.4294
Test 4   -4.604   0.140164   0.9480   0.8988   49.4994
Table 2
Regression overview
Linear regression is a way of optimally fitting a line to a set of data. The
least-squares regression line is the line for which the sum of the squared
vertical distances from the data points to the line is minimised. The equation
of a line can be written as
y = β0 + β1x
In Figure 1, the best fit regression line has parameters of β0 = -4.0389 and
β1 = 0.1681.
Figure 1: Actual Development Hours (vertical axis, 0 to 250) plotted against
Estimated Proxy Size (horizontal axis, 0 to 1200), with the fitted regression
line y = -4.0389 + 0.1681x.
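The idea of a "best fit" can be checked numerically. The sketch below, which is an illustration rather than part of the required program, uses the estimated proxy size and actual development hours from Table 1 and compares the sum of squared errors for the Figure 1 line against two alternative lines. The helper name sum_squared_error and the two alternative lines are hypothetical choices made for the comparison.

```python
# Table 1 columns: estimated proxy size (x) and actual development hours (y).
proxy_size = [130, 650, 99, 150, 128, 302, 95, 945, 368, 961]
hours = [15.0, 69.9, 6.5, 22.4, 28.4, 65.9, 19.4, 198.7, 38.8, 138.2]


def sum_squared_error(beta0, beta1, xs, ys):
    """Sum of squared vertical distances from the points to y = beta0 + beta1*x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# The line quoted for Figure 1.
best = sum_squared_error(-4.0389, 0.1681, proxy_size, hours)
# Two arbitrary alternatives: a steeper slope, and an intercept of zero.
steeper = sum_squared_error(-4.0389, 0.18, proxy_size, hours)
shifted = sum_squared_error(0.0, 0.1681, proxy_size, hours)

# The regression line should give the smallest error of the three.
print(best < steeper, best < shifted)
```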
Correlation overview
The correlation calculation measures the strength of the linear relationship
between two sets of numerical data.
The correlation rx,y can range from +1 to -1.
• Results near +1 imply a strong positive relationship; when x increases, so does
y.
• Results near -1 imply a strong negative relationship; when x increases, y
decreases.
• Results near 0 imply no linear relationship.
Calculating regression and correlation
The formulas for calculating the regression parameters β0 and β1 are

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\,x_{avg}\,y_{avg}}{\sum_{i=1}^{n} x_i^2 - n\,x_{avg}^2}$$

$$\beta_0 = y_{avg} - \beta_1 x_{avg}$$
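A direct transcription of these two formulas might look like the sketch below. It is one possible shape for the calculation, not a prescribed design: the function name linear_regression is hypothetical, and the data are the Test 1 columns from Table 1 (estimated proxy size and actual added and modified size).

```python
def linear_regression(xs, ys):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # beta1 = (sum(x_i * y_i) - n * x_avg * y_avg) / (sum(x_i^2) - n * x_avg^2)
    beta1 = (sum_xy - n * x_avg * y_avg) / (sum_xx - n * x_avg ** 2)
    # beta0 = y_avg - beta1 * x_avg
    beta0 = y_avg - beta1 * x_avg
    return beta0, beta1


# Test 1 data from Table 1.
proxy_size = [130, 650, 99, 150, 128, 302, 95, 945, 368, 961]
actual_size = [186, 699, 132, 272, 291, 331, 199, 1890, 788, 1601]

beta0, beta1 = linear_regression(proxy_size, actual_size)
y_k = beta0 + beta1 * 386  # improved prediction for x_k = 386
print(round(beta0, 2), round(beta1, 4), round(y_k, 3))
```

The printed values can be compared with the Test 1 row of Table 2.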
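The formulas for calculating the correlation coefficients rx,y and r^2 are

$$r_{x,y} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$

$$r^2 = r_{x,y} \cdot r_{x,y}$$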
where
• Σ is the symbol for summation
• i is an index to the n numbers
• x and y are the two paired sets of data
• n is the number of items in each set x and y
• xavg is the average of the x values
• yavg is the average of the y values
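The correlation formula can be sketched in the same way. The function name correlation is hypothetical; the first data set is the Test 1 pairing from Table 1, and the two tiny data sets at the end are invented to show the limiting values of +1 and -1.

```python
import math


def correlation(xs, ys):
    """Return the correlation coefficient r for paired data xs and ys."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return numerator / denominator


# Test 1 data from Table 1.
proxy_size = [130, 650, 99, 150, 128, 302, 95, 945, 368, 961]
actual_size = [186, 699, 132, 272, 291, 331, 199, 1890, 788, 1601]

r = correlation(proxy_size, actual_size)
print(round(r, 4), round(r * r, 4))

# Invented data: a perfectly increasing and a perfectly decreasing relationship.
r_up = correlation([1, 2, 3], [2, 4, 6])
r_down = correlation([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```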
Appendix 2 - PSP Time Recording Log
Student:
Date:
Start Date and Time    Stop Date and Time    Delta Time    Comments
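If you keep the log electronically, the Delta Time column can be computed rather than worked out by hand. The sketch below is one way to do that, assuming timestamps are written as "YYYY-MM-DD HH:MM"; the function name delta_minutes, the timestamp format and the example entry are all hypothetical.

```python
from datetime import datetime


def delta_minutes(start, stop, fmt="%Y-%m-%d %H:%M"):
    """Return the elapsed time between two timestamps, in minutes."""
    started = datetime.strptime(start, fmt)
    stopped = datetime.strptime(stop, fmt)
    return (stopped - started).total_seconds() / 60


# Hypothetical log entry: a 90-minute coding session.
print(delta_minutes("2016-10-20 09:15", "2016-10-20 10:45"))
```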
Appendix 3 – Defect Types
PSP Defect Type Standard
Type Number   Type Name        Description
10            Documentation    Comments, messages
20            Syntax           Spelling, punctuation, typos, instruction formats
30            Build, Package   Change management, library, version control
40            Assignment       Declaration, duplicate names, scope, limits
50            Interface        Procedure calls and references, I/O, user formats
60            Checking         Error messages, inadequate checks
70            Data             Structure, content
80            Function         Logic, pointers, loops, recursion, computation, function defects
90            System           Configuration, timing, memory
100           Environment      Design, compile, test, or other support system problems
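When summarising a defect log, the type numbers above can be tallied by name. The sketch below is an illustration only: the dictionary simply restates the table, while the function name tally_defects and the example list of recorded defects are hypothetical.

```python
from collections import Counter

# Type numbers and names restated from the PSP Defect Type Standard table.
DEFECT_TYPES = {
    10: "Documentation",
    20: "Syntax",
    30: "Build, Package",
    40: "Assignment",
    50: "Interface",
    60: "Checking",
    70: "Data",
    80: "Function",
    90: "System",
    100: "Environment",
}


def tally_defects(type_numbers):
    """Count recorded defects per type name, ordered by type number.

    An unknown type number raises KeyError, which flags a logging mistake.
    """
    counts = Counter(type_numbers)
    return {DEFECT_TYPES[number]: counts[number] for number in sorted(counts)}


# Hypothetical defect log: two syntax defects and one function defect.
summary = tally_defects([20, 80, 20])
print(summary)
```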