CSE5DSS – Decision Support Systems
Individual Assignment, 2017
Due Date: Monday 1st May, 10:00am, 2017
Assessment Weight: 30% of the final mark for the subject
Instructions
This is an INDIVIDUAL assignment. You are not permitted to collaborate with any other student,
and you are not permitted to outsource the work to any other party.
This assignment consists of six separate problems. You are required to solve all six problems.
The total number of marks available for the assignment is 70.
Plagiarism
Plagiarism is the submission of somebody else’s work in a manner that gives the impression that the
work is your own. When submitting your assignment via the LMS, the following announcement will
appear:
Software will be used to assist in the detection of plagiarism. Students are referred to the section on
‘Academic Misconduct’ in the subject’s guideline available on LMS.
Lateness Policy
Penalties are applied to late assignments: 5% of the total possible marks for the task is deducted per
day late. An assignment submitted more than five working days after the due date will not be
accepted.
Submission Procedure
Assignments are to be submitted electronically via the Learning Management System.
You should submit two files:
(i) a single Microsoft Word file which contains your answers to all six problems; and
(ii) a single Microsoft Excel file which contains your spreadsheet models for Problems 2 and 3,
each on a separate and clearly labelled tab.
Once again, please note that this is an individual assignment. You are not permitted to collaborate
with any other student, and you are not permitted to outsource the work to any other party.
Software will be used to assist in the detection of plagiarism.
Problem 1: Decision analysis using decision tables (10 marks)
George Keneally is a keen investor and likes to invest in the stock market. The return he expects
from investing in the stock market will depend on the state of the market. He estimates that if the
market is good he will get a 12% return; if the market is fair he will get a 5% return, and if the market
is poor he will get a –2% return (i.e., a loss). Over the last few months George has been feeling more
cautious, and is now considering whether he should instead invest his money in government bonds,
which offer a fixed return of 6% per annum. George has $100,000 to invest, and wishes to invest it
either all in stocks, or all in bonds.
Decision tables are often an appropriate modelling technique to use when trying to find the best
solution from a small number of alternatives. Develop a decision table for this problem and use it to
answer the following questions:
1. What is the maximax decision; i.e., the decision George would make if he were optimistic?
Make sure you explain your answer.
2. What is the maximin decision; i.e., the decision he would make if he were pessimistic? Again,
justify your answer.
3. What decision would George make if he believed that each of the three states of the market
were equally likely? You must show all calculations, and justify your answer based on these
calculations.
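One way to cross-check the decision table and the three criteria is the short Python sketch below. It assumes the payoffs are the dollar returns on the full $100,000 (12%, 5% and -2% for stocks; a fixed 6% for bonds). The names `PAYOFFS`, `maximax`, `maximin` and `equally_likely` are illustrative only; the report itself should still present the table and the reasoning.

```python
# Decision table: dollar payoff on George's $100,000 for each alternative and market state.
STATES = ["good", "fair", "poor"]
PAYOFFS = {
    "stocks": [12_000, 5_000, -2_000],  # 12%, 5% and -2% returns
    "bonds": [6_000, 6_000, 6_000],     # fixed 6% return in every state
}

def maximax(table):
    """Optimistic criterion: choose the alternative whose best payoff is largest."""
    return max(table, key=lambda alt: max(table[alt]))

def maximin(table):
    """Pessimistic criterion: choose the alternative whose worst payoff is largest."""
    return max(table, key=lambda alt: min(table[alt]))

def equally_likely(table):
    """Equal-likelihood (Laplace) criterion: choose the largest simple average payoff."""
    averages = {alt: sum(row) / len(row) for alt, row in table.items()}
    return max(averages, key=averages.get), averages

# Print the decision table, one row per alternative.
print("alternative", *STATES)
for alt, row in PAYOFFS.items():
    print(alt, *row)
```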
In fact, the probabilities of the three states of the market are not equal. Rather, the probability that
the market will be good is 50%, the probability that it will be fair is 30% and the probability that it
will be poor is 20%.
4. What decision would George make once he has been given this information?
(Show all your calculations, and justify your answer).
A friend of George has referred him to a consultant who is able to predict with certainty whether the
market will be good, fair, or poor. The consultant would charge $2,000 for this information.
5. Should George pay the consultant? What is the most that George should be willing to pay for
the consultant’s advice? (Show all calculations and explain clearly how you arrived at your
answer).
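The expected-value and perfect-information calculations can be sketched the same way. This assumes the same dollar payoffs as the decision table and the probabilities given above. The expected value of perfect information (EVPI) is taken as the expected payoff when the best alternative is chosen separately for each state, minus the best expected monetary value (EMV) available without the forecast. The helper names are hypothetical.

```python
PROBS = [0.5, 0.3, 0.2]  # P(good), P(fair), P(poor) as stated above
PAYOFFS = {
    "stocks": [12_000, 5_000, -2_000],
    "bonds": [6_000, 6_000, 6_000],
}

def emv(table, probs):
    """Expected monetary value of each alternative."""
    return {alt: sum(p * x for p, x in zip(probs, row)) for alt, row in table.items()}

def evpi(table, probs):
    """EVPI = expected value with perfect information - best EMV without it."""
    best_emv = max(emv(table, probs).values())
    # With a perfect forecast, the best alternative is picked for each state separately.
    ev_with_pi = sum(p * max(row[i] for row in table.values()) for i, p in enumerate(probs))
    return ev_with_pi - best_emv

print(emv(PAYOFFS, PROBS), evpi(PAYOFFS, PROBS))
```

The EVPI is the most a rational decision maker should pay for a perfect forecast, so it can be compared directly with the consultant's fee.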
What to submit
Submit a brief report presenting your answers and justifications to the above questions. Include the
decision table in your report. Your submission will be marked according to the completeness and
correctness of your response.
Problem 2: Portfolio planning using optimization (10 marks)
Gerry has just obtained a job in portfolio planning at a newly created investment company. His
manager has given him the responsibility of investing $10 million, and he must maximise the
expected return of the investment over the next year. He has four investment alternatives available
to him. The expected return for each of these alternatives is given in the following table.
Investment Type     Expected Return (%)
Cash                3
Listed Property     5
Australian Bonds    7
Stocks              12
There are some additional constraints on how the funds can be invested:
a minimum of 25% of the funds is to be placed in cash;
the amount in stocks cannot be more than double the amount in bonds;
a maximum of 35% of the funds may be placed in stocks;
the combined amount in bonds and stocks cannot exceed the combined amount in cash and
property;
all of the available $10 million must be invested; and
each investment must be in multiples of $10,000.
Set this problem up as a linear programming model in Excel, and use your model to answer the
following questions:
How should the $10 million be invested?
What is the overall return (in dollar terms)?
What is the overall return as a percentage of the $10 million invested?
How do your answers to the above questions change if the return from stocks is now expected to be
only 5%?
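Excel Solver is the required tool here, but a brute-force check in plain Python can confirm that a Solver solution respects every constraint. The sketch below works in units of $10,000 (so the "multiples of $10,000" rule is built in) and enumerates the cash and stock amounts. Because property takes up whatever remains, the return is linear in the bond amount, so only the two ends of the feasible bond range need to be tried. It is an exhaustive search, not Solver's simplex or integer algorithm, and `best_allocation` is a hypothetical name.

```python
UNIT = 10_000                  # investments must be multiples of $10,000
TOTAL = 10_000_000 // UNIT     # 1,000 units = $10 million, all of which must be invested

def best_allocation(rates):
    """Exhaustively search whole-unit allocations; rates are percentage returns per asset."""
    best = None
    for cash in range(TOTAL * 25 // 100, TOTAL + 1):                       # at least 25% in cash
        for stocks in range(0, min(TOTAL * 35 // 100, TOTAL - cash) + 1):  # at most 35% in stocks
            b_lo = (stocks + 1) // 2                 # stocks <= 2 * bonds
            b_hi = min(TOTAL - cash - stocks,        # property cannot be negative
                       TOTAL // 2 - stocks)          # bonds + stocks <= cash + property
            if b_lo > b_hi:
                continue
            # Property is the remainder, so the return is linear in bonds and
            # the best whole-unit bond amount lies at one end of [b_lo, b_hi].
            for bonds in (b_lo, b_hi):
                prop = TOTAL - cash - stocks - bonds
                pct_units = (rates["cash"] * cash + rates["property"] * prop
                             + rates["bonds"] * bonds + rates["stocks"] * stocks)
                ret = pct_units * UNIT // 100        # dollar return
                if best is None or ret > best[0]:
                    best = (ret, {"cash": cash * UNIT, "property": prop * UNIT,
                                  "bonds": bonds * UNIT, "stocks": stocks * UNIT})
    return best

for stock_rate in (12, 5):
    ret, alloc = best_allocation({"cash": 3, "property": 5, "bonds": 7, "stocks": stock_rate})
    print(f"stocks at {stock_rate}%:", alloc, f"${ret:,}", f"{100 * ret / (TOTAL * UNIT):.3f}%")
```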
What to submit
Submit the following:
a brief report describing how the funds should be invested, and the resulting returns in each
of the above cases;
the spreadsheet containing your model and Solver settings (this spreadsheet should appear
in a separate tab of the single EXCEL spreadsheet file that you submit for this assignment).
Your submission will be marked according to the completeness and correctness of your response.
Problem 3: Simulating a sales plan (15 marks)
Joe is the manager of an electronics store that sells TVs, HiFis, computers, and various other
electronic devices. For next month Joe is planning a promotion on a discontinued model of a popular
tablet computer, which has been a good seller over the last few months. He plans to run the
promotion for 10 days. Joe is able to purchase the tablets from the manufacturer for $350, and he
will sell them to his customers for $600. Any tablets that have not been sold at the end of the
promotion will be sold to another retailer for $250.
Joe can only place one order with the manufacturer, and he must do this before the promotion
begins. He doesn’t know exactly what the demand will be, and estimates that on any particular day
the probability of selling no tablets will be 10%; the probability of selling one tablet will be 15%; the
probability of selling two tablets will be 25%; the probability of selling three tablets will be 30%; the
probability of selling four tablets will be 15%; and the probability of selling five tablets will be 5%. He
believes that there is a zero probability of selling any more than 5 tablets on any one day.
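In a spreadsheet, daily demand is commonly generated by comparing a uniform random number with the cumulative probabilities of the distribution above, for example with `VLOOKUP(RAND(), table, 2, TRUE)`, where `table` stands for a hypothetical two-column range of lower bounds and demand values. The Python sketch below shows the same inverse-transform lookup; the table layout and function name are assumptions for illustration.

```python
import bisect
import random

# Lower bound of the cumulative probability for each demand level (0 to 5 tablets).
LOWER_BOUNDS = [0.00, 0.10, 0.25, 0.50, 0.80, 0.95]
DEMANDS = [0, 1, 2, 3, 4, 5]

def demand_from_uniform(u):
    """Map a uniform random number u in [0, 1) to one day's demand by table lookup."""
    # bisect_right finds the last lower bound <= u, like an approximate-match VLOOKUP.
    return DEMANDS[bisect.bisect_right(LOWER_BOUNDS, u) - 1]

rng = random.Random(42)  # fixed seed so the example is reproducible
print([demand_from_uniform(rng.random()) for _ in range(10)])  # ten simulated days
```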
Obviously Joe would like to maximise his profit over the period of the promotion, and in order to do
this he must order an appropriate number of tablets from the manufacturer. If he orders too few, he
may not have a sufficient number to meet customer demand; if he orders too many, then his stock
may exceed customer demand, and he will be forced to pass the tablets on to the other retailer.
Create a simulation model in EXCEL to assist Joe in determining how many tablets he should order.
Use your simulation model to calculate the average net profit Joe would make for various order
quantities, and present your findings in a graph. (You do not need to try each possible order
quantity; rather, consider incrementing order quantities in lots of, say, 5. But do simulate over a
large range of order quantities; say, from 10 to 50). You should make sure that you perform enough
trials to obtain a reliable estimate of the mean, but also a reasonable estimate of the spread in
profits that result from each order quantity (i.e., for each order quantity calculate the standard
deviation as well as the mean).
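The logic of the simulation can be sketched outside Excel as follows. It assumes that unmet demand is simply lost (no back-orders), that daily demands are independent, and that each trial covers the full 10-day promotion. The trial count of 5,000 and the seed are arbitrary choices, not requirements.

```python
import random
import statistics

COST, PRICE, SALVAGE = 350, 600, 250   # purchase price, selling price, price from other retailer
DAYS = 10
DEMAND_VALUES = [0, 1, 2, 3, 4, 5]
DEMAND_WEIGHTS = [0.10, 0.15, 0.25, 0.30, 0.15, 0.05]

def profit(order_qty, total_demand):
    """Net profit for one promotion: sell up to demand at full price, pass on the rest."""
    sold = min(order_qty, total_demand)
    leftover = order_qty - sold
    return PRICE * sold + SALVAGE * leftover - COST * order_qty

def simulate(order_qty, trials, rng):
    """Return (mean, sample standard deviation) of profit over independent 10-day trials."""
    profits = []
    for _ in range(trials):
        demand = sum(rng.choices(DEMAND_VALUES, weights=DEMAND_WEIGHTS, k=DAYS))
        profits.append(profit(order_qty, demand))
    return statistics.mean(profits), statistics.stdev(profits)

rng = random.Random(2017)  # fixed seed so runs are reproducible
for q in range(10, 51, 5):
    mean, sd = simulate(q, trials=5000, rng=rng)
    print(f"order {q:2d}: mean profit ${mean:,.0f}, std dev ${sd:,.0f}")
```

The standard error of the estimated mean shrinks roughly in proportion to one over the square root of the number of trials, which is one way to judge whether enough trials have been run.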
Based on your results, what advice would you give Joe? Make sure that you comment not only on the
mean profit, but also on the variability that arises from different order quantities.
What to submit
Submit the following:
a brief report containing your results from the simulation, and describing, on the basis of
these results, the recommendation that you would make to Joe. Include relevant details,
such as the number of simulations you averaged over for each order quantity.
the spreadsheet containing your simulation model (this spreadsheet should appear in a
separate tab of the single EXCEL spreadsheet file that you submit for this assignment).
Your submission will be marked according to the completeness and correctness of your response.
Problem 4: Predicting Hospital Expenses using regression (10 marks)
Hospitals are very expensive organisations to run, and the cost depends on many variables, two of
which are the number of beds in the hospital, and the number of admissions. The table below shows
data for 14 hospitals.
Beds   Admissions   Total Expense (Millions)
504    24000        191
203    6450         36
458    14700        95
63     4350         23
315    23250        140
210    7950         68
323    11550        86
75     2700         18
53     1350         21
135    900          9
165    4200         32
98     2400         17
780    34500        236
615    23850        149
Use WEKA to create three regression models for predicting total expense (see footnote 1).
Model 1 should use only the number of beds as input
Model 2 should use only the number of admissions as input
Model 3 should predict total expense on the basis of both the number of beds and the
number of admissions.
For each model, record the regression equation, the training error, and the leave-one-out cross-validation error.
Use the regression equation from each model to predict the total expense of running a hospital with
350 beds and 20,000 admissions.
Which model do you believe provides the most reliable prediction? You MUST justify your answer
based on relevant data from the results that you have provided.
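To sanity-check the WEKA output, ordinary least squares can be fitted directly to the 14 hospitals in the table. This sketch solves the normal equations in plain Python and reports the root mean squared error (RMSE) on the training data and under leave-one-out cross-validation. WEKA's LinearRegression applies a small ridge penalty and, by default, M5 attribute selection, so its coefficients and errors may differ slightly. The function names are hypothetical.

```python
import math

# Data from the table above (expense in millions).
BEDS = [504, 203, 458, 63, 315, 210, 323, 75, 53, 135, 165, 98, 780, 615]
ADMISSIONS = [24000, 6450, 14700, 4350, 23250, 7950, 11550, 2700, 1350, 900, 4200, 2400, 34500, 23850]
EXPENSE = [191, 36, 95, 23, 140, 68, 86, 18, 21, 9, 32, 17, 236, 149]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def evaluate(rows, y):
    """Return (coefficients, training RMSE, leave-one-out RMSE)."""
    coef = fit_ols(rows, y)
    train = rmse([predict(coef, r) - t for r, t in zip(rows, y)])
    loo = []
    for i in range(len(y)):
        c = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])  # refit without hospital i
        loo.append(predict(c, rows[i]) - y[i])
    return coef, train, rmse(loo)

MODELS = {
    "Model 1": ([[b] for b in BEDS], [350]),
    "Model 2": ([[a] for a in ADMISSIONS], [20000]),
    "Model 3": ([[b, a] for b, a in zip(BEDS, ADMISSIONS)], [350, 20000]),
}
for name, (rows, new_hospital) in MODELS.items():
    coef, train, loo = evaluate(rows, EXPENSE)
    print(name, [round(c, 5) for c in coef], f"train RMSE {train:.2f}", f"LOO RMSE {loo:.2f}",
          f"prediction {predict(coef, new_hospital):.1f}")
```

Because adding a predictor can never increase the least-squares training error, the training and leave-one-out figures can tell different stories about which model generalises best.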
What to submit
Submit a brief report presenting your results and justifications. Your submission will be marked
according to the completeness and correctness of your responses.
1 Use the Linear Regression function, which you will find in the Classify tab, in the Functions section.
For Models 1 and 2, you will need to remove a predictor variable after you open the file. For Model
3, you will need to set the value of attributeSelectionMethod to ‘No attribute selection’. (You can get
to this dialog box by left-clicking on the words ‘Linear Regression’.)
Problem 5: Applying MLPs to the prediction of house prices (15 marks)
The Housing dataset is a well-known dataset that is widely used for comparing the performance of
data-mining and machine learning techniques on regression tasks. The dataset can be obtained from
the UCI machine learning repository. The following URL will take you to the UCI web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Housing
Read the documentation for this dataset, and then go to the Data Folder and download the file
‘housing.data’. Alternatively, you can download a .csv version of the data from the CSE5DSS LMS
Page.
Your task is to carry out experiments to compare the performance of linear regression and multilayer
perceptrons on predicting the value of homes. You should use the cross-validation test option,
keeping the number of folds constant over each trial. (It is up to you to choose a suitable number of
folds; e.g., 10).
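For intuition about what the cross-validation option does, the sketch below splits n examples into k folds, so that each example is used for testing exactly once and for training k - 1 times. WEKA performs its own randomised split, so these folds will not match WEKA's; the point is that using the same k and the same random seed for every trial keeps the comparison between configurations fair. The function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=1):
    """Shuffle indices 0..n-1 and split them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validation_splits(n, k, seed=1):
    """Yield (train, test) index lists; each example appears in exactly one test fold."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for train, test in cross_validation_splits(n=20, k=5):
    print(len(train), len(test))
```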
Perform the following:
(i) Apply linear regression to this problem, using the default settings in WEKA. Record
the root mean squared error.
(ii) Now use an MLP with one hidden layer, containing what you believe to be a suitable
number of hidden units in that hidden layer (various rules of thumb were described
in the lectures). Vary the training time from 100 to 2000 in increments of 100,
recording the mean squared error in each case. Plot a graph showing how the mean
squared error varies with training time.
(iii) Now try varying the number of units in the hidden layer of the multilayer
perceptron, fixing the training time to that which resulted in the best performance
in (ii) above. Use at least five different values for the number of hidden units.
(Choose values over a significant range). Plot a graph showing how mean squared
error varies with the number of hidden units.
(iv) Based on your results from (ii) and (iii) above, try to find an MLP configuration
(training time and number of hidden units) which you believe is close to optimal for
this problem.
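WEKA's summary for a numeric class reports the root mean squared error, so if mean squared error is wanted for (ii) and (iii), each reported value needs to be squared. The small helper below records the training-time grid from (ii), performs that conversion, and picks the best setting. The dictionary of results is something you would fill in from your own runs, and the names are hypothetical.

```python
TRAINING_TIMES = list(range(100, 2001, 100))  # the 20 training-time settings required in (ii)

def to_mse(rmse_by_setting):
    """Square each RMSE reported by WEKA to obtain the mean squared error."""
    return {setting: rmse ** 2 for setting, rmse in rmse_by_setting.items()}

def best_setting(error_by_setting):
    """Return the setting with the lowest error; ties go to the first one recorded."""
    return min(error_by_setting, key=error_by_setting.get)
```

For example, `best_setting(to_mse(rmse_by_time))` returns the training time with the lowest error, where `rmse_by_time` maps each training time to the RMSE that WEKA reported; the same helper works for the hidden-unit sweep in (iii).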
What to submit
Submit a brief report presenting your results from the above experiments. Include the two graphs in
your report. Make sure you describe how the performance of the MLP depends on training time (i.e.,
what happens as the training time is increased?), and how it depends on the number of hidden
units. What is the best configuration you could find for part (iv)?
Your submission will be marked according to the completeness and correctness of your responses.
Problem 6: Classifying credit risk (10 marks)
The German Credit dataset is a well-known dataset that is widely used for comparing the
performance of data-mining and machine learning techniques on classification tasks. The dataset
can be obtained from the UCI machine learning repository. The following URL will take you to the UCI
web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
Read the documentation for this dataset, and then go to the Data Folder and download the file
‘german.data’. Alternatively, you can download a .csv version of the data from the CSE5DSS LMS
Page.
Answer the following preliminary questions:
i. How many features or attributes does the data contain?
ii. How many of the attributes are numeric?
iii. How many of the attributes are categorical (including binary)?
iv. How many examples does the data contain?
v. Which attribute represents the class variable?
vi. How many possible values can the class variable take?
vii. What does each of the values of the class variable represent (i.e., good credit or bad credit)?
Now load the file into WEKA and compare the performance of each of the following classifiers using
10-fold cross-validation:
J48 (this is the WEKA version of Quinlan’s C4.5)
Logistic Regression
Naïve Bayes
MLP
(Use the default WEKA settings for each classifier.)
Present the confusion matrix showing the results for each of the four classifiers, and for each case,
calculate the accuracy, precision, and recall. (IMPORTANT: When presenting your confusion
matrices, make sure that it is clear what is being represented in rows and columns; i.e., ‘actual’
classes, or ‘predicted’ classes).
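Once a confusion matrix is written down, the three measures follow directly. The sketch below assumes rows hold the actual class and columns the predicted class (the layout of WEKA's confusion matrix output, where the columns are labelled "classified as"), with the classes ordered (good, bad). Precision and recall are computed per class, since either class could be treated as the positive one; the names are hypothetical.

```python
LABELS = ["good", "bad"]

def metrics(cm):
    """cm[actual][predicted] counts; return accuracy plus per-class precision and recall."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    result = {"accuracy": correct / total}
    for i, label in enumerate(LABELS):
        predicted_i = sum(cm[a][i] for a in range(len(cm)))  # column total
        actual_i = sum(cm[i])                                 # row total
        result[f"precision_{label}"] = cm[i][i] / predicted_i if predicted_i else float("nan")
        result[f"recall_{label}"] = cm[i][i] / actual_i if actual_i else float("nan")
    return result
```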
As described in the documentation for the dataset, the costs of misclassification are not equal: it
is worse (in fact, five times worse) to classify a customer as good when they are bad than it is to classify
a customer as bad when they are good. Using the results that you have provided above, calculate
the weighted misclassification error for each of the classifiers, and, on the basis of these calculations,
recommend which of the classifiers is the best to use on this dataset. Make sure that you show all
calculations, and provide a clear justification for your answer.
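The weighted misclassification error can be computed by multiplying each cell of the confusion matrix by its cost. This sketch uses the cost matrix given in the dataset documentation (cost 5 for an actual bad customer predicted as good, cost 1 for an actual good customer predicted as bad), with the same (good, bad) row and column order as above. It reports both the total cost and the average cost per example; which of these is meant by "weighted misclassification error" is an assumption to confirm.

```python
# Rows = actual class, columns = predicted class, order (good, bad).
COST = [[0, 1],
        [5, 0]]

def weighted_error(cm, cost=COST):
    """Return (total misclassification cost, average cost per example)."""
    total_cost = sum(cm[a][p] * cost[a][p] for a in range(2) for p in range(2))
    n = sum(map(sum, cm))
    return total_cost, total_cost / n
```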
What to submit
Submit a brief report presenting your answers to preliminary questions, your results and
justifications. Your submission will be marked according to the completeness and correctness of
your responses.