CSE5DSS – Decision Support Systems
Individual Assignment, 2017
Due Date: Monday 1st May 2017, 10:00am
Assessment Weight: 30% of the final mark for the subject

Instructions
- This is an INDIVIDUAL assignment. You are not permitted to collaborate with any other student, and you are not permitted to outsource the work to any other party.
- This assignment consists of six separate problems. You are required to solve all six problems.
- The total number of marks available for the assignment is 70.

Plagiarism
Plagiarism is the submission of somebody else's work in a manner that gives the impression that the work is your own. When submitting your assignment via the LMS, the following announcement will appear: "Software will be used to assist in the detection of plagiarism." Students are referred to the section on 'Academic Misconduct' in the subject's guideline, available on the LMS.

Lateness Policy
Penalties are applied to late assignments: 5% of the total possible marks for the task is deducted per day, and assignments are accepted up to 5 days after the due date only. An assignment submitted more than five working days after the due date will not be accepted.

Submission Procedure
Assignments are to be submitted electronically via the Learning Management System (LMS). You should submit two files:
(i) a single Microsoft Word file which contains your answers to all six problems; and
(ii) a single Microsoft Excel file which contains your spreadsheet models for Problems 2 and 3, each on a separate and clearly labelled tab.

Once again, please note that this is an individual assignment. You are not permitted to collaborate with any other student, and you are not permitted to outsource the work to any other party. Software will be used to assist in the detection of plagiarism.

Problem 1: Decision analysis using decision tables (10 marks)
George Keneally is a keen investor who likes to invest in the stock market. The return he expects from investing in the stock market will depend on the state of the market.
He estimates that if the market is good he will get a 12% return; if the market is fair he will get a 5% return; and if the market is poor he will get a –2% return (i.e., a loss). Over the last few months George has been feeling more cautious, and he is now considering whether he should instead invest his money in government bonds, which offer a fixed return of 6% per annum. George has $100,000 to invest, and wishes to invest it either all in stocks or all in bonds.

Decision tables are often an appropriate modelling technique when trying to find the best solution from a small number of alternatives. Develop a decision table for this problem and use it to answer the following questions:
1. What is the maximax decision; i.e., the decision George would make if he were optimistic? Make sure you explain why you gave your answer.
2. What is the maximin decision; i.e., the decision he would make if he were pessimistic? Again, justify your answer.
3. What decision would George make if he believed that each of the three states of the market were equally likely? You must show all calculations, and justify your answer based on these calculations.

In fact, the probabilities of the three states of the market are not equal. Rather, the probability that the market will be good is 50%, the probability that it will be fair is 30%, and the probability that it will be poor is 20%.
4. What decision would George make once he has been given this information? (Show all your calculations, and justify your answer.)

A friend of George has referred him to a consultant who is able to predict with certainty whether the market will be good, fair, or poor. The consultant would charge $2,000 for this information.
5. Should George pay the consultant? What is the most that George should be willing to pay for the consultant's advice? (Show all calculations and explain clearly how you arrived at your answer.)
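The decision-table criteria used in questions 1 to 5 can be checked with a short Python sketch. It assumes that the stock returns and the 6% bond return apply over the same one-year period. It uses integer percentages so that the arithmetic stays exact. The function names are illustrative only.

The expected value of perfect information (EVPI) is the expected payoff when George always picks the best alternative for the state that will occur, minus the best expected monetary value (EMV) without that information. EVPI is the upper limit on what any perfect forecast is worth.

```python
INVESTMENT = 100_000
STATES = ("good", "fair", "poor")

# Percentage return of each alternative in each state of the market (from the problem).
RETURNS_PCT = {
    "stocks": (12, 5, -2),
    "bonds": (6, 6, 6),  # fixed 6% whatever the market does
}

# Decision table: dollar payoff of each alternative (row) in each state (column).
PAYOFFS = {alt: [r * INVESTMENT // 100 for r in pcts] for alt, pcts in RETURNS_PCT.items()}


def maximax(table):
    """Optimistic criterion: pick the alternative whose best payoff is largest."""
    return max(table, key=lambda alt: max(table[alt]))


def maximin(table):
    """Pessimistic criterion: pick the alternative whose worst payoff is largest."""
    return max(table, key=lambda alt: min(table[alt]))


def equally_likely(table):
    """Average payoff of each alternative when every state is equally likely."""
    return {alt: sum(row) / len(row) for alt, row in table.items()}


def expected_values(table, probs_pct):
    """Expected monetary value (EMV) of each alternative; probabilities given in percent."""
    return {alt: sum(p * x for p, x in zip(probs_pct, row)) / 100
            for alt, row in table.items()}


def evpi(table, probs_pct):
    """EV with perfect information minus the best EMV without it."""
    best_in_state = [max(row[i] for row in table.values()) for i in range(len(probs_pct))]
    ev_with_perfect_info = sum(p * x for p, x in zip(probs_pct, best_in_state)) / 100
    return ev_with_perfect_info - max(expected_values(table, probs_pct).values())


if __name__ == "__main__":
    probs = (50, 30, 20)  # good, fair, poor
    print("Decision table:", PAYOFFS)
    print("Maximax choice:", maximax(PAYOFFS))
    print("Maximin choice:", maximin(PAYOFFS))
    print("Equally likely averages:", equally_likely(PAYOFFS))
    print("EMV:", expected_values(PAYOFFS, probs))
    print("EVPI:", evpi(PAYOFFS, probs))
```

Comparing the EVPI with the consultant's $2,000 fee answers the first part of question 5. The report should still show these calculations by hand, as the question requires.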
What to submit
Submit a brief report presenting your answers and justifications to the above questions. Include the decision table in your report. Your submission will be marked according to the completeness and correctness of your response.

Problem 2: Portfolio planning using optimization (10 marks)
Gerry has just obtained a job in portfolio planning at a newly created investment company. His manager has given him the responsibility of investing $10 million, and he must maximise the expected return of the investment over the next year. He has four investment alternatives available to him. The expected return for each of these alternatives is given in the following table.

Investment Type      Expected return (%)
Cash                 3
Listed Property      5
Australian Bonds     7
Stocks               12

There are some additional constraints on how the funds can be invested:
- a minimum of 25% of the funds is to be placed in cash;
- the amount in stocks cannot be more than double the amount in bonds;
- a maximum of 35% of the funds may be placed in stocks;
- the combined amount in bonds and stocks cannot exceed the combined amount in cash and property;
- all of the available $10 million must be invested; and
- each investment must be in multiples of $10,000.

Set this problem up as a linear programming model in Excel, and use your model to answer the following questions:
- How should the $10 million be invested?
- What is the overall return (in dollar terms)?
- What is the overall return as a percentage of the $10 million invested?

How do your answers to the above questions change if the return from stocks is now expected to be only 5%?

What to submit
Submit the following:
- a brief report describing how the funds should be invested, and the resulting returns in each of the above cases;
- the spreadsheet containing your model and Solver settings (this spreadsheet should appear in a separate tab of the single Excel spreadsheet file that you submit for this assignment).
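Before building the spreadsheet, it can help to write the model out algebraically. The formulation below is one way to do this. The variable names x_C, x_P, x_B and x_S (dollars placed in cash, listed property, Australian bonds and stocks) and k_i are assumptions, not part of the problem statement.

Strictly speaking, the last constraint (multiples of $10,000) turns the model into an integer linear program. In Excel Solver this corresponds to an "int" constraint on the cells holding k_i. For the second scenario, replace the stock coefficient 0.12 with 0.05.

```latex
\begin{aligned}
\max\;        & 0.03\,x_C + 0.05\,x_P + 0.07\,x_B + 0.12\,x_S \\
\text{s.t.}\; & x_C + x_P + x_B + x_S = 10\,000\,000 \\
              & x_C \ge 0.25 \times 10\,000\,000 \\
              & x_S \le 2\,x_B \\
              & x_S \le 0.35 \times 10\,000\,000 \\
              & x_B + x_S \le x_C + x_P \\
              & x_i = 10\,000\,k_i,\quad k_i \in \mathbb{Z}_{\ge 0},\quad i \in \{C, P, B, S\}
\end{aligned}
```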
Your submission will be marked according to the completeness and correctness of your response.

Problem 3: Simulating a sales plan (15 marks)
Joe is the manager of an electronics store that sells TVs, HiFis, computers, and various other electronic devices. For next month Joe is planning a promotion on a discontinued model of a popular tablet computer, which has been a good seller over the last few months. He plans to run the promotion for 10 days. Joe is able to purchase the tablets from the manufacturer for $350, and he will sell them to his customers for $600. Any tablets that have not been sold at the end of the promotion will be sold to another retailer for $250. Joe can only place one order with the manufacturer, and he must do this before the promotion begins.

He doesn't know exactly what the demand will be, but estimates the following probabilities for the number of tablets sold on any particular day:

Tablets sold on a day   Probability
0                       10%
1                       15%
2                       25%
3                       30%
4                       15%
5                       5%

He believes that there is a zero probability of selling more than 5 tablets on any one day.

Obviously Joe would like to maximise his profit over the period of the promotion, and to do this he must order an appropriate number of tablets from the manufacturer. If he orders too few, he may not have enough to meet customer demand; if he orders too many, his stock may exceed customer demand, and he will be forced to pass the surplus tablets on to the other retailer.

Create a simulation model in Excel to assist Joe in determining how many tablets he should order. Use your simulation model to calculate the average net profit Joe would make for various order quantities, and present your findings in a graph.
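The logic of one simulation trial can be prototyped in Python before it is built in Excel. Each trial draws ten daily demands from the distribution above and sells min(total demand, order quantity) tablets at $600. The remaining tablets go to the other retailer at $250.

The sketch assumes that demand which cannot be met is lost rather than back-ordered. The function names, the 5,000-trial default and the seed are illustrative choices, not requirements of the problem.

```python
import random
import statistics

# Daily demand distribution from the problem statement (tablets sold per day).
DEMAND_VALUES = [0, 1, 2, 3, 4, 5]
DEMAND_PROBS = [0.10, 0.15, 0.25, 0.30, 0.15, 0.05]

UNIT_COST, PRICE, SALVAGE = 350, 600, 250  # dollars per tablet
DAYS = 10


def simulate_profit(order_qty, rng):
    """Net profit of one simulated 10-day promotion for a given order quantity."""
    demand = sum(rng.choices(DEMAND_VALUES, weights=DEMAND_PROBS, k=DAYS))
    sold = min(demand, order_qty)   # assumption: unmet demand is lost
    leftover = order_qty - sold     # passed on to the other retailer
    return PRICE * sold + SALVAGE * leftover - UNIT_COST * order_qty


def summarise(order_qty, trials=5000, seed=1):
    """Mean and standard deviation of profit over many simulated promotions."""
    rng = random.Random(seed)
    profits = [simulate_profit(order_qty, rng) for _ in range(trials)]
    return statistics.mean(profits), statistics.stdev(profits)


if __name__ == "__main__":
    for q in range(10, 51, 5):
        mean, sd = summarise(q)
        print(f"order {q:2d}: mean profit ${mean:9.2f}, std dev ${sd:8.2f}")
```

Every order quantity uses the same seed, so each quantity is evaluated against the same sequence of simulated demand scenarios (common random numbers). This makes differences between order quantities less noisy than independent runs would.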
(You do not need to try every possible order quantity; rather, consider incrementing order quantities in lots of, say, 5. But do simulate over a large range of order quantities; say, from 10 to 50.) Make sure that you perform enough trials to obtain a reliable estimate of the mean, and also a reasonable estimate of the spread in profits that results from each order quantity (i.e., for each order quantity calculate the standard deviation as well as the mean). Based on your results, what advice would you give Joe? Make sure that you comment not only on the mean profit, but also on the variability that arises from different order quantities.

What to submit
Submit the following:
- a brief report containing your results from the simulation and describing, on the basis of these results, the recommendation that you would make to Joe. Include relevant details, such as the number of simulations you averaged over for each order quantity;
- the spreadsheet containing your simulation model (this spreadsheet should appear in a separate tab of the single Excel spreadsheet file that you submit for this assignment).
Your submission will be marked according to the completeness and correctness of your response.

Problem 4: Predicting hospital expenses using regression (10 marks)
Hospitals are very expensive organisations to run, and the cost depends on many variables, two of which are the number of beds in the hospital and the number of admissions. The table below shows data for 14 hospitals.
Beds   Admissions   Total Expense (millions)
504    24000        191
203    6450         36
458    14700        95
63     4350         23
315    23250        140
210    7950         68
323    11550        86
75     2700         18
53     1350         21
135    900          9
165    4200         32
98     2400         17
780    34500        236
615    23850        149

Use WEKA to create three regression models for predicting total expense [1]:
- Model 1 should use only the number of beds as input;
- Model 2 should use only the number of admissions as input;
- Model 3 should predict total expense on the basis of both the number of beds and the number of admissions.

For each model, record the regression equation, the training error, and the leave-one-out cross-validation error. Use the regression equation from each model to predict the total expense of running a hospital with 350 beds and 20,000 admissions. Which model do you believe provides the most reliable prediction? You MUST justify your answer based on relevant data from the results that you have provided.

What to submit
Submit a brief report presenting your results and justifications. Your submission will be marked according to the completeness and correctness of your responses.

[1] Use the Linear Regression function, which you will find in the Classify tab, in the Functions section. For Models 1 and 2, you will need to remove a predictor variable after you open the file. For Model 3, you will need to set the value of attributeSelectionMethod to 'No attribute selection'. (You can get to this dialog box by left-clicking on the words 'Linear Regression'.)

Problem 5: Applying MLPs to the prediction of house prices (15 marks)
The Housing dataset is a well-known dataset that is widely used for comparing the performance of data-mining and machine learning techniques on regression tasks. The dataset can be obtained from the UCI machine learning repository. The following URL will take you to the UCI web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Housing
Read the documentation for this dataset, and then go to the Data Folder and download the file 'housing.data'.
Alternatively, you can download a .csv version of the data from the CSE5DSS LMS page.

Your task is to carry out experiments to compare the performance of linear regression and multilayer perceptrons (MLPs) on predicting the value of homes. You should use the cross-validation test option, keeping the number of folds constant across all trials. (It is up to you to choose a suitable number of folds; e.g., 10.) Perform the following:
(i) Apply linear regression to this problem, using the default settings in WEKA. Record the root mean squared error.
(ii) Now use an MLP with one hidden layer, containing what you believe to be a suitable number of hidden units (various rules of thumb were described in the lectures). Vary the training time from 100 to 2000 in increments of 100, recording the mean squared error in each case. Plot a graph showing how the mean squared error varies with training time.
(iii) Now vary the number of units in the hidden layer of the MLP, fixing the training time to the value that gave the best performance in (ii) above. Use at least five different values for the number of hidden units, chosen over a significant range. Plot a graph showing how the mean squared error varies with the number of hidden units.
(iv) Based on your results from (ii) and (iii) above, try to find an MLP configuration (training time and number of hidden units) which you believe is close to optimal for this problem.

What to submit
Submit a brief report presenting your results from the above experiments. Include the two graphs in your report. Make sure you describe how the performance of the MLP depends on training time (i.e., what happens as the training time is increased?), and how it depends on the number of hidden units. What is the best configuration you could find for part (iv)?
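WEKA reports the root mean squared error, so the mean squared error asked for in (ii) and (iii) is the square of the reported figure. The sketch below also lists the training-time grid from (ii) and builds command lines for running the sweep outside the Explorer. It includes a helper that picks the lowest-error setting once the results have been recorded.

Assumptions:
- The data have been saved as housing.arff.
- weka.jar is on the class path.
- 10 folds are used.
- 7 hidden units is an illustrative value only.
- The flags used are -t (training file), -x (folds), -N (training time in epochs) and -H (hidden layers). Check them against the MultilayerPerceptron documentation for your WEKA version.

```python
# Hypothetical helpers for organising the MLP experiments in Problem 5.
# Assumed: data saved as housing.arff, weka.jar on the class path.
WEKA_CLASS = "weka.classifiers.functions.MultilayerPerceptron"


def training_time_grid(start=100, stop=2000, step=100):
    """Training times for part (ii): 100, 200, ..., 2000."""
    return list(range(start, stop + 1, step))


def rmse_to_mse(rmse):
    """WEKA reports root mean squared error; squaring it gives the mean squared error."""
    return rmse ** 2


def weka_command(epochs, hidden_units, folds=10, data_file="housing.arff"):
    """Build one WEKA command line for an MLP with a single hidden layer."""
    return (f"java -cp weka.jar {WEKA_CLASS} -t {data_file} "
            f"-x {folds} -N {epochs} -H {hidden_units}")


def best_setting(mse_by_setting):
    """Return the setting (training time or hidden units) with the lowest recorded MSE."""
    return min(mse_by_setting, key=mse_by_setting.get)


if __name__ == "__main__":
    for n in training_time_grid():
        print(weka_command(n, hidden_units=7))  # 7 is an illustrative value only
```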
Your submission will be marked according to the completeness and correctness of your responses.

Problem 6: Classifying credit risk (10 marks)
The German Credit dataset is a well-known dataset that is widely used for comparing the performance of data-mining and machine learning techniques on classification tasks. The dataset can be obtained from the UCI machine learning repository. The following URL will take you to the UCI web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
Read the documentation for this dataset, and then go to the Data Folder and download the file 'german.data'. Alternatively, you can download a .csv version of the data from the CSE5DSS LMS page.

Answer the following preliminary questions:
i. How many features or attributes does the data contain?
ii. How many of the attributes are numeric?
iii. How many of the attributes are categorical (including binary)?
iv. How many examples does the data contain?
v. Which attribute represents the class variable?
vi. How many possible values can the class variable take?
vii. What does each of the values of the class variable represent (i.e., good credit or bad credit)?

Now load the file into WEKA and compare the performance of each of the following classifiers using 10-fold cross-validation:
- J48 (the WEKA version of Quinlan's C4.5)
- Logistic Regression
- Naïve Bayes
- MLP
(Use the default WEKA settings for each classifier.)

Present the confusion matrix showing the results for each of the four classifiers, and for each case calculate the accuracy, precision, and recall. (IMPORTANT: When presenting your confusion matrices, make sure that it is clear what is being represented in rows and columns; i.e., 'actual' classes or 'predicted' classes.)
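Once a confusion matrix has been copied from WEKA's output, accuracy, precision and recall follow directly from it. The sketch below assumes the matrix is laid out as WEKA prints it: rows are actual classes and columns are predicted classes, in the class order (good, bad).

It reports precision and recall for each class, because the question does not say which class should be treated as "positive". The function name is illustrative only.

```python
def classification_metrics(matrix, labels=("good", "bad")):
    """Accuracy plus per-class precision and recall.

    matrix[i][j] = number of examples whose ACTUAL class is labels[i]
    and whose PREDICTED class is labels[j] (WEKA's row/column layout).
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class = {}
    for k, label in enumerate(labels):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        per_class[label] = {
            "precision": matrix[k][k] / predicted_k if predicted_k else 0.0,
            "recall": matrix[k][k] / actual_k if actual_k else 0.0,
        }
    return correct / total, per_class
```

To use it, pass the four counts from each classifier's output as a nested list, for example `classification_metrics([[tp_good, good_as_bad], [bad_as_good, tp_bad]])`. The names in that call are placeholders for your own counts.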
As described in the documentation for the dataset, the costs of misclassification are not equal: it is worse (in fact, 5 times worse) to classify a customer as good when they are bad than it is to classify a customer as bad when they are good. Using the results that you have provided above, calculate the weighted misclassification error for each of the classifiers and, on the basis of these calculations, recommend which of the classifiers is the best to use on this dataset. Make sure that you show all calculations, and provide a clear justification for your answer.

What to submit
Submit a brief report presenting your answers to the preliminary questions, your results and justifications. Your submission will be marked according to the completeness and correctness of your responses.
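The weighted misclassification error combines each confusion matrix with the cost matrix from the dataset documentation. Classifying a bad customer as good costs 5, classifying a good customer as bad costs 1, and correct classifications cost 0. The sketch assumes the same layout as WEKA's output: rows are actual classes and columns are predicted classes, in the order (good, bad).

Whether to report the total cost or divide it by the number of examples is a presentation choice, so the function returns both. The names used are illustrative.

```python
# Cost matrix from the German Credit documentation, indexed [actual][predicted]
# with class order (good, bad): predicting "good" for a bad customer costs 5.
COST = [[0, 1],
        [5, 0]]


def weighted_misclassification(matrix, cost=COST):
    """Return (total cost, average cost per example) for a confusion matrix
    laid out as matrix[actual][predicted]."""
    size = len(matrix)
    total_cost = sum(matrix[i][j] * cost[i][j] for i in range(size) for j in range(size))
    n = sum(sum(row) for row in matrix)
    return total_cost, total_cost / n


def cheapest(results):
    """Given {classifier name: confusion matrix}, return the name with the lowest total cost."""
    return min(results, key=lambda name: weighted_misclassification(results[name])[0])
```

Because errors on bad customers are weighted five times more heavily, the classifier with the lowest weighted cost need not be the one with the highest accuracy. This is why the recommendation should be based on the weighted figures.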