MIS770, T1 2017 Assignment Two

FACULTY OF BUSINESS AND LAW
Department of Information Systems and Business Analytics

MIS770 – Foundation Skills in Data Analysis
Assignment 2 – Automotive CO2 Emissions Analysis

Particulars
• Marks: 30%
• Words: 2,000
• Submission: Online to the MIS770 assignment two drop box in CloudDeakin. Email submissions will not be accepted.
• Note: This assignment is to be completed individually.

Assurance of Learning
This assignment assesses the following Graduate Learning Outcomes and related Unit Learning Outcomes:

Graduate Learning Outcome (GLO)
• GLO4: Critical thinking: evaluating information using critical and analytical thinking and judgment

Unit Learning Outcome (ULO)
• ULO2: Manipulate and summarise data that accurately represents real world problems
• ULO3: Interpret and appraise statistical output to assist in real-world decision making

Overview
The purpose of this assignment is to investigate a dataset which will enable you to answer questions posed in a Memorandum (see Memorandum section below). In order to answer the memorandum questions, you’ll need to analyse a given dataset, interpret the results, and then draw appropriate conclusions.

The aims of the assignment are to:
• provide you with some examples of the application of data analysis
• test your understanding of the material in the relevant topics
• test your ability to analyse and interpret your results
• test your ability to effectively communicate the results of your analysis to others

Before attempting the assignment, make sure you have studied the materials well. At a minimum, please read the relevant sections of the prescribed textbook and review the materials provided in Modules 1, 2 and 3 (simple regression).

Scenario
You play the role of Mira Hetnal in the Ministry of Transport’s Research and Analysis Department and you have been asked to respond to a Memorandum from Selina Wang, the Chief Data Analyst.
To assist you in answering Selina’s questions, she has provided you with a dataset called Motor_Vehicles.xlsx. For the purposes of this assignment, the dataset relates to a random sample of new Canadian motor vehicles whose CO2 Emissions were tested during 2015. The specific questions Selina has for Mira are in the following memorandum.

Memorandum

Memorandum
Date: 4th January, 2017
To: Mira Hetnal, Research and Analysis Department
From: Selina Wang, Chief Data Analyst
Subject: Analysis of Automotive CO2 Emissions Data

Dear Mira,

Can you please carry out an analysis of the recent Automotive CO2 Emissions Data (contained in the file Motor_Vehicles.xlsx) and prepare a Memorandum reply to me containing answers to the following questions. In your Memorandum, please use plain language, as I will provide your reply directly to people who do not necessarily understand statistical jargon.

My specific questions are:

Q1. An Overall View of CO2 Emissions
Can you provide me with an overall summary of the variable CO2 Emissions just by itself?

Q2. Relationships with CO2 Emissions
Does there appear to be any relationship between the CO2 Emissions and the type of Fuel used?

Q3. Confidence Intervals
(a) Can you estimate the level of CO2 Emissions for all 4 Cylinder, 6 Cylinder and 8 Cylinder vehicles? Does there appear to be any difference?
(b) Also, can you estimate the proportion of all vehicles that have 4 Cylinders, 6 Cylinders and 8 Cylinders?

Q4. Hypothesis Tests
Last month a national newspaper published an article stating the Federal Government was investigating a proposal to restrict CO2 Emissions for new vehicles to no more than 350 grams per kilometre. The same article suggested that this would remove at least 5% of the largest polluting vehicles from the road. Are you able to confirm if the sample data we have (i.e. Motor_Vehicles.xlsx) supports this claim?

Q5. Simple Regression
(a) I don’t know a lot about vehicle CO2 Emissions, but I would think that the larger an engine is, the greater the CO2 Emissions. Would you therefore create a regression model that shows how well the size of an engine (in litres) explains the variation in CO2 Emissions?
(b) Would you then use your model to predict the CO2 Emissions of a vehicle that had an engine size of 1000 cc (i.e. 1 litre)? Do you have any concerns about this prediction?

Q6. Appropriate Sample Size
Finally, I am concerned that the sample size of 1082 cars that we have in this study is far too large and we could easily get the same results with a much smaller sample. Therefore, if we wanted to undertake a future study, what would an appropriate sample size be if we wanted to:
(a) estimate the proportion of vehicles whose CO2 Emissions were no more than 350 grams per kilometre to within 3% with a high level of confidence, and
(b) accurately estimate the overall combined fuel consumption (i.e. Fuel_Both) to within 0.5 L/100 km.
Basically, how many cars would we need to include in next year’s survey to satisfy both requirements?

Regards,
Selina

Memorandum Requirements
• Your Memorandum should be no longer than 2000 words and there is no need to include a Table of Contents, Charts and Tables, or Appendices in the Memorandum. The Charts/Graphics and Tables you create are only to be placed in the Data Analysis file (i.e. the Excel spreadsheet)
• Suggested Word formatting for the Memorandum: single-line spacing; no smaller than 10-point font; page margins approx. 25 mm; and good use of white space
• Your Memorandum must have a cover sheet containing your particulars and Unit details
• Set out the Memorandum in the same order as in the originating Memorandum from Selina, with each section (question) clearly marked
• Use plain language and keep your explanations succinct. Avoid the use of technical or statistical jargon.
  As a guide to the meaning of “Plain Language”, imagine you are explaining your findings to a person without any statistical training (e.g. someone who has not studied this unit). What type of language would you use in this case?
• Marks will be lost if you use unexplained technical terms, irrelevant material, or have poor presentation/organisation
• All Microsoft Excel output associated with each question in the Memorandum is to be placed in the corresponding tab in the file Motor_Vehicles.xlsx

Data Analysis Instructions/Guidelines
To prepare a reply to Selina’s Memorandum, you will need to examine and analyse the dataset Motor_Vehicles.xlsx thoroughly. Selina has asked several questions and your Data Analysis output (i.e. your charts/tables/graphs) should be structured such that you answer each question on the separate tab/worksheet provided in your Excel document. There are also two extra tabs in Motor_Vehicles.xlsx called CI and HT, and you can use the various templates contained in these tabs in your “Confidence Interval” and “Hypothesis” answers.

In order to effectively answer the questions, your Data Analysis output needs to be appropriate. Accordingly, you’ll need to establish which of the following techniques are applicable for any given question:
• Summary Measures (e.g. Descriptive Statistics, including outlier detection)
• Comparative Summary Measures (i.e. Descriptive Statistics for multiple values of a variable)
• Suitable tables (such as a Frequency Distribution) and charts or graphics (such as Histograms, Box Plots, Pie Charts, Bar/Column Charts) that will illustrate more clearly other important features of a variable
• Cross Tabulations (sometimes called Contingency Tables), used to establish the relationships (dependencies) between two variables (see Additional Materials under Topic 3 – Creating Cross Tabulations in Excel using Pivot Tables)
• Confidence Intervals. You can assume that a 95% confidence level is appropriate. We use Confidence Intervals when we have no idea about the population parameter we are investigating. Additionally, we would use Confidence Intervals if we are asked for an estimate. You can use the relevant Excel templates provided in the dataset and copy them to the applicable question tab
• Hypothesis Tests. You can assume that a 5% level of significance is appropriate. We use Hypothesis Tests when we are testing a Claim, a Theory or a Standard. You can use the relevant Excel templates provided in the dataset and copy them to the applicable question tab

Note: There is an Appendix at the end of each Chapter of the Prescribed Textbook which describes the basic Excel steps associated with that Topic. Chapters 1 to 9 are applicable for this assessment.

Submission
Your completed assignment should be submitted in two separate files:
• Data Analysis (Part A): An Excel document (i.e. the Motor_Vehicles.xlsx file) containing separate tabs/worksheets with charts/tables/graphs for each question. Please note that all interpretations should be presented in your “Memorandum” and the Excel document should only contain your intermediate analysis and final output
• Memorandum (Part B): A Word document of no more than 2000 words which is not to contain any charts/tables/graphs

Please name your Word document Motor_Vehicles_yourstudentid.docx and the Data Analysis file Motor_Vehicles_yourstudentid.xlsx.

Note: The Cloud Unit site is the ONLY acceptable method of submission.

Marking Rubric
Levels, in order: Poor, Needs Improvement, Satisfactory, Good, Very Good, Excellent

Part A: Data Analysis (Marks: 12)
This part relates to the various visualisations in the form of charts, tables, graphs etc. created by Mira which formed the basis of her response to Selina.
• Poor (0 points; 0 – 3.5 Marks): Uses irrelevant or inappropriate techniques to analyse the data, or uses data analysis and visualisation tools in an incomplete or inaccurate manner. A very poor presentation of the analysis, or the analysis does not follow principles of good graphical display.
• Needs Improvement (4 points; 4 – 5.5 Marks): Uses some appropriate data analysis and visualisation tools to analyse the data but there are many errors in the analysis. The presentation of the analysis needs improvement.
• Satisfactory (6 points; 6 – 6.5 Marks): Uses appropriate data analysis and visualisation tools to analyse the data but there are several errors in the analysis. The presentation of the analysis is satisfactory.
• Good (7 points; 7 – 8 Marks): Uses appropriate data analysis and visualisation tools to analyse the data but there are some errors in the analysis. The presentation of the analysis is of a respectable standard.
• Very Good (8.5 points; 8.5 – 9 Marks): Comprehensive analysis of the data using appropriate techniques, but there are some minor errors in the analysis. Uses data visualisations to understand the patterns in data. The analysis is well organised and follows principles of good graphical display.
• Excellent (12 points; 9.5 – 12 Marks): Skilful and comprehensive analysis of data using many different techniques. Uses data visualisations to produce novel insights. An excellent presentation of the analysis.

Part B: Memorandum (Marks: 12)
This part is the written response by Mira to the questions posed by Selina.
• Poor (0 points; 0 – 3.5 Marks): Does not communicate any of the main findings of the analysis in an accurate and/or useful way, or the interpretation and communication of findings is at a basic level. The written communication is unprofessional or difficult to follow and contains numerous errors.
• Needs Improvement (4 points; 4 – 5.5 Marks): Explains some of the main findings of the analysis accurately, which only enables the reader to draw a few reasonable conclusions. The written communication is not very easy to follow and/or it contains too many errors.
• Satisfactory (6 points; 6 – 6.5 Marks): Explains most of the main findings of the analysis accurately and enables the reader to draw several reasonable conclusions. The written communication is clear and easy to follow but it contains minor errors.
• Good (7 points; 7 – 8 Marks): Explains nearly all of the main findings of the analysis accurately and enables the reader to draw mostly reasonable conclusions. The written communication is clear and easy to follow and generally free of errors.
• Very Good (8.5 points; 8.5 – 9 Marks): Provides detailed and accurate descriptions of the most important features of the analysis along with appropriately qualified conclusions. The written communication is professional, easy to follow and has a good structure.
• Excellent (12 points; 9.5 – 12 Marks): Provides outstanding descriptions and conclusions that are carefully considered and insightful. The written communication is very professional, logical and easy to follow.

Overall Assignment Presentation (Marks: 6)
• Poor (0 points; 0 – 1.5 Marks): No attempt has been made to follow the assignment Requirements/Instructions/Guidelines. Poorly presented.
• Needs Improvement (2 points; 2 – 2.5 Marks): Little attempt has been made to follow the assignment Requirements/Instructions/Guidelines. Unsatisfactorily presented.
• Satisfactory (3 points; 3 Marks): Most of the assignment Requirements/Instructions/Guidelines have been followed. Satisfactorily presented.
• Good (3.5 points; 3.5 – 4 Marks): The majority of the assignment Requirements/Instructions/Guidelines have been followed. Good presentation.
• Very Good (4.5 points; 4.5 – 5 Marks): All of the assignment Requirements/Instructions/Guidelines have been followed. Very good presentation.
• Excellent (6 points; 5.5 – 6 Marks): All of the assignment Requirements/Instructions/Guidelines have been dealt with meticulously. Faultless presentation.
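
Illustration: sample-size arithmetic for Q6
As a rough illustration of the reasoning Q6 asks for, the sketch below applies the standard large-sample formulas for the sample size needed to estimate a proportion and a mean to within a chosen margin. It is written in Python, not in the Excel templates the assignment expects. The function names, the conservative planning value p = 0.5 and the use of a normal critical value are assumptions made for illustration and are not part of the brief. The 95% confidence level and the 3% margin come from the guidelines and Q6(a). The standard deviation needed for Q6(b) is not stated in the brief and would have to be estimated from the dataset.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float = 0.95) -> float:
    """Two-sided standard normal critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(margin: float, p: float = 0.5,
                               confidence: float = 0.95) -> int:
    """Smallest n that estimates a proportion to within +/- margin.

    p = 0.5 is a conservative planning value (an assumption here):
    it maximises p * (1 - p), so it never understates the required n.
    """
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def sample_size_for_mean(sigma: float, margin: float,
                         confidence: float = 0.95) -> int:
    """Smallest n that estimates a mean to within +/- margin.

    sigma is the standard deviation of the variable; it must be supplied
    by the caller (for example, estimated from the existing sample).
    """
    z = z_critical(confidence)
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Q6(a): proportion to within 3 percentage points at 95% confidence.
    print(sample_size_for_proportion(margin=0.03))  # prints 1068
```

To satisfy both requirements at once, the larger of the two resulting sample sizes would be used. Substituting the sample proportion from the dataset for 0.5 would give a figure for (a) that is no larger than the conservative one.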