Assignment title: Information


STATS 101/101G/108 Introduction to Statistics
Assignment 2, Second Semester 2016
Due: 3pm Monday 12th September

Read these instructions carefully

Marks
• Assignment 2 is worth 5% of your final mark.
• It will be marked out of 45 marks: 40 marks for the questions and 5 marks for communication and presentation. See below for how communication and presentation marks are allocated. Your final mark will be converted to a mark out of 10 which will be recorded towards your course work.
• Statistics is about summarising, analysing and communicating information. Communication is an important part of statistics. For this reason you will be expected to write answers which clearly communicate your thoughts.

Communication and Presentation marks
• Demonstrate clear sentence structure: this includes correct use of full stops and capital letters; not writing overly long or complicated sentences; attention to spelling and grammar.
• Demonstrate ability to communicate information clearly in sentences: this includes sentences clearly conveying the correct idea; sentences making sense; comments not being excessively long or short; conclusions following logically from previous statements.
• Assignment tidily set out and easy to follow: this includes the answers being clearly set out in the correct order; the assignment not being messy; graphs and plots tidy with correct labelling of axes; the assignment (including the correct cover sheet) being clipped together or stapled.
• Follow the "Step-by-Step Guide to Performing a Confidence Interval by Hand" as required.
• Student ID number shown somewhere on assignment: this can be on the inside of the cover sheet or on the top of the first page of the assignment.

Handing in
• Hand in to the appropriate assignment drop-off box to the left of the counter in the Student Resource Centre, ground floor of building 301, by the plaza that connects buildings 301 and 303. Do not hand your assignment in to the unsecured assignment return boxes!
• Assignments handed in to the wrong place or received after the due time will not be marked.

Question guide
• Attempt question 1 when chapter 3 has been covered.
• Attempt question 2 when chapter 4 has been covered.
• Attempt question 3 when chapter 5 has been covered.
• Attempt questions 4 and 5 when chapter 6 has been covered.
• Questions 1 and 3 will require use of VIT. Hand in the required computer output.

Notes
• The format and handing in of Assignment 2 is the same as that for Assignment 1. Refer to the instructions on page 1 of Assignment 1.
• Refer to the Worked Examples under Assignment Resources on Canvas for examples of how to set out your answers.
• Refer to the Lecture Workbook, Section A (Course Information), page 3, Assignment Rules: Working together versus cheating.

© Department of Statistics

Question 1. [7 marks] [Chapter 3]

In a recent test of the effectiveness of a new sleeping pill, 75 volunteers were randomly assigned to three groups of 25. The first group was given the new drug, the second group was given a placebo and the third group was given no treatment at all. Whether or not the volunteer managed to fall asleep in under 30 minutes was recorded.

The data for this study is stored in the file "Sleep.csv", which can be downloaded from Canvas. The data contains 2 variables:

Treatment   The treatment group the volunteer was assigned to (either Drug, Placebo or Neither).
Sleep       Whether the volunteer fell asleep within 30 minutes (either Yes or No).

Download the VIT Guide for Randomisation Tests from Canvas. Use the Guide while working on this question.

(a) Briefly explain why this study is an experiment.

(b) (i) Run the iNZightVIT software and load the file Sleep.csv into it. Run a randomisation test to compare the proportion of volunteers who fell asleep within 30 minutes between the three treatment groups. Include the output from this in your assignment answers.
Notes: Variable 1 needs to be Sleep and Variable 2 needs to be Treatment.
Before you select "Record my Choice" in the Analyse window, change the level of interest to "Yes".

(ii) When chance is acting alone, would it be unusual to get an average deviation from the overall proportion who fell asleep at least as big as the observed difference? (Use your randomisation test output to answer this.)

(iii) Is it plausible that the observed average deviation from the overall proportion who fell asleep can be explained by "chance acting alone"? Briefly justify your answer.

(iv) Can we conclude that different treatments caused differences in the sleep rate? If so, justify why with two reasons. If not, what can we conclude?

Question 2. [7 marks] [Chapter 4]

(a) Many research organisations give their interviewers exact scripts to follow when conducting interviews to measure opinions on controversial issues.

(i) Give one type of bias which the research organisations are trying to minimise by using exact scripts for the interview.

(ii) Will the use of scripts completely eliminate this form of bias? Briefly justify your answer.

(b) In March 2001, Princeton Survey Research Associates conducted a poll asking about the 2000 presidential election in America. The poll asked a random sample of 1200 adults "Did you vote in the 2000 presidential election?" 72% said "yes" and 28% said "no" (with a margin of error of ±3 percent). The actual voter turnout was 51%. What non-sampling error is most likely to explain the difference between the polled and actual figures? Justify your answer.

(c) In a U.S. court case, Bristol Myers was ordered by the Federal Trade Commission to stop advertising that "twice as many dentists use Ipana than any other toothpaste". Bristol Myers had based their claim on a survey of 10,000 randomly selected subscribers to two dental magazines. They received 1,983 responses, with 621 saying they used Ipana and only 258 saying they used the second most popular brand.

(i) Explain how selection bias may be a potential problem with the survey.
(ii) Explain why self-selection bias is not a potential problem with the survey.

(iii) Apart from selection bias, what is the main non-sampling error affecting the survey? Briefly justify your answer.

(iv) Do the results of the survey provide convincing evidence of Bristol Myers' claim? Briefly justify your answer.

Question 3. [10 marks] [Chapter 5]

How long do you have to wait for your coffee after placing an order? And does the waiting time depend on the gender of the person ordering the coffee?

The following data was imputed from a study "Ladies first? A field study of discrimination in coffee shops." by Caitlin Myers. In this study, the times (in seconds) between ordering and receiving coffee for a random sample of customers in Boston-area coffee shops were recorded.

The data are stored in the file "Coffee.csv", which can be downloaded from Cecil. The data contains 2 variables:

Wait     The wait time between placing the order and receiving the coffee (in seconds).
Gender   The gender of the customer (Male or Female).

Download the VIT Guide for Bootstrapping from Canvas. Use the Guide while working on this question.

Run the iNZightVIT software and load the file Coffee.csv into it.

(a) (i) Why would the median be a better estimate of the centre of this data than the mean?

(ii) Generate a bootstrap confidence interval for the median wait time of customers. (DO NOT use the variable Gender at this point.) Include the output in your assignment answers.

(iii) What is the parameter we are estimating using this bootstrap confidence interval?

(iv) Do we know the true value of this parameter?

(v) Interpret the bootstrap confidence interval.

(b) (i) Generate a bootstrap confidence interval for the difference in the median wait times between males and females. Include the output in your assignment answers.

(ii) What is the parameter we are estimating using this bootstrap confidence interval?

(iii) Interpret the bootstrap confidence interval.
(iv) Based on the bootstrap confidence interval, is it believable that the median wait time for males is the same as the median wait time for females? Briefly justify your answer.

Question 4. [7 marks] [Chapter 6]

A statistics student was interested in investigating how long it takes to get a pizza delivered from the local pizzeria. Over a few weeks, a random sample of 10 delivery times (in minutes) was recorded. The data are displayed below:

18.5, 27.4, 19.8, 23.8, 23.4, 19.2, 28.4, 21.3, 17.9, 22.2

Summary statistics: x̄ = 22.19 minutes and s = 3.61 minutes.

(a) Calculate and interpret a 95% confidence interval for the mean delivery time.
Note: You must clearly show that you have followed the "step-by-step guide to producing a confidence interval by hand" given in the Lecture Workbook, Chapter 6. Use the t-procedures tool to find values for t-multipliers and standard errors.

(b) The student believes that, on average, pizzas from this pizzeria take less than 25 minutes to arrive. Does the data support this? Briefly justify your answer.

Question 5. [9 marks] [Chapter 6]

Southern Cross Travel Insurance commissioned a survey about New Zealanders' travel in late 2014. Some of the results are shown below.

Respondents who had travelled overseas before were asked what was the most annoying thing a nearby traveller can do on a flight. Results, classified by age, are:

Annoyance                  Under 40    40+
Let children misbehave          149    208
Talk loudly/incessantly          89    183
Recline seat                     75    119
Take over your armrest           41     55
Smell                           234    251
Other reason                     42     54
Not sure                         58     75
Total                           688    945

621 males and 938 females said they would take out travel insurance if they were to travel for a holiday. They were asked which of the following were reasons for taking out travel insurance.
To cover cost of:              Males    Females
lost luggage                     492        795
emergency medical treatment      571        897
stolen items                     480        750
car accidents                    387        607
missed connections               397        634
cancelled activities             286        465

(a) State the sampling situation (a, b or c) for calculating the standard error of the difference in the following scenarios:

(i) for people who would take out travel insurance, estimating the difference between the proportion of females who did so to cover the cost of lost luggage and the proportion of males who did so to cover the cost of lost luggage.

(ii) for people who would take out travel insurance, estimating the difference between the proportion of females who did so to cover the cost of car accidents and the proportion of females who did so to cover the cost of stolen items.

(iii) for people who had travelled overseas before, estimating the difference between the proportion of people under 40 who thought nearby travellers letting children misbehave was most annoying and the proportion of people under 40 who thought nearby travellers reclining their seat was most annoying.

(b) Consider people who had travelled overseas before. Calculate and interpret a 95% confidence interval for the difference between the proportion of people under 40 years of age who thought the most annoying thing a nearby traveller could do was smell and the proportion of people at least 40 years of age who thought the most annoying thing a nearby traveller could do was smell.
Note: You must clearly show that you have followed the "step-by-step guide to producing a confidence interval by hand" given in the Lecture Workbook, Chapter 6. Use the t-procedures tool to find values for t-multipliers and standard errors.
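
A by-hand calculation for Question 5(b) can be cross-checked numerically. The following is a minimal Python sketch under stated assumptions, not the course's prescribed method: it assumes the under-40 and 40+ groups can be treated as two independent samples, uses the usual large-sample standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), and takes 1.96 as the 95% multiplier. The function name diff_proportions_ci is hypothetical, and the counts are the "Smell" and "Total" rows of the annoyance table.

```python
import math


def diff_proportions_ci(x1, n1, x2, n2, multiplier=1.96):
    """Approximate confidence interval for p1 - p2.

    Assumes two independent samples and a large-sample (normal)
    approximation; multiplier=1.96 corresponds to 95% confidence.
    """
    p1 = x1 / n1  # sample proportion in group 1
    p2 = x2 / n2  # sample proportion in group 2
    estimate = p1 - p2
    # Standard error of the difference for two independent samples
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = multiplier * se
    return estimate, se, (estimate - margin, estimate + margin)


# Counts from the annoyance table: "Smell" out of "Total" for each age group
estimate, se, (lower, upper) = diff_proportions_ci(234, 688, 251, 945)
print(f"estimate = {estimate:.4f}, se = {se:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The sketch only checks the arithmetic. The written answer still needs the steps from the Lecture Workbook guide and an interpretation in context.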