Biostatistics
Name
University
7th March 2017
Question 1 (4 marks)
Classify each variable in this data table as categorical or numeric (otherwise called continuous).
Patient number   Sex (1=male, 2=female)   Age (years)   Self-reported smoking (cigarettes/day)   Disability level (0=none, 1=mild, 2=moderate, 3=severe)
0                1                        25            0                                        0
1                2                        28            1                                        1
2                2                        29            2                                        2
3                1                        33            3                                        3
Solution
Classification table
Variable                 Variable type
Patient number           Categorical (an identifier, not a measurement)
Sex                      Categorical
Age                      Numeric
Self-reported smoking    Numeric
Disability level         Categorical (ordinal)
Question 2 (5 marks)
a. Using the ‘fham.p1.RData’ data set introduced in tutorial 3 and R Commander, tabulate the relationship between use of blood pressure medications at study entry (bpmeds) and the later occurrence of cardiovascular disease (cvd). (1 mark)
Solution
Frequency table:
                 bpmeds
cvd              Not currently used   Currently used
Did not occur    3163                 74
Occurred         1066                 70
b. Using row or column percentages, describe the relationship between current use of blood pressure medications and history of cardiovascular disease. (2 marks)
Solution
Column percentages:
                 bpmeds
cvd              Not currently used   Currently used
Did not occur    74.8                 51.4
Occurred         25.2                 48.6
Total            100.0                100.0
Count            4229                 144
Pearson's Chi-squared test
data: .Table
X-squared = 39.669, df = 1, p-value = 3.009e-10
From the column percentages above, 74.8% of those not currently using blood pressure medications (bpmeds) did not experience cardiovascular disease (cvd), compared with 51.4% of those currently using them. Equivalently, cvd occurred in 48.6% of current users but in only 25.2% of non-users.
The chi-squared test further shows a statistically significant association between bpmeds and cvd (X-squared = 39.669, df = 1, p < 0.001).
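As a cross-check, the column percentages and the chi-squared statistic can be recomputed from the four cell counts alone. The following is a minimal Python sketch, not the R Commander procedure used in the assignment. The function names are illustrative, and it assumes the reported statistic is Pearson's chi-squared without a continuity correction, which is what reproduces the reported 39.669.

```python
import math

# Cell counts copied from the frequency table (rows: cvd, columns: bpmeds).
counts = {
    "Did not occur": {"Not currently used": 3163, "Currently used": 74},
    "Occurred": {"Not currently used": 1066, "Currently used": 70},
}


def column_percentages(table):
    """Express every cell as a percentage of its column total."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


percentages = column_percentages(counts)
chi2, p = pearson_chi_square_2x2(3163, 74, 1066, 70)

for row_name, row in percentages.items():
    print(row_name, {col: round(value, 1) for col, value in row.items()})
print("X-squared =", round(chi2, 3), "p-value =", p)
```

The shortcut formula for the p-value only holds for one degree of freedom, which is the case for any 2x2 table.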
c. Using conditional probabilities, explain why current use of blood pressure medications and history of cardiovascular disease are not independent. (2 marks)
Solution
Let A be current use of blood pressure medications and B be a history of CVD. If A and B were independent, P(A | B) would equal both P(A | not B) and P(A).
From the frequency table, P(A | B) = 70/1136 = 6.16%, whereas P(A | not B) = 74/3237 = 2.29% and P(A) = 144/4373 = 3.29%. Equivalently, the probability of not currently using blood pressure medications is 93.84% given a history of CVD, compared with 97.71% among patients without a history of CVD. Since these conditional probabilities differ, current use of blood pressure medications and history of cardiovascular disease are not independent.
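The conditional probabilities can be checked directly from the cell counts of the frequency table. This is a minimal Python sketch; the variable names are illustrative and not part of the assignment.

```python
# Cell counts from the frequency table of cvd by bpmeds.
used_and_cvd = 70
used_no_cvd = 74
not_used_and_cvd = 1066
not_used_no_cvd = 3163

total = used_and_cvd + used_no_cvd + not_used_and_cvd + not_used_no_cvd
cvd_total = used_and_cvd + not_used_and_cvd
no_cvd_total = used_no_cvd + not_used_no_cvd

# A = currently using blood pressure medications, B = history of CVD.
p_used = (used_and_cvd + used_no_cvd) / total      # P(A)
p_used_given_cvd = used_and_cvd / cvd_total        # P(A | B)
p_used_given_no_cvd = used_no_cvd / no_cvd_total   # P(A | not B)

print(f"P(A)         = {p_used:.4f}")
print(f"P(A | B)     = {p_used_given_cvd:.4f}")
print(f"P(A | not B) = {p_used_given_no_cvd:.4f}")
print(f"P(not A | B) = {1 - p_used_given_cvd:.4f}")
print(f"P(not A | not B) = {1 - p_used_given_no_cvd:.4f}")
```

Under independence all three of P(A), P(A | B) and P(A | not B) would be equal, so any clear gap between them is evidence against independence.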
Question 3 (4 marks)
a. Using the assignment data file allocated to you and R Commander, graph the relationship between MVPA and GPA. (1 mark)
Solution
Figure 1: Scatterplot of GPA versus MVPA
b. Describe in words the relationship between GPA and MVPA hours per week in this data set. (3 marks)
Solution
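The scatterplot above shows a positive linear relationship between GPA and MVPA. That is, students with more hours of MVPA per week tend to have a higher GPA. Because this is observational data, the plot shows an association and does not by itself show that increasing MVPA causes GPA to increase.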
Question 4 (7 marks)
a. Using the assignment data file allocated to you and R Commander, draw an appropriate graph of MVPA. (Don’t forget to provide meaningful labels on your axes). (1 mark)
Solution
Figure 2: A boxplot of MVPA
b. Using the graph alone, describe the centre, spread and shape of distribution of MVPA in these students. (Note: Don’t calculate any statistics yet – that is part c).) (3 marks)
Solution
Figure 3: Histogram of MVPA
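The histogram shows that the distribution of MVPA is skewed to the right (having a long tail towards the higher values), which is the direction consistent with the positive skewness reported in part c. The distribution is therefore not symmetric about its centre.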
c. Use appropriate statistics, to summarise the distribution of MVPA. (Hint: consider measures of centre, spread and shape. Avoid cutting and pasting R commander output – write the answer in your own words.) (3 marks)
Solution
Using the summary function in R, the mean MVPA was 3.22 hours per week, with a standard deviation of 2.44 and an interquartile range (IQR) of 3.2. The coefficient of variation was 0.76, the skewness 1.01 and the kurtosis 0.46. The positive skewness indicates a longer right tail.
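The coefficient of variation is the standard deviation divided by the mean, so it can be verified from the two reported figures. This is a minimal Python sketch using only the summary values quoted above; the helper `skew_direction` is a hypothetical name introduced for illustration.

```python
# Summary values reported for MVPA in the text above.
mean_mvpa = 3.22
sd_mvpa = 2.44
skewness = 1.01

# Coefficient of variation = standard deviation / mean.
cv = sd_mvpa / mean_mvpa


def skew_direction(value):
    """Translate the sign of a skewness coefficient into words."""
    if value > 0:
        return "right-skewed (longer right tail)"
    if value < 0:
        return "left-skewed (longer left tail)"
    return "symmetric"


print(f"CV = {cv:.2f}")
print("Shape:", skew_direction(skewness))
```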
Question 5 (3 marks)
It is estimated that 1 in 11 adults (whole world) has diabetes and that 1 in 2 of adults with diabetes are undiagnosed. A random sample of 200 Australian adults were tested for diabetes.
a. If Australian adults do not differ from the rest of the world, what is the probability that our random sample of 200 adults will contain 20 or fewer diabetics? (1 mark)
Solution
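Let X be the number of diabetics in the sample, with n = 200 and p = 1/11 ≈ 0.0909. We need the probability that 20 of them or fewer are diabetics, i.e. P(X ≤ 20).
We obtain np and check whether it meets the condition np > 10:
np = 200 × 1/11 ≈ 18.18 > 10
We also find n(1-p):
n(1-p) = 200 × 10/11 ≈ 181.82 > 10
Both conditions hold, so the normal approximation to the binomial can be used. We then obtain the Z-score as follows:
Z = (20 − 18.18) / √(200 × 1/11 × 10/11) = 1.82 / 4.0656 ≈ 0.4477
We then find P(Z < 0.4477) using the tables, and the value is approximately 0.673.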
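The normal-approximation steps above can be sketched in Python with the standard library. This is an illustrative check rather than the table lookup used in the solution. The exact binomial sum is included only for comparison, and no continuity correction is applied to the approximation, which is an assumption about how the solution was worked.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 200      # sample size
p = 1 / 11   # assumed prevalence of diabetes

mean = n * p
sd = sqrt(n * p * (1 - p))

# Rule-of-thumb conditions for using the normal approximation.
conditions_ok = mean > 10 and n * (1 - p) > 10

# Normal approximation to P(X <= 20), without continuity correction.
z = (20 - mean) / sd
p_normal = NormalDist().cdf(z)

# Exact binomial probability P(X <= 20), for comparison.
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(21))

print(f"np = {mean:.2f}, n(1-p) = {n * (1 - p):.2f}, conditions met: {conditions_ok}")
print(f"Z = {z:.4f}")
print(f"Normal approximation: {p_normal:.4f}")
print(f"Exact binomial:       {p_exact:.4f}")
```

Working with the unrounded mean gives a Z-score that differs slightly in the fourth decimal place from a hand calculation that rounds np to 18.18.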
b. If Australian adults do not differ from the rest of the world, we would predict 10% of all samples to contain fewer than how many diabetics? (1 mark)
Solution
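We convert the lower-tail probability of 0.10 to a Z-score:
P(Z < -1.2816) = 0.1
We then convert this Z-score back to a number of diabetics using the mean np ≈ 18.18 and the standard deviation √(np(1-p)) ≈ 4.0656:
x = 18.18 − 1.2816 × 4.0656 ≈ 12.97
Thus we would predict 10% of all samples to contain fewer than about 13 diabetics.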
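The 10th-percentile cutoff can be checked by inverting the normal approximation. This is a minimal Python sketch assuming the same n = 200 and p = 1/11 as in part a.

```python
from math import sqrt
from statistics import NormalDist

n = 200
p = 1 / 11

# Normal approximation to the number of diabetics per sample.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Z-score with 10% of the standard normal distribution below it.
z_10 = NormalDist().inv_cdf(0.10)

# Number of diabetics with 10% of samples below it.
cutoff = approx.inv_cdf(0.10)

print(f"z = {z_10:.4f}")
print(f"cutoff = {cutoff:.2f}")
```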
c. If Australian adults do not differ from the rest of the world, estimate the mean number of diabetics per sample? Show any working. (1 mark)
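Solution
For a binomial count the mean is np, so the mean number of diabetics per sample is 200 × 1/11 ≈ 18.18, i.e. about 18 diabetics per sample.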
Question 6 (2 marks)
a. If the average age of retirement for the entire population in a country is 64 years and the distribution is normal with a standard deviation of 3.5 years, what is the approximate age range in which 95% of people retire? (1 mark)
Solution
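In this case, the mean is μ = 64 years and the standard deviation is σ = 3.5 years. About 95% of a normal distribution lies within two standard deviations of the mean.
So two standard deviations is 2 × 3.5 = 7 years.
To find the lower end of the range, we have:
64 − 7 = 57.
The upper end of the range is:
64 + 7 = 71.
Thus about 95% of people retire between the ages of about 57 and 71 years.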
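The two-standard-deviation range can be compared with the exact central 95% interval of the same normal distribution. This is a minimal Python sketch; the variable names are illustrative.

```python
from statistics import NormalDist

retirement = NormalDist(mu=64, sigma=3.5)

# Rule-of-thumb range: mean plus or minus two standard deviations.
rule_low = retirement.mean - 2 * retirement.stdev
rule_high = retirement.mean + 2 * retirement.stdev

# Exact central 95% interval from the 2.5th and 97.5th percentiles.
exact_low = retirement.inv_cdf(0.025)
exact_high = retirement.inv_cdf(0.975)

# Proportion of retirements actually covered by the rule-of-thumb range.
coverage = retirement.cdf(rule_high) - retirement.cdf(rule_low)

print(f"Rule of thumb: {rule_low:.1f} to {rule_high:.1f} years")
print(f"Exact 95%:     {exact_low:.2f} to {exact_high:.2f} years")
print(f"Coverage of rule-of-thumb range: {coverage:.4f}")
```

The rule-of-thumb range covers slightly more than 95% because the exact multiplier is about 1.96 rather than 2, which is why the question asks only for an approximate range.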
b. Last year’s graduates from an MPH degree, had a mean first-year income of $62,000 with a standard deviation of $8000. These first year salary levels are known to be normally distributed. What is the approximate percentage of first year graduates who made more than $80000? (1 mark)
Solution
We seek to find:
P(X > 80,000)
So we obtain the Z-score:
Z = (80,000 − 62,000) / 8,000 = 2.25
P(Z > 2.25) = 1 − 0.9878 = 0.0122
Thus the approximate percentage of first year graduates who made more than $80000 is 1.22%.
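The upper-tail probability can be checked without tables. This is a minimal Python sketch using the mean and standard deviation given in the question.

```python
from statistics import NormalDist

income = NormalDist(mu=62_000, sigma=8_000)

# Z-score for a first-year income of $80,000.
z = (80_000 - income.mean) / income.stdev

# Upper-tail probability P(X > 80,000).
p_above = 1 - income.cdf(80_000)

print(f"Z = {z:.2f}")
print(f"P(X > 80,000) = {p_above:.4f} ({100 * p_above:.2f}%)")
```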
Question 7 (3 marks)
Sheila’s glucose level one hour after ingesting a sugary drink varies according to the Normal distribution with mean μ mg/dl and standard deviation σ mg/dl.
a. If a single glucose measurement is made, what is the probability that Sheila measures above 140 mg/dl? (1 mark)
Solution
We seek to find:
P(X > 140)
So we obtain the Z-score:
Z = (140 − μ) / σ = 1.5
P(Z > 1.5) = 1 − 0.9332 = 0.0668
Thus the probability that Sheila measures above 140 mg/dl is 0.0668.
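The single-measurement probability can be reproduced as follows. The values of the mean and standard deviation are missing from the text as received, so this sketch assumes μ = 125 mg/dl and σ = 10 mg/dl. These are hypothetical values chosen only because they give a Z-score of 1.5, which is what the reported probability of 0.0668 implies.

```python
from statistics import NormalDist

# ASSUMED values: the source text omits them. Any pair with
# (140 - mu) / sigma = 1.5 reproduces the reported probability.
mu = 125.0
sigma = 10.0

glucose = NormalDist(mu=mu, sigma=sigma)

z_single = (140 - mu) / sigma
p_single = 1 - glucose.cdf(140)

print(f"Z = {z_single:.2f}")
print(f"P(X > 140) = {p_single:.4f}")
```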
b. Using the Central Limit Theorem, what is the probability that the sample mean from four separate measurements is above 140 mg/dl? Show any working. (2 marks)
Solution
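Since the original distribution from which we sampled is normal, the sampling distribution of the mean of four measurements is exactly normal as well, with mean μ and standard deviation σ/√4 = σ/2.
The single-measurement probability of 0.0668 in part a corresponds to Z = (140 − μ)/σ = 1.5. Halving the standard deviation doubles the Z-score, thus:
Z = (140 − μ) / (σ/2) = 2 × 1.5 = 3
P(x̄ > 140) = P(Z > 3) = 1 − 0.9987 = 0.0013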
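The effect of averaging four measurements can be sketched in Python. As the mean and standard deviation are missing from the text as received, the sketch assumes μ = 125 mg/dl and σ = 10 mg/dl. These are hypothetical values consistent with the probability of 0.0668 reported for a single measurement.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMED values: the source text omits them (see lead-in).
mu = 125.0
sigma = 10.0
n_measurements = 4

# Standard deviation of the sample mean (standard error).
standard_error = sigma / sqrt(n_measurements)

sample_mean = NormalDist(mu=mu, sigma=standard_error)

z_mean = (140 - mu) / standard_error
p_mean_above = 1 - sample_mean.cdf(140)

print(f"Standard error = {standard_error:.1f} mg/dl")
print(f"Z = {z_mean:.1f}")
print(f"P(sample mean > 140) = {p_mean_above:.4f}")
```

Averaging reduces the spread by a factor of √n, so a mean of four readings is far less likely than a single reading to exceed 140 mg/dl.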
Question 8 (2 marks)
Does this graph show a normal distribution or does it show a binomial distribution? Explain why.
Solution
The graph shows a normal distribution, because it is bell-shaped and symmetric around its mean.