Assignment title: Information


Regression Methods for Epidemiology

• There are 7 questions in this assignment and you should answer all of them with a combination of your written responses and output from Stata. Include only the most relevant portion of Stata output.
• You should compile your report as a Word or PDF document with sections of your Stata log files "pasted" in. Include Stata graphs if appropriate.
• When pasting Stata output from a log file into a document, change the font for that text to Courier New to get proper alignment of tables etc. Make sure the Stata output does not exceed the line length; try using a smaller font size, e.g. Courier New size 9, to ensure each line of output appears on a single line of your Word document.
• This assignment has a total of 35 marks and counts 35% towards the overall mark for MPH5200.
• Where a question states "Assess…" you are expected to provide your own interpretation of the results and accompany this with selected Stata output to show the results to which you refer.
• Any questions regarding this assignment should be posted on the "Assignments" discussion list of the MPH5200 Moodle site.

Factors related to the rate of growth of invasive melanoma

A cross-sectional study of 269 people with an invasive melanoma was conducted with a view to identifying factors that may be related to the rate of growth of the melanoma. Pathology characteristics of each melanoma were obtained from a single laboratory and each patient completed a self-report questionnaire. In addition, full body counts of naevi (moles) had been conducted on approximately half of the participants in time-consuming clinical examinations.

Figure 1. An example of a rapidly growing melanoma in a 57-year-old man that reached 15 mm in thickness in 8 weeks.

The data for this study are in the file ROG.DTA which is available on the MPH2000 Blackboard site. Please contact me immediately if you have any trouble accessing the dataset.
The variables in the dataset are as follows:

ID         Unique identifier for each study participant
AGE        Age in years
GENDER     1 = Male, 2 = Female
THICK      2 = Thin (<1 mm), 3 = Intermediate (1-3.99 mm), 4 = Thick (4+ mm)
N_COUNT    1 = 50 or fewer naevi on the body, 2 = more than 50 naevi on the body
MITOTIC    Mitotic rate, tumour proliferation markers per mm²
M4         0 = <1/mm², 1 = 1-4/mm², 2 = 5-10/mm², 3 = >10/mm²
ROG        Rate of growth, mm per month

QUESTION 1 [3 Marks]
While rate of growth, ROG, follows a markedly skewed distribution, its logarithm follows an approximately symmetrical distribution that is reasonably bell shaped. Demonstrate this using graphs and briefly discuss the implications for undertaking statistical analyses for rate of growth. [Hint: generate newlogrog=log(rog)]

QUESTION 2 [4 Marks]
Is there an association between thickness and the logarithm of rate of growth, "log ROG"? Construct a table of descriptive statistics for log ROG for melanomas in each of the three thickness categories and comment. Present a statistical analysis with your interpretations to assess the evidence for a "crude" (i.e. univariate, unadjusted) association between thickness and log ROG.

QUESTION 3 [4 Marks]
Is there an association between age and log ROG? Present a graphical summary and an appropriate descriptive statistic for this association and comment. Present a statistical analysis with your interpretations to assess the evidence of a univariate association between age and log ROG.

QUESTION 4 [5 Marks]
Is gender a confounder or effect modifier of the relationship between age and log ROG? Present results with your interpretations to address this question.

QUESTION 5 [4 Marks]
Assess the pros and cons of including mitotic rate as a continuous variable in a regression model for log ROG rather than as a four-category variable.

QUESTION 6 [6 Marks]
Fit a single regression model to adjust the association between age and log ROG for thickness and mitotic rate. Ignore the possibility of effect modification for this question.
For your model, assess whether the regression assumptions are valid.

QUESTION 7 [9 Marks]
Full body counts of naevi (moles) had been conducted on approximately half of the participants.
(1) [3 marks] Analyse the univariate association between naevi counts and log ROG and also the association when adjusting for age, thickness and mitotic rate. Compare the two findings and comment briefly on the results.
(2) [5 marks] Discuss sample size requirements and the results of conducting your own sample size calculations for detecting a difference of 0.4 in log ROG between high and low naevi counts. Consider a range of scenarios consistent with the data collected in the study.
(3) [1 mark] In the light of your answers to parts (1) and (2) of this question, would conducting naevi counts for the remaining study participants be worthwhile?
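
As a rough illustration of the kind of calculation part (2) of Question 7 refers to, the sketch below applies the usual normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ², with δ = 0.4. It is written in Python rather than Stata purely for illustration. The function name n_per_group, the standard deviations, the significance level and the power values are all assumptions chosen for the example, not values taken from ROG.DTA, and equal group sizes are assumed.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Uses the normal approximation with equal group sizes and a common
    standard deviation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


if __name__ == "__main__":
    delta = 0.4  # difference in log ROG named in the question
    # Hypothetical standard deviations of log ROG: placeholders only,
    # NOT estimated from the study data.
    for sd in (0.8, 1.0, 1.2):
        for power in (0.80, 0.90):
            n = n_per_group(delta, sd, power=power)
            print(f"sd={sd:.1f} power={power:.2f} n per group={n}")
```

The placeholder standard deviations would need to be replaced with values consistent with the observed spread of log ROG, and unequal numbers in the two naevi groups would call for a modified formula. Stata's power twomeans command performs a comparable calculation, although its results may differ slightly because it is not based on the same approximation by default.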