EPIDEMIOLOGY 340.600: STATA PROGRAMMING AND DATA MANAGEMENT
Assignment 3

Due date: 11:00 a.m., Wednesday, May 17, 2016 via CoursePlus dropbox
Due to constraints of final grade submission to the registrar, late assignments will not be accepted.

Overview: write a .do file which performs the tasks described below. Your .do file should be called assignment3_yourname.do (for example: assignment3_massieallan.do). Remember to write comments for full credit!

Problem 1 is to write the name of the log file correctly. The name of your log file should contain "assignment3_", your name, and then the date on which the script is run. For example, if the script is run on May 14, 2017, then the log file for Allan Massie's assignment should be called assignment3_allanmassie_20170514.log. (For full credit, denote January-September as "01"-"09" and days 1-9 as "01"-"09". However, generous partial credit will be given if the 0s are missing.) So the start of your .do file should contain:

    capture log close
    [code to create macros based on today's date and time]
    log using assignment3_[name]_[one or more macros].log, text replace

Note: this is a good way to automatically preserve earlier versions of a log file when you make changes to a script over time. After Assignment 3 is turned in we will release startlog.ado, which Allan Massie wrote to automatically incorporate the date into log file names.

Problem 2a. Starting with an empty dataset, use rnormal() to create a dataset of 100 random numbers drawn from a normal distribution with a mean of 100 and standard deviation of 5. Calculate the mean and standard deviation of your 100 random numbers (the mean will be close to, but not exactly, 100; the standard deviation will be close to, but not exactly, 5).

Problem 2b. Clear that dataset and create a dataset of 10,000 random numbers drawn from a normal distribution with a mean of 100 and standard deviation of 5.
Calculate the mean and standard deviation of your 10,000 random numbers.

Problem 2c. Clear that dataset and create a dataset of 1,000,000 random numbers drawn from a normal distribution with a mean of 100 and standard deviation of 5. Calculate the mean and standard deviation of your 1,000,000 random numbers.

Problem 2 evaluation: print the following table:

    Sample size   Sample mean      Sample standard deviation
    100           [mean from 2a]   [SD from 2a]
    10000         [mean from 2b]   [SD from 2b]
    1000000       [mean from 2c]   [SD from 2c]

Extra credit will be given if the bottom three lines are printed using a forvalues loop from 1 to 3, instead of writing code to print each line individually!

Problem 3: Using donors.dta, reproduce the following graph as precisely as possible:

Note that the dots for female donors are pink, and the dots for male donors are blue. As per the subtitle, the scatter plot and regression line should be produced only for data from individuals with height > 150 cm. The command for the linear regression line for female height is

    regress don_wgt don_hgt if don_gender=="F" & don_hgt >= 150

For full credit, use results from the regress command for the regression line; do not use lfit. Export your graph to a .PNG file with the name q3_yourname.png (e.g. q3_massieallan.png).

Extra credit challenge Problem 6: write a program called sampmean to plot random data drawn from a normal distribution. sampmean takes a list of numbers representing different sample sizes. It also takes (optionally) a mean and standard deviation for the normal distribution. If we run

    sampmean, at(5 20 100 1000) mean(20)

we get a graph like this:

In this example, the program generates four sets of normally distributed random numbers (one set of 5 numbers, one set of 20, one set of 100, and one set of 1000) and calculates the mean for each set. It also plots each randomly generated number (as points) and the group mean (as a red line). The group mean also appears above each group as text.
[Graph for sampmean, at(5 20 100 1000) mean(20): group means 19.77, 19.93, 20.15, 20.05 shown above sample sizes 5, 20, 100, 1000; y-axis from 16 to 24]

Here are some more examples. The exact output will depend on the random number seed you use.

    sampmean, at(4 8 16 32 64) mean(5) sd(3)

[Graph: group means 3.81, 6.18, 5.52, 4.69, 5.22 shown above sample sizes 4, 8, 16, 32, 64; y-axis from -5 to 15]

    sampmean, at(100 200 300) mean(5) sd(2) uniform

[Graph: group means 4.88, 4.92, 5.02 shown above sample sizes 100, 200, 300; y-axis from 2 to 8]

In the last example, the distribution is a uniform distribution instead of a normal distribution.

Hints:
- You can use the keyword numlist (for a list of numbers) just as you do with a varlist (list of variables).
- The uniform distribution from 0 to 1 has mean 0.5 and standard deviation sqrt(1/12).

If you are able to solve this problem, you can probably modify your program slightly so that it is also a solution for problem 2. That is fine.

Note: this problem is pretty hard! We expect that few people will solve the whole thing, but we will give partial credit for a partial (working) solution. If your program only works partly, then explain in the comments, like this:

    //NOTE: my program does not display the means in the graph.
    //instead it prints them to the screen.
    //Also "uniform" doesn't work,
    //and my program runs for only one number (not a list)
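The two hints for the extra credit problem (numlist, and the mean and standard deviation of the uniform distribution) can be illustrated with a small sketch. This is not sampmean and draws no graph; the program name demo_numlist, its option names, and the default mean of 0 and standard deviation of 1 are all assumptions made for illustration only.

```stata
* Sketch only: demo_numlist and its defaults are hypothetical, not part of the assignment.
capture program drop demo_numlist
program define demo_numlist
    version 14
    * at() accepts a list of positive integers, parsed much like a varlist would be
    syntax , at(numlist integer >0) [mean(real 0) sd(real 1)]
    foreach n of local at {
        quietly {
            clear
            set obs `n'
            * runiform() has mean 0.5 and SD sqrt(1/12): standardize the draw,
            * then scale and shift it to the requested SD and mean
            generate double x = `mean' + `sd' * (runiform() - 0.5) / sqrt(1/12)
            summarize x
        }
        display "n = `n'   mean = " %6.2f r(mean) "   sd = " %6.2f r(sd)
    }
end

* Example call; the printed values depend on the random number seed
demo_numlist, at(100 200 300) mean(5) sd(2)
```

Subtracting 0.5 and dividing by sqrt(1/12) gives a draw with mean 0 and standard deviation 1, so the same scale-and-shift step works for any requested mean and standard deviation.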