Assignment title: Information
1. Data cleaning
You have received data from a hypothetical study of birthweight and the associated characteristics of the baby and the mother. The data set consists of 200 records containing information on the baby's sex and birthweight and the mother's smoking status, body mass index (bmi) and age. A complete description of the data is in the document assignment 3 data description. You should read this document before starting your data analysis.
The data are in the comma-separated file birthweights data cleaning.csv.
These data were collected on paper forms and transferred to a computer file before being given to you. Your task is to examine them for any invalid or inconsistent data and to prepare the data set for analysis. Report on your findings: which data were incorrect and what you did with the incorrect data. (5 marks)

2. The relationship between the baby's birthweight, the baby's sex and the mother's smoking status
You receive a second data set from the same hypothetical study of birthweight and the associated characteristics of the baby and the mother as described in question 1. However, these data are already in a computer file and prepared for analysis. The document assignment 3 data description also describes this data set.
The data are in the comma-separated file birthweights analysis.csv.
You wish to investigate the relationship between the baby's birthweight and the baby's sex and the mother's smoking status.

2.1 The relationship between the baby's sex and birthweight
Carry out a hypothesis test to see whether mean birthweight differs between boy babies and girl babies, fully reporting on the test and its results. What do you conclude? (7 marks)

2.2 The relationship between low birthweight and the mother's smoking status
You wish to investigate the attributable fraction for the risk of low birthweight associated with the mother's smoking status, so you will need the risk difference.
Calculate an estimate of the risk difference and its 95% confidence interval for the risk of low birthweight between mothers who are current smokers and mothers who are not current smokers. State these as a percentage to 1 decimal place. Test the hypothesis that this risk difference is zero, fully reporting on the test and its results. What do you conclude?
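For reference, one common way to compute the quantities in 2.2 is sketched below. It is only a sketch: it assumes the data have been reduced to four counts (low-birthweight babies and totals for current smokers and for non-smokers), uses the Wald (unpooled) standard error for the 95% confidence interval, and uses a pooled-proportion z-test for the hypothesis that the risk difference is zero. The function name `risk_difference` and the example counts are hypothetical and are not taken from the study data; other interval methods (e.g. Newcombe's) may be preferred for small counts.

```python
import math


def risk_difference(a, n1, c, n0, z_crit=1.959964):
    """Risk difference p1 - p0 with a Wald 95% CI and a pooled two-sample z-test.

    a  : low-birthweight babies among current smokers (hypothetical count)
    n1 : number of current smokers
    c  : low-birthweight babies among non-smokers
    n0 : number of non-smokers
    """
    p1 = a / n1  # risk of low birthweight, current smokers
    p0 = c / n0  # risk of low birthweight, non-smokers
    rd = p1 - p0

    # Unpooled (Wald) standard error, used for the confidence interval
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    ci = (rd - z_crit * se, rd + z_crit * se)

    # Pooled standard error under H0: risk difference = 0
    p = (a + c) / (n1 + n0)
    se0 = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n0))
    z = rd / se0

    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rd, ci, z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only, NOT the assignment data
    rd, (lo, hi), z, p = risk_difference(a=12, n1=40, c=16, n0=160)
    print(f"Risk difference: {100 * rd:.1f}% (95% CI {100 * lo:.1f}% to {100 * hi:.1f}%)")
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The square of the pooled z statistic equals the Pearson chi-squared statistic (without continuity correction) for the corresponding 2x2 table, so either form of the test leads to the same conclusion.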