SOC 402 - R Homework Assignment #2
Regression and Correlation
Due Date:
Week 8, Monday, February 27, 2017 by 5:00pm
Please submit your assignment to 6-23 Tory or 5-21 Tory
Electronic lab submissions will not be accepted (hard copy only)
Late submissions and submissions that do not follow directions will be penalized
Directions:
Use data from the NLSY97 Dataset (posted on the course website) to answer the following questions in a separate document.
All assignments must be typed. The write-up portion of the assignment should be double-spaced,
using 11 or 12 point font for the text.
Answer each question separately and number your answers so it is clear which question you are answering.
Write each answer in paragraph (essay) format.
Tables should be neat and organized. Graphs should be correctly labeled. When discussing variables, make sure to include the correct units. Finally, don't forget to interpret your results and
graphs.
Spelling, grammar, organization, and mechanics will be graded, so make sure to proofread.
Include your plots and your R Output below your answer for that question. You can do this by
copying and pasting the R Output from the Console into your Word document. Your R Output
does not have to be double spaced. For formatting purposes you should use "Courier" or "Courier
New" as your font and 11pt or less as your font size for your R Output.
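If copying from the Console is awkward (for example, with long output), one possible alternative is to write the output to a plain-text file and paste from there. The lines below are only a sketch: the model uses R's built-in cars data as a stand-in, and the file name q_output.txt is a hypothetical choice, not a required one.

```r
# Stand-in model on the built-in `cars` data (NOT the NLSY97 data)
m <- lm(dist ~ speed, data = cars)

# Hypothetical output location; any writable path would do
out_file <- file.path(tempdir(), "q_output.txt")

# capture.output() writes what summary(m) would print to the file
capture.output(summary(m), file = out_file)

# Read the saved text back in to confirm what was written
saved <- readLines(out_file)
```

The saved text can then be pasted into your document and set in Courier like any other R Output.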
Note:
Some questions will require answers in paragraph form and some will require tables or graphs.
Others will require only R Output. This should be obvious within the text of each question.
You're a researcher who is interested in work, gender, and health. You're planning on studying
the relationships that exist between these variables using the NLSY97 dataset and you're particularly interested in the factors that influence income. You begin your analysis by investigating the
bivariate relationship between income (wage.inc) and hours of work (hrs.work).
1. Create and interpret a scatterplot between hours of work and income. What type of relationship
might you be dealing with? Does the scatterplot indicate any potential problems in the data?
2. Calculate and interpret the correlation coefficient between hours of work and income.
3. Conduct a bivariate regression using hours of work to predict income. Interpret your results.
4. Create a second scatterplot between hours of work and income. Add a regression line this time.
5. What happens to this relationship when you control for education, measured in years of education (grade1)?
6. These first models helped you to learn about the relationship between hours of work and income.
However, you're really interested in income differences by gender. Using regression, assess the
bivariate relationship between gender and income in the data.
7. In addition to gender, you know that other factors matter for predicting income. You decide to
create a model that includes: gender, hours of work, education, and age as predictor variables.
Use multiple regression to assess this relationship. Make sure to interpret your results.
8. Extra Credit: This question is optional. If you answer the question correctly you can earn
extra points to make up for lost points on other questions, but you cannot earn more than
100% total on the assignment.
The previous analyses focus on estimating income with a variety of predictor variables. You're
also interested in how marital status and parenthood relate to income. Net of gender, hours of
work, education, and age, do married or single individuals earn more? Net of gender, hours of
work, education, and age, do individuals with or without children earn more? Use what you've
learned about regression to answer this question.
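As a reminder of the R functions the questions above rely on, here is a minimal sketch run on simulated data, not on the NLSY97 file. Only wage.inc, hrs.work, and grade1 are variable names taken from this assignment; the data frame name toy, the columns female and age, the axis units, and all of the numbers are assumptions made up for illustration. Check the course dataset for the actual variable names, coding, and units.

```r
# Simulated stand-in data (NOT the NLSY97 data).
# `female` and `age` are hypothetical column names and codings.
set.seed(402)
n <- 200
toy <- data.frame(
  hrs.work = round(runif(n, 5, 60)),
  grade1   = sample(8:20, n, replace = TRUE),
  female   = rbinom(n, 1, 0.5),
  age      = sample(28:34, n, replace = TRUE)
)
toy$wage.inc <- 5000 + 600 * toy$hrs.work + 1500 * toy$grade1 -
  4000 * toy$female + rnorm(n, sd = 8000)

# Scatterplot with labeled axes (units here are assumed for the toy data)
plot(toy$hrs.work, toy$wage.inc,
     xlab = "Hours of work", ylab = "Income (dollars)",
     main = "Income by hours of work (simulated data)")

# Pearson correlation; use = "complete.obs" drops rows with missing
# values, which matters for real data even though toy has none
r <- cor(toy$hrs.work, toy$wage.inc, use = "complete.obs")

# Bivariate regression: hours of work predicting income
m1 <- lm(wage.inc ~ hrs.work, data = toy)
summary(m1)

# Add the fitted regression line to the scatterplot drawn above
abline(m1, lwd = 2)

# Add a control variable by extending the right-hand side of the formula
m2 <- lm(wage.inc ~ hrs.work + grade1, data = toy)
summary(m2)

# Multiple regression with several predictors
m3 <- lm(wage.inc ~ female + hrs.work + grade1 + age, data = toy)
summary(m3)
```

The same pattern extends to further predictors: each additional variable is added to the formula with a plus sign. Because these numbers are simulated, the output says nothing about the real data; your answers must come from your own analysis of the NLSY97 dataset.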