Hand in the indicated questions. In preparation for the questions you hand in, it is worth your while to work through (or at least read through) the other questions as well.

What you hand in needs to include (i) your code, (ii) the output that your code produced and (iii) your comments on the output as asked for in the questions. The easiest way to get this is to use an R Notebook and preview the results (to HTML or Word or PDF) when you are done.

Hand in your work on Quercus. If you did STAC32 last fall, it’s the same procedure. A reminder is here: https://www.utsc.utoronto.ca/~butler/c32/quercus1.nb.html

You are reminded that work handed in with your name on it must be entirely your own work. It is as if you have signed your name under it. If it was done wholly or partly by someone else, you have committed an academic offence, and you can expect to be asked to explain yourself. The same applies if you allow someone else to copy your work. The grader will be watching out for assignments that look suspiciously similar to each other (or to my solutions). Besides which, if you do not do your own assignments, you will do badly on the exams, because the struggle to figure things out for yourself is an important part of the learning process.

Before you start, you’ll need this:

library(tidyverse)

## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

1. Work through, or at least read, problems 13.12, 13.13, 13.14, and 13.18 in PASIAS: http://ritsokiguess.site/pasias/

2. At a certain small college, students take a test when they first enrol. Do the results of this test help to predict a student’s first-year grade point average? Some data was collected, in http://www.utsc.utoronto.ca/~butler/d29/grade-point-average.txt. There are two columns: first-year GPA and the score on the test.
(a) (2 marks) Read in the data and display (the first ten lines of) what you have.
(b) (3 marks) Draw a suitable plot of the data. Describe briefly what you see.
(c) (2 marks) Fit a regression predicting first-year GPA from the test score, and display the results.
(d) (2 marks) Is there a significant relationship between test score and first-year GPA? Does this surprise you, given what you saw in your graph? Explain briefly.
(e) (3 marks) Obtain confidence intervals for the mean first-year GPA for students that have test scores 20 and 25.
(f) (2 marks) Which of your two confidence intervals is longer? Explain briefly why that is not surprising.

3. Work through, or at least read, chapter 16 of PASIAS.

4. Coronary heart disease affects many people. Is there an association between a person having significant evidence of coronary heart disease and the person’s age? 100 subjects were selected to participate in a study. For each person these four things were recorded:
• an ID for that person (which we ignore)
• the person’s actual age, to the nearest year
• the “age group” in which the person falls
• whether the person has “significant evidence of coronary heart disease” (“Yes”) or not (“No”).
The data are at http://www.utsc.utoronto.ca/~butler/d29/chdage.csv as a CSV file.
(a) (2 marks) Read in and display (at least some of) the data. To make the next part easier, call your data frame heart.
(b) (2 marks) I wanted to have you plot the proportion of people in each age group that have significant symptoms against age group. This turns out to be a bit fiddly, but this code does it. Replace the initial data frame with whatever name you gave to the data frame you read in from the file (if you called it something different):

heart %>%
  group_by(agegrp, chd) %>%
  summarize(n=n()) %>%
  spread(chd,n) %>%
  mutate(proportion=Yes/(Yes+No)) %>%
  ggplot(aes(x=agegrp, y=proportion))+geom_col()

Run this code (by typing it or copy-pasting it). What does your graph tell you about how the likelihood of having symptoms depends on age? Explain briefly.
(c) (2 marks) Fit a logistic regression predicting presence or absence of coronary heart disease from the (actual) age. Use the data frame you read in from the file, and display the results.
(d) (3 marks) Is there a significant association between age and presence of significant symptoms of coronary heart disease? If there is, what kind of relationship is it? Explain briefly but carefully, using the output from this part only.
(e) (3 marks) An alternative format for the same data is in http://www.utsc.utoronto.ca/~butler/d29/chdage2.csv. Read the data