Lesson 2.6 – Least Squares Regression Line
- Calculate the equation of the least-squares regression line using technology.
- Calculate the equation of the least-squares regression line using summary statistics.
- Describe how outliers affect the least-squares regression line.
We began today a little differently. We used Fathom to examine why it is called the “least-squares” regression line. Fathom is a statistical software package that often provides a good visual for students (free trial here: http://fathom.concord.org). We use it only as an instructional tool (students don’t ever use Fathom themselves).
We started with some data from a random sample of 6 students who gave us their GPA and ACT score (we will have to change to SAT for next year as Michigan just made the switch). We are hoping to be able to predict a student’s ACT score from their GPA. The students decided there would be a strong positive relationship before even seeing the data.
We then used Fathom to put a random line on top of the scatterplot. We reminded students about the residual from yesterday and showed them that the residual is really just the vertical distance from a point to the line. We also discussed that some of the residuals would be positive and some would be negative. So how can we make them all positive? Square them, of course. Fathom allows you to show these squares, and it also calculates the sum of the areas of the squares.
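For readers without Fathom, the same idea can be sketched in a few lines of Python. This is only an illustration: the GPA and ACT values are invented (the class data set is not reproduced here), the slope and intercept of the "random" line are arbitrary guesses, and the function names are hypothetical.

```python
# Hypothetical data, invented for illustration (NOT the class data set).
gpa = [2.5, 3.0, 3.2, 3.5, 3.8, 4.0]
act = [18, 21, 24, 25, 29, 31]


def residuals(xs, ys, slope, intercept):
    """Residual = actual y minus predicted y (a signed vertical distance)."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, slope, intercept):
    """Total area of the squares drawn on the residuals."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


# An arbitrary "random" line, standing in for the one placed on the scatterplot.
guess_slope, guess_intercept = 6.0, 5.0

res = residuals(gpa, act, guess_slope, guess_intercept)
sse = sum_squared_residuals(gpa, act, guess_slope, guess_intercept)

print("residuals:", [round(r, 2) for r in res])
print("sum of squared residuals:", round(sse, 2))
```

With this made-up data and this guessed line, some residuals come out negative and some positive, while every squared residual is non-negative, which is why the squares can be added up into a single measure of how far off the line is.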
We ask students if this is the “best” line, and most of them think there might be a better one. We let a student come to the computer and do their best to minimize the sum of the areas of the squares. After they have made their best attempt, we add the “least-squares” regression line, which is the line that makes this sum as small as possible.
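The second learning target, finding the line from summary statistics, can be sketched the same way. The example below assumes the usual formulas, slope b = r(s_y / s_x) and y-intercept a = ȳ - b·x̄, and again uses invented GPA and ACT values and hypothetical function names. It then compares the least-squares line against an arbitrary guessed line.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration (NOT the class data set).
gpa = [2.5, 3.0, 3.2, 3.5, 3.8, 4.0]
act = [18, 21, 24, 25, 29, 31]


def correlation(xs, ys):
    """r = sum of products of z-scores, divided by (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Slope and y-intercept from summary statistics only."""
    r = correlation(xs, ys)
    slope = r * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares_line(gpa, act)
best_sse = sum_squared_residuals(gpa, act, slope, intercept)

# An arbitrary guessed line, standing in for a student's best attempt.
guess_sse = sum_squared_residuals(gpa, act, 6.0, 5.0)

print("least-squares slope:", round(slope, 2))
print("least-squares intercept:", round(intercept, 2))
print("least-squares sum of squares:", round(best_sse, 2))
print("guessed line sum of squares:", round(guess_sse, 2))
```

Nudging the slope or intercept away from the least-squares values in either direction should only make the sum of squares larger, which is the point of the student's attempt at the computer.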
The Activity for today uses the same data set as the Fathom file. After a quick review of slope and y-intercept from yesterday, the students are asked to introduce an outlier to the data set (that one kid who slacks off in school but does really well on standardized tests, so a low GPA paired with a high ACT score). We want students to notice that the outlier drags the line of best fit toward itself, thus decreasing the slope (less steep) and increasing the y-intercept. They can see this change by comparing the new slope and y-intercept to the originals.
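The effect of the outlier can be checked numerically as well. In this sketch the six original points and the added low-GPA, high-ACT student are all invented for illustration, and the helper name is hypothetical. The slope is computed from deviations, which is algebraically the same as r(s_y / s_x).

```python
from statistics import mean

# Hypothetical data, invented for illustration (NOT the class data set).
gpa = [2.5, 3.0, 3.2, 3.5, 3.8, 4.0]
act = [18, 21, 24, 25, 29, 31]

# Hypothetical outlier: a low GPA paired with a high ACT score.
outlier_gpa, outlier_act = 2.0, 32


def least_squares_line(xs, ys):
    """Least-squares slope and y-intercept computed from deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


old_slope, old_intercept = least_squares_line(gpa, act)
new_slope, new_intercept = least_squares_line(gpa + [outlier_gpa],
                                              act + [outlier_act])

print("without outlier:", round(old_slope, 2), round(old_intercept, 2))
print("with outlier:   ", round(new_slope, 2), round(new_intercept, 2))
```

For this made-up data, adding the point in the upper-left corner of the scatterplot lowers the slope and raises the y-intercept, matching what students see in the Activity.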
Students often want to know what the rule is for determining an outlier on a scatterplot. We will not have a rule like the one we used for one-variable data in Chapter 1 (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR). We tell them to simply look for values that don’t fall in the general trend of the rest of the data OR values that are outliers in the x or y direction (for these, we could actually apply the Chapter 1 outlier rules to the x-values or y-values separately). We then use the term “possible” outlier to describe such a point.
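As a rough sketch of checking the x and y directions separately, the code below applies the 1.5*IQR rule to each variable on its own. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (other software may compute quartiles slightly differently), and the data, including the added low-GPA, high-ACT student, are invented for illustration.

```python
from statistics import median

# Hypothetical data, invented for illustration (NOT the class data set).
# The last point is a made-up low-GPA, high-ACT student.
gpa = [2.5, 3.0, 3.2, 3.5, 3.8, 4.0, 2.0]
act = [18, 21, 24, 25, 29, 31, 32]


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of data points is odd.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


def possible_outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


x_flags = possible_outliers(gpa)
y_flags = possible_outliers(act)

print("possible outliers in the x direction:", x_flags)
print("possible outliers in the y direction:", y_flags)
```

With this invented data neither check flags the added student, even though the point sits well away from the trend of the other six. That is one way to see why a one-variable rule alone is not enough for a scatterplot and why looking at the overall pattern still matters.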