Unit 2: Exploring Two-Variable Data

In this unit, we will build on what we've learned by representing two-variable data, comparing distributions, describing relationships between variables, and using models to make predictions.

Explanatory vs. Response Variables

Example: Hours studied (x) vs. exam score (y)
→ Hours studied is the explanatory variable; exam score is the response variable.

Scatterplots

Used to display the relationship between two quantitative variables.

Each point represents one individual

Put the explanatory variable on the x-axis and the response variable on the y-axis

Describe the overall pattern with FODS: Form, Outliers, Direction, Strength.

Example:

This scatterplot shows a strong, negative, linear association between age of drivers and number of accidents. There don't appear to be any outliers in the data.

Correlation (r)

Measures the strength and direction of the linear relationship between two quantitative variables.

💡 Tip: Correlation does not imply causation!
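
To make the idea of r concrete, here is a minimal Python sketch that computes the correlation as the average product of z-scores (dividing by n − 1). The `hours` and `scores` lists are made-up illustration data, not values from this unit, and the function name `correlation` is just a label chosen for this example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

r = correlation(hours, scores)
print(round(r, 3))
```

For this made-up data r comes out close to 1, which matches a strong, positive, linear pattern. Swapping the two lists gives the same r, since correlation does not depend on which variable is called x.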

Least-Squares Regression Line (LSRL)

The line that minimizes the sum of squared residuals.

Equation form: \( \hat{y} = a + bx \)

Example: \( \hat{y} = 45 + 5x \)
The slope is 5: each additional hour studied increases the predicted score by 5 points. The y-intercept is 45: when \( x = 0 \), \( \hat{y} = 45 \).
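
The "minimizes the sum of squared residuals" idea can be checked numerically. The sketch below fits \( \hat{y} = a + bx \) with the standard least-squares formulas and then compares its sum of squared residuals with two slightly nudged lines. The data and the helper names `fit_lsrl` and `sse` are assumptions made for illustration only.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

a, b = fit_lsrl(hours, scores)
best = sse(a, b, hours, scores)
shifted_up = sse(a + 1, b, hours, scores)     # same slope, higher intercept
steeper = sse(a, b + 0.5, hours, scores)      # same intercept, steeper slope

print(round(a, 2), round(b, 2))
print(best < shifted_up, best < steeper)
```

Any other line you try should give a larger sum of squared residuals than the fitted one, which is exactly what "least-squares" means.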

Interpreting Computer Output

Example

Calculating the Regression Line by Calculator

  1. Enter data into Lists (STAT → Edit → L1, L2).
  2. STAT → CALC → 8:LinReg(a+bx) (this matches the form \( \hat{y} = a + bx \))
  3. Store RegEQ to Y₁: VARS → Y-VARS → Function → Y₁
  4. View scatterplot: 2nd → Y= (STAT PLOT) → turn on Plot1; then ZOOM → 9:ZoomStat

Interpreting Slope and Y-Intercept

Example: \( \hat{y} = 45 + 5x \)
Slope: Each additional hour studied predicts 5 more points on the test.
Intercept: A student who studies 0 hours is predicted to score 45.
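
The two interpretations can be read straight off a small prediction function. This sketch uses the example line \( \hat{y} = 45 + 5x \); the function name `predict_score` is made up for illustration.

```python
def predict_score(hours, a=45.0, b=5.0):
    """Predicted exam score from the example line y-hat = 45 + 5x."""
    return a + b * hours

# Intercept: the prediction when x = 0
print(predict_score(0))                      # 45.0

# Slope: the change in the prediction for one additional hour
print(predict_score(3) - predict_score(2))   # 5.0
```

The difference between predictions for any two students whose study time differs by one hour is always 5 points, no matter where on the line you look.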

Residuals

Residual = Actual y − Predicted y

\[ \text{Residual} = y - \hat{y} \]
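
As a quick worked case, the sketch below computes residuals using the example line \( \hat{y} = 45 + 5x \). The two students and their scores are hypothetical, and `residual` is just a name chosen for this example.

```python
def residual(actual_y, x, a=45.0, b=5.0):
    """Residual = actual y minus predicted y, using y-hat = a + b*x."""
    predicted_y = a + b * x
    return actual_y - predicted_y

# Hypothetical student A: studied 4 hours, scored 68 (predicted 65)
print(residual(68, 4))   # 3.0

# Hypothetical student B: studied 2 hours, scored 52 (predicted 55)
print(residual(52, 2))   # -3.0
```

A positive residual means the actual value is above the line (the model underpredicted); a negative residual means it is below the line (the model overpredicted).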

Residual Plots

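A scatterplot of residuals vs. x (or predicted y). Random scatter around 0 with no clear pattern suggests a linear model is appropriate; a curved pattern suggests it is not.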

Understanding \( \hat{y} \), \( a \), \( b \), \( r^2 \), and \( s \)

Example: If \( r^2 = 0.82 \), then 82% of the variation in test scores is explained by the linear relationship with study hours. If \( s = 2.5 \), predictions from the line are typically off by about 2.5 points.
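
To see where s comes from, here is a minimal sketch that fits a line to made-up data and then computes the standard deviation of the residuals, \( s = \sqrt{\sum (y - \hat{y})^2 / (n - 2)} \). The data values are hypothetical and do not correspond to the \( s = 2.5 \) example.

```python
from math import sqrt

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares slope and intercept
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals, then s with n - 2 in the denominator
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(s, 2))
```

The value of s is in the same units as y, so here it reads as the typical size of a prediction error in exam points.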

Outliers vs. Leverage vs. Influential Points

Coefficient of Determination (\( r^2 \))

\( r^2 \) tells us the proportion of variability in y explained by the linear relationship with x.

Example: \( r^2 = 0.85 \) → 85% of the variation in y is explained by the model.
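
One way to see "proportion of variability explained" is to compare the leftover variation around the line (SSE) with the total variation around the mean of y (SST), so that \( r^2 = 1 - \text{SSE}/\text{SST} \). The sketch below does this for made-up data; the variable names are assumptions for illustration and the result is unrelated to the 0.85 example.

```python
# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares slope and intercept
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
b = s_xy / s_xx
a = y_bar - b * x_bar

# SSE: variation left over around the regression line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))
# SST: total variation of y around its mean
sst = sum((y - y_bar) ** 2 for y in scores)

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

For a least-squares line, this value agrees with the square of the correlation r, which is why the same symbol is used.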

Cautions in Regression

Calculator Tips and Hacks