This unit focuses on using sample means to estimate population means and test hypotheses about them. It introduces t-distributions and inference for comparing two means.
Degrees of freedom (df) account for the fact that we estimate parameters (like the mean) from sample data, which limits how much the data can vary independently.
Calculating the sample mean \( \bar{x} \) uses up one piece of information from the data.
Therefore, only \( n - 1 \) data values are free to vary independently when computing the sample standard deviation \( s \).
Because we use \( s \) (not \( \sigma \)) in inference, we must account for this uncertainty using the t-distribution — not the normal (z) distribution.
The t-distribution has fatter tails than the normal distribution, reflecting the increased variability in small samples.
The shape of the t-distribution depends on the degrees of freedom (df):
Smaller \( df \) → wider, flatter distribution (more uncertainty)
Larger \( df \) → approaches the standard normal distribution
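This convergence can be seen numerically by comparing tail probabilities. A quick sketch (assuming `scipy` is available; this is an illustration, not part of the original notes):

```python
# As df grows, the t-distribution's tail probability P(T > 2)
# shrinks toward the standard normal's P(Z > 2).
from scipy.stats import t, norm

for df in (2, 10, 30, 1000):
    print(f"df={df:>4}: P(T > 2) = {t.sf(2, df):.4f}")
print(f"normal : P(Z > 2) = {norm.sf(2):.4f}")
```

The smallest df gives the largest tail probability, and by df = 1000 the t value is essentially indistinguishable from the normal one.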
| Procedure | Degrees of Freedom |
|---|---|
| One-sample t-test / t-interval | \( df = n - 1 \) |
| Matched pairs t-test | \( df = n - 1 \) (where \( n \) = number of pairs) |
| Two-sample t-test (conservative) | \( df = \min(n_1 - 1, n_2 - 1) \) |
| Two-sample t-test (exact) | Calculated by software using a complex formula |
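The "complex formula" software uses is the Welch–Satterthwaite approximation. A minimal sketch (the numbers in the example call are illustrative only):

```python
# Welch–Satterthwaite approximate degrees of freedom for a two-sample t-test.
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    v1, v2 = s1**2 / n1, s2**2 / n2  # each group's variance of the sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# The result always lies between min(n1-1, n2-1) and n1+n2-2,
# so it is never smaller than the conservative choice.
print(welch_df(10, 30, 8, 35))  # roughly 55, vs. conservative df = 29
```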
When population standard deviation is unknown (which is almost always), use a t-interval.
Conditions:
- Random sample
- 10% condition if sampling without replacement: \(n \le 0.1N\)
- Nearly normal population or large sample size (by the Central Limit Theorem, \(n \ge 30\) is generally safe)
\[ \bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}} \]
Interpretation: “We are C% confident that the true population mean is between ___ and ___.”
Example: \(\bar{x} = 52\), \(s = 8\), \(n = 25\), 95% confidence
df = 24 → \(t^* ≈ 2.064\)
SE = \( \frac{8}{\sqrt{25}} = 1.6 \)
CI = \( 52 \pm 2.064(1.6) = (48.7, 55.3) \)
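This interval can be reproduced directly in code (assuming `scipy` is available for the t critical value):

```python
# One-sample t-interval: xbar +/- t* * s / sqrt(n)
from math import sqrt
from scipy.stats import t

xbar, s, n, conf = 52, 8, 25, 0.95
se = s / sqrt(n)                          # standard error = 1.6
tstar = t.ppf((1 + conf) / 2, df=n - 1)   # critical value, about 2.064
print(xbar - tstar * se, xbar + tstar * se)  # about (48.7, 55.3)
```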
Hypotheses:
\(H_0: \mu = \mu_0\)
\(H_a: \mu < \mu_0\), \(\mu > \mu_0\), or \(\mu \ne \mu_0\)
Conditions:
- Random sample
- 10% condition
- Nearly normal population or large \(n\)
Test statistic:
\[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]
Find the p-value using the t-distribution with \(df = n - 1\)
Compare p-value to significance level \( \alpha \) (typically 0.05)
Conclusion: Reject or fail to reject \(H_0\) in context.
Example:
\(\bar{x} = 52\), \(\mu_0 = 50\), \(s = 8\), \(n = 25\)
t = \( \frac{52 - 50}{8/\sqrt{25}} = \frac{2}{1.6} = 1.25 \)
df = 24, two-sided p-value ≈ 0.22 (the one-sided value is ≈ 0.11)
Since p > 0.05, fail to reject \(H_0\): insufficient evidence that mean differs from 50
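The same test in code, using the two-sided alternative (assuming `scipy` is available):

```python
# One-sample t-test of H0: mu = 50 vs. Ha: mu != 50
from math import sqrt
from scipy.stats import t

xbar, mu0, s, n = 52, 50, 8, 25
tstat = (xbar - mu0) / (s / sqrt(n))    # = 1.25
p_two = 2 * t.sf(abs(tstat), df=n - 1)  # two-sided p-value, about 0.22
print(tstat, p_two)
```

Since the p-value exceeds 0.05, we fail to reject \(H_0\), matching the hand calculation.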
p-value = probability of obtaining a result as extreme or more extreme than the one observed, assuming \(H_0\) is true
Low p-value (≤ α): sufficient evidence against \(H_0\), reject it
High p-value: fail to reject \(H_0\)
DO NOT SAY: "Accept \(H_0\)" or "p is the probability \(H_0\) is true"
Good conclusion: “There is (not) sufficient evidence to support the claim that the true mean is…”
Context: Comparing two independent groups (e.g., treatment vs. control)
Conditions:
Two independent random samples
10% condition for each
Each population is approximately normal or sample sizes \( \ge 30 \)
Confidence interval:
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } \]
Hypothesis test:
\(H_0: \mu_1 = \mu_2\) or \(\mu_1 - \mu_2 = 0\)
\(H_a: \mu_1 - \mu_2 < 0\), \(\mu_1 - \mu_2 > 0\), or \(\mu_1 - \mu_2 \ne 0\)
Test statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2)}{ \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } } \]
Use conservative \(df = \min(n_1-1, n_2-1)\) or software-calculated degrees of freedom
Example:
Group 1: \(\bar{x}_1 = 55\), \(s_1 = 10\), \(n_1 = 30\)
Group 2: \(\bar{x}_2 = 50\), \(s_2 = 8\), \(n_2 = 35\)
SE = \(\sqrt{ \frac{100}{30} + \frac{64}{35} } ≈ 2.27\)
t = \( \frac{55 - 50}{2.27} ≈ 2.20 \)
df = 29 (conservative) → find p-value and interpret
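A numeric check of this example using the conservative df (assuming `scipy` is available):

```python
# Two-sample t-test with conservative degrees of freedom
from math import sqrt
from scipy.stats import t

x1, s1, n1 = 55, 10, 30
x2, s2, n2 = 50, 8, 35
se = sqrt(s1**2 / n1 + s2**2 / n2)  # SE of the difference, about 2.27
tstat = (x1 - x2) / se              # about 2.20
df = min(n1 - 1, n2 - 1)            # conservative df = 29
p_two = 2 * t.sf(abs(tstat), df)    # two-sided p-value
print(se, tstat, p_two)
```

Even with the conservative df, the p-value comes out below 0.05, so this sample gives evidence of a difference in means.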
Matched Pairs: Use when each observation in one sample is paired with a related observation in the other sample (e.g., before/after measurements, twins, repeated trials).
Matched pairs are not two independent samples. Instead, treat the differences as a single sample of data.
Step 1: Calculate the difference for each pair: \(d = x_1 - x_2\)
Step 2: Find the mean and standard deviation of the differences: \(\bar{d}, s_d\)
Step 3: Use a one-sample t-procedure on the differences.
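The three steps above can be sketched with made-up before/after data (the measurements are hypothetical; `scipy` is assumed available):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

# Hypothetical before/after measurements for 6 subjects
before = [72, 68, 75, 71, 69, 70]
after  = [70, 66, 74, 68, 70, 67]

d = [b - a for b, a in zip(before, after)]  # Step 1: difference for each pair
dbar, sd, n = mean(d), stdev(d), len(d)     # Step 2: mean and sd of differences
tstat = dbar / (sd / sqrt(n))               # Step 3: one-sample t on differences
p_two = 2 * t.sf(abs(tstat), df=n - 1)      # df = number of pairs - 1
print(dbar, tstat, p_two)
```

Note that the pairing matters: analyzing `before` and `after` as two independent samples would use the wrong standard error.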
On the calculator: Stat > Tests