Unit 6: Inference for Categorical Data: Proportions

This unit focuses on using sample data to draw conclusions about population proportions through confidence intervals and significance tests. You'll learn how to interpret and justify results using statistical reasoning and probability.

Constructing and Interpreting a Confidence Interval for a Population Proportion

A confidence interval estimates the true population proportion \( p \) using sample data.

Conditions:
Random: Data must come from a random sample or randomized experiment.
10% Condition: When sampling without replacement, the sample size should be at most 10% of the population: \( n \leq 0.10N \).
Large Counts: Both \( n\hat{p} \geq 10 \) and \( n(1 - \hat{p}) \geq 10 \) must be true. Use \( \hat{p} \) here because \( p \) is unknown.

Formula: The confidence interval is given by: \[ \hat{p} \pm z^* \sqrt{ \frac{\hat{p}(1 - \hat{p})}{n} } \] where \( \hat{p} \) is the sample proportion, \( z^* \) is the critical value based on the confidence level, and \( n \) is the sample size.

Interpretation: "We are C% confident that the true population proportion lies between [lower bound] and [upper bound]."

Tip: Use the 1-PropZInt function on your calculator (TI-84).

Example:

Survey: 128 of 200 students like Statistics → \(\hat{p} = 0.64\), \(n = 200\); for a 95% CI, \(z^* = 1.96\).
SE = \(\sqrt{0.64(0.36)/200} \approx 0.034\).
CI = \(0.64 \pm 1.96 \cdot 0.034 \approx 0.64 \pm 0.067 = (0.573, 0.707)\).

Interpretation: “We are 95% confident that between 57.3% and 70.7% of all students in the population like Stats.”
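
The interval arithmetic above can also be checked in code. The following is a minimal sketch using only Python's standard library; the function name `one_prop_z_interval` is an illustrative choice rather than a standard API, and the sketch assumes the Random, 10%, and Large Counts conditions have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, confidence=0.95):
    """One-sample z interval for a proportion (hypothetical helper).

    x: number of successes, n: sample size, confidence: level such as 0.95.
    """
    p_hat = x / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


# Survey example: 128 of 200 students like Statistics.
lower, upper = one_prop_z_interval(128, 200, confidence=0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Because the code keeps full precision for \(z^*\) and the standard error, its endpoints may differ in the third decimal place from a hand calculation that rounds intermediate steps.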

Setting Up and Carrying Out a Test for a Population Proportion

Significance tests assess whether sample results provide convincing evidence against a claim about a population proportion.

Step 1: Hypotheses
Null Hypothesis: \( H_0: p = p_0 \)
Alternative Hypothesis: \( H_a: p < p_0 \), \( p > p_0 \), or \( p \ne p_0 \)
Step 2: Conditions (same as for confidence intervals, except that Large Counts uses \( p_0 \): \( np_0 \geq 10 \) and \( n(1 - p_0) \geq 10 \))
Step 3: Test Statistic \[ z = \frac{ \hat{p} - p_0 }{ \sqrt{ \frac{p_0(1 - p_0)}{n} } } \]
Step 4: P-value (use normal distribution to find area under curve)
Step 5: Conclusion (Compare p-value to \( \alpha \))

Tip: Use the 1-PropZTest function on your calculator.

Interpreting a P-value and Justifying a Claim

P-value: The probability, assuming \( H_0 \) is true, that the sample results are as extreme or more extreme than what was observed.
Low p-value (≤ α): Reject \( H_0 \). There is convincing evidence for \( H_a \).
High p-value (> α): Fail to reject \( H_0 \). There is not convincing evidence for \( H_a \).
Justification: Link your conclusion to context and whether evidence supports or does not support the claim.

Example: If p-value = 0.03 and α = 0.05, conclude "Since 0.03 < 0.05, we reject the null hypothesis. There is convincing evidence that..."

Hypothesis Test for a Single Proportion

State Hypotheses:
\(H_0: p = p_0\)
\(H_a: p < p_0\), \(p > p_0\), or \(p \ne p_0\)

Conditions: random, independent, and \(np_0\ge10\), \(n(1-p_0)\ge10\)
Test Statistic:

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \]

p-value: Use a normal table or normalcdf. For a two-sided test, double the area beyond |z|; for a one-sided test, use the tail area in the direction of \(H_a\).
Decision: If p‑value ≤ α, reject \(H_0\); else fail to reject.
Conclusion: Contextualize in terms of original claim.

Example:
Claim: 60% of students like Stats → \(H_0: p = 0.60\), \(H_a: p \ne 0.60\); Sample: 128/200 → \(\hat{p} = 0.64\).
Check: \(n p_0 = 200 \cdot 0.60 = 120 \ge 10\) and \(n(1 - p_0) = 200 \cdot 0.40 = 80 \ge 10\); OK.
SE = \(\sqrt{0.60 \cdot 0.40/200} \approx 0.03464\).
z = \((0.64 - 0.60)/0.03464 \approx 1.155\).
normalcdf(-1.155, 1.155, 0, 1) ≈ 0.752
Two-tailed p ≈ 1 − 0.752 = 0.248 → fail to reject \(H_0\) at α = 0.05; there is insufficient evidence that the proportion differs from 0.60.
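
The same test can be sketched in Python as a cross-check on the calculator work. This is a minimal illustration using the standard library; `one_prop_z_test` and its `alternative` argument are hypothetical names chosen for this example, and the conditions are assumed to have been verified separately.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """One-sample z test for a proportion (hypothetical helper).

    Returns the z statistic and the p-value for the chosen alternative.
    """
    p_hat = x / n
    # The standard error uses p0 because the test assumes H0 is true.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Claim: p = 0.60; sample: 128 of 200.
z, p_value = one_prop_z_test(128, 200, 0.60)
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.3f}, decision: {decision}")
```

Switching `alternative` to `"greater"` or `"less"` returns the one-tail area in the direction of \(H_a\) instead of the doubled tail.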

Type I and Type II Errors

Type I Error: Rejecting \( H_0 \) when it's actually true.
Type II Error: Failing to reject \( H_0 \) when it's actually false.
α (significance level): Probability of making a Type I error when \( H_0 \) is true.
Power of a test: The probability of correctly rejecting \( H_0 \) when a specific alternative value of the parameter is true (1 - probability of Type II error).

Tip: Think of a courtroom analogy: a Type I error is convicting an innocent person, and a Type II error is letting a guilty person go free.
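
To see how α, Type II error, and power relate numerically, here is a rough sketch based on the normal approximation for a one-sided test with \(H_a: p > p_0\). The function name, the sample size of 200, and the "true" proportion of 0.70 are illustrative assumptions, not values from this unit.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_greater(p0, p_true, n, alpha=0.05):
    """Approximate power of a one-proportion z test of H0: p = p0 vs Ha: p > p0.

    p_true is the assumed actual population proportion (hypothetical input).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of p-hat when the true proportion is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)


# If H0 is actually true, the rejection probability is the Type I error rate.
type_i_rate = power_one_sided_greater(0.60, 0.60, 200, alpha=0.05)

# If the true proportion is 0.70, rejecting H0 is the correct decision.
power = power_one_sided_greater(0.60, 0.70, 200, alpha=0.05)
type_ii_rate = 1 - power

print(f"P(Type I error) = {type_i_rate:.3f}")
print(f"Power = {power:.3f}, P(Type II error) = {type_ii_rate:.3f}")
```

In this sketch, raising `n` or `alpha`, or moving `p_true` farther from `p0`, increases power and so lowers the probability of a Type II error.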

Inference for the Difference of Two Proportions

This is used to compare two population proportions \( p_1 \) and \( p_2 \) using samples.

Conditions:
Both samples must be random and independent.
10% condition for both samples.
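Large counts: \( n_1\hat{p}_1, n_1(1 - \hat{p}_1), n_2\hat{p}_2, n_2(1 - \hat{p}_2) \geq 10 \) for a confidence interval. For a significance test, check the same four counts with \( \hat{p}_{\text{pooled}} \) in place of \( \hat{p}_1 \) and \( \hat{p}_2 \).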

Confidence Interval: \[ (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{ \frac{ \hat{p}_1(1 - \hat{p}_1) }{ n_1 } + \frac{ \hat{p}_2(1 - \hat{p}_2) }{ n_2 } } \]

Significance Test (pooled):

Pooled proportion: \[ \hat{p}_{\text{pooled}} = \frac{ x_1 + x_2 }{ n_1 + n_2 } \]

Test statistic: \[ z = \frac{ (\hat{p}_1 - \hat{p}_2) }{ \sqrt{ \hat{p}_{\text{pooled}} (1 - \hat{p}_{\text{pooled}}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \]

Calculator Tip: Use 2-PropZInt for confidence intervals and 2-PropZTest for significance tests.

Example:

Sample 1: 60/150 report success → \(\hat{p}_1 = 0.40\); Sample 2: 80/200 → \(\hat{p}_2 = 0.40\). Test \(H_0: p_1 = p_2\) against \(H_a: p_1 \ne p_2\).
Pooled: \((60+80)/(150+200) = 140/350 = 0.40\).
SE = \(\sqrt{0.40 \cdot 0.60(1/150 + 1/200)} \approx 0.053\).
z = \((0.40 - 0.40)/0.053 = 0\). Two-sided p-value = 1. Fail to reject \(H_0\) → There is not convincing evidence that the two population proportions of success differ.
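
The two-sample procedures can be sketched the same way. The code below is a minimal standard-library illustration of the pooled test and the unpooled interval; both function names are hypothetical, the test is assumed to be two-sided, and the conditions are assumed to hold.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided pooled z test of H0: p1 = p2 (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pool the samples because H0 says both share one common proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, se


def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Unpooled z interval for p1 - p2 (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # No pooling here: the interval does not assume p1 = p2.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Example data: 60 of 150 in sample 1, 80 of 200 in sample 2.
z, p_value, se = two_prop_z_test(60, 150, 80, 200)
ci_lower, ci_upper = two_prop_z_interval(60, 150, 80, 200)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, pooled SE = {se:.4f}")
print(f"95% CI for p1 - p2: ({ci_lower:.3f}, {ci_upper:.3f})")
```

An interval for \(p_1 - p_2\) that contains 0 is consistent with failing to reject \(H_0: p_1 = p_2\) in the two-sided test.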

Summary Table

Topic	Calculator Function	Key Formula
1-Proportion Confidence Interval	`1-PropZInt`	\( \hat{p} \pm z^* \sqrt{ \frac{ \hat{p}(1 - \hat{p}) }{ n } } \)
1-Proportion z-Test	`1-PropZTest`	\( z = \frac{ \hat{p} - p_0 }{ \sqrt{ \frac{p_0(1 - p_0)}{n} } } \)
2-Proportion Confidence Interval	`2-PropZInt`	\( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{ \frac{ \hat{p}_1(1 - \hat{p}_1) }{ n_1 } + \frac{ \hat{p}_2(1 - \hat{p}_2) }{ n_2 } } \)
2-Proportion z-Test	`2-PropZTest`	\( z = \frac{ (\hat{p}_1 - \hat{p}_2) }{ \sqrt{ \hat{p}_{\text{pooled}} (1 - \hat{p}_{\text{pooled}}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \)

Tips & Common Mistakes

- Always check conditions before computing, unless the question states "All conditions have been met."
- Use the pooled SE for two-proportion significance tests (where \(H_0: p_1 = p_2\)) and the unpooled SE for confidence intervals.
- The confidence level \(C\) determines \(z^*\). For example:
  - 90%: \(z^* = 1.645\)
  - 95%: \(z^* = 1.96\)
  - 99%: \(z^* = 2.576\)
- A CI gives a range of plausible values for the parameter; a hypothesis test assesses the evidence against \(H_0\).
- Don’t confuse the margin of error (\(z^* \cdot SE\)) with \(SE\) itself.
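
The critical values listed in the tips can be recovered from the standard normal distribution rather than memorized. Here is a small sketch using Python's `statistics.NormalDist`; the helper name `critical_value` is an illustrative assumption.

```python
from statistics import NormalDist


def critical_value(confidence):
    """z* for a two-sided interval at the given confidence level (hypothetical helper)."""
    # The middle `confidence` of the curve leaves (1 - confidence) / 2 in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

Multiplying any of these values by the standard error gives the margin of error for that confidence level.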