This unit focuses on performing inference for categorical variables using chi-square (\( \chi^2 \)) procedures. You'll learn to test for the distribution of a single categorical variable (goodness of fit), compare distributions across multiple groups (homogeneity), and assess association between two categorical variables (independence).
This test determines whether the distribution of a categorical variable matches an expected distribution.
Random: Data should come from a random sample or randomized experiment.
Expected Counts: All expected counts must be at least 5.
\( H_0 \): The categorical variable follows the specified distribution.
\( H_a \): The categorical variable does not follow the specified distribution.
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \] where \( O_i \) = observed count, \( E_i \) = expected count.
\( df = \text{number of categories} - 1 \)
A genetics researcher expects flower colors in a ratio of red:white:pink = 1:2:1. In a sample of 100 flowers, she observes 20 red, 50 white, and 30 pink. Is this consistent with the expected distribution?
On a TI-84, use STAT > TESTS > χ²GOF-Test. Enter observed values in L1 and expected values in L2.
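The flower-color example can also be worked through in a few lines of Python. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the shortcut used for the p-value is valid only because this example has \( df = 2 \).

```python
import math

# Observed counts from the flower example, in the order red, white, pink.
observed = [20, 50, 30]
# Hypothesized ratio red:white:pink = 1:2:1.
ratio = [1, 2, 1]

n = sum(observed)
# Expected count = hypothesized proportion * sample size.
expected = [n * r / sum(ratio) for r in ratio]

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 2 only, the chi-square upper-tail area is exactly exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(expected)           # [25.0, 50.0, 25.0]
print(chi_sq, df)         # 2.0 2
print(round(p_value, 4))  # 0.3679
```

All expected counts are at least 5, and the p-value of about 0.37 is well above 0.05, so the data do not give convincing evidence against the 1:2:1 distribution.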
This test compares the distribution of a categorical variable across two or more populations or treatments.
Random: Each group should be a random sample.
Independent Groups: The samples must be independent.
Expected Counts: All expected counts ≥ 5.
\( H_0 \): The distribution of the categorical variable is the same for all groups.
\( H_a \): The distribution of the categorical variable is not the same for all groups.
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
\( df = (r - 1)(c - 1) \), where \( r \) = number of rows, \( c \) = number of columns.
A researcher compares soda preferences across 3 regions. Do the distributions of preferences differ?
Use STAT > TESTS > χ²-Test.
Enter observed data in a matrix. Expected counts are calculated automatically.
This test assesses whether two categorical variables are associated in a single population.
Random: Data must come from a random sample.
Expected Counts: All expected counts ≥ 5.
\( H_0 \): The two variables are independent.
\( H_a \): The two variables are not independent (i.e., they are associated).
Same as other chi-square tests: \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
\( df = (r - 1)(c - 1) \)
A school collects data on gender and club membership. Is there a relationship between the two?
Use STAT > TESTS > χ²-Test with a two-way table.
To choose the right test:
Goodness of Fit: One categorical variable from one sample, compared against a hypothesized distribution.
Homogeneity: Comparing distributions of a variable across multiple groups.
Independence: Testing for an association between two variables within a single sample.
If your data is from multiple random samples, use the chi-square test for homogeneity.
If your data is from one sample and two variables, use the test for independence.
Always check conditions before conducting the test.
Test | When to Use | Hypotheses | Degrees of Freedom |
---|---|---|---|
Goodness of Fit | 1 variable, 1 sample | Distribution matches expectation? | \( k - 1 \) |
Homogeneity | 1 variable, 2+ groups | Same distribution across groups? | \( (r - 1)(c - 1) \) |
Independence | 2 variables, 1 sample | Are the variables associated? | \( (r - 1)(c - 1) \) |
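The summary table can be read as a small decision rule. The sketch below encodes it in Python; the function names `choose_test` and `degrees_of_freedom` are hypothetical helpers written for illustration, not part of any library.

```python
def choose_test(num_variables, num_samples):
    """Map a study design to the chi-square test named in the summary table."""
    if num_variables == 1 and num_samples == 1:
        return "goodness of fit"
    if num_variables == 1 and num_samples >= 2:
        return "homogeneity"
    if num_variables == 2 and num_samples == 1:
        return "independence"
    raise ValueError("design not covered by the three chi-square tests")


def degrees_of_freedom(test, k=None, r=None, c=None):
    """Return k - 1 for goodness of fit, (r - 1)(c - 1) for two-way tables."""
    if test == "goodness of fit":
        return k - 1
    return (r - 1) * (c - 1)


# Flower example: one variable (color), one sample, three categories.
test = choose_test(1, 1)
print(test, degrees_of_freedom(test, k=3))  # goodness of fit 2

# A two-way design with 3 rows and 4 columns (sizes chosen for illustration).
print(degrees_of_freedom("homogeneity", r=3, c=4))  # 6
```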
Before running any chi-square test, you must check that all expected counts are at least 5. This ensures the sampling distribution of the test statistic is approximately chi-square.
Goodness of Fit: Use the expected proportions provided in the hypothesis. Multiply the proportion by the total sample size.
Example: If expecting 25% of 120 students to prefer chocolate, then \( E = 0.25 \times 120 = 30 \).
Homogeneity or Independence: Use the formula: \[ E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}} \]
If any expected count is below 5, the chi-square approximation is unreliable and the test should not be used as is. Consider combining sparse categories or using a different method.
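The expected-count formula for two-way tables, together with the "at least 5" check, can be sketched as follows. The observed table here is hypothetical and chosen only to make the arithmetic easy to follow; `expected_counts` is an illustrative helper name.

```python
def expected_counts(observed):
    """Expected counts for a two-way table given as a list of rows.

    Each cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (illustrative only, not data from the text):
# two groups (rows) by three response categories (columns).
observed = [[30, 20, 10],
            [20, 30, 10]]

expected = expected_counts(observed)
condition_met = all(e >= 5 for row in expected for e in row)

print(expected)       # [[25.0, 25.0, 10.0], [25.0, 25.0, 10.0]]
print(condition_met)  # True
```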
- Press STAT → EDIT to open the list editor.
- Enter observed counts in L1 and expected counts in L2.
- Go to STAT → TESTS and select χ²GOF-Test (on older models that lack this test, compute \( \chi^2 \) by hand and find the p-value with χ²cdf in the DISTR menu).
- Set degrees of freedom: \( df = \text{number of categories} - 1 \).
- Run the test and interpret the test statistic and p-value.
- Press 2nd → x⁻¹ to access the Matrix menu.
- Use Edit to enter your observed values in a matrix (e.g., [A]).
- Go to STAT → TESTS and select χ²-Test.
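Set Observed to the matrix holding your data. You do not need to enter expected counts: the calculator computes them and stores them in the matrix listed under Expected ([B] by default).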
Run the test. The calculator gives:
- \( \chi^2 \) statistic
- p-value
- Degrees of freedom (automatically calculated as \( (r - 1)(c - 1) \))
To see expected counts, press 2nd → x⁻¹ and open matrix [B].
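The three values the calculator reports can be reproduced in code. The sketch below uses only Python's standard library and a hypothetical observed table; `chi_square_sf` and `chi_square_test` are illustrative names, and the p-value comes from a series for the regularized incomplete gamma function, which is adequate for the moderate statistics typical of classroom problems but is not a substitute for a statistics library.

```python
import math


def chi_square_sf(x, df, terms=200):
    """Upper-tail area P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series: gamma(a, z) = z^a * e^(-z) * sum_n z^n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(observed):
    """Return (statistic, df, p-value, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi_square_sf(stat, df), expected


# Hypothetical observed counts (illustrative only, not data from the text).
stat, df, p_value, expected = chi_square_test([[30, 20, 10],
                                               [20, 30, 10]])
print(round(stat, 4), df, round(p_value, 4))  # 4.0 2 0.1353
```

The returned `expected` list plays the role of matrix [B] on the calculator.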
If the p-value is less than the chosen significance level (typically \( \alpha = 0.05 \)), reject the null hypothesis; otherwise, fail to reject it. Always state your conclusion in context!
Students often confuse the three tests. Ask yourself:
- How many variables?
- One sample or multiple?
- Are you testing distribution or association?
Make sure you use the right test and interpret the p-value in context of your hypotheses!