Unit 8: Inference for Categorical Data: Chi-Square

This unit focuses on performing inference for categorical variables using chi-square (\( \chi^2 \)) procedures. You'll learn to test whether a single categorical variable follows a hypothesized distribution (goodness of fit), compare the distribution of a categorical variable across two or more groups (homogeneity), and assess whether two categorical variables are associated (independence).

Chi-Square Test for Goodness of Fit

This test determines whether the distribution of a categorical variable matches an expected distribution.

Key Conditions

Random: Data should come from a random sample or randomized experiment.
Expected Counts: All expected counts must be at least 5.

Hypotheses

\( H_0 \): The categorical variable follows the specified distribution.
\( H_a \): The categorical variable does not follow the specified distribution.

Test Statistic

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \] where \( O_i \) = observed count, \( E_i \) = expected count.
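As a sketch of how the formula is evaluated, the Python function below assumes observed and expected counts are given as parallel lists in the same category order. The name `chi_square_statistic` is our own, not from any library, and the counts in the usage lines are hypothetical. It also returns each category's contribution \( (O_i - E_i)^2 / E_i \), which shows which categories drive a large statistic.

```python
def chi_square_statistic(observed, expected):
    """Return (chi-square statistic, list of per-category contributions).

    observed and expected are parallel sequences of counts in the same
    category order; every expected count must be positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Each term is (O_i - E_i)^2 / E_i
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions


# Hypothetical counts for three categories, used only to illustrate the formula.
stat, parts = chi_square_statistic([18, 22, 20], [20, 20, 20])
print(parts)
print(stat)
```

Large contributions flag the categories where observed and expected counts disagree most, which is useful when writing the conclusion in context.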

Degrees of Freedom

\( df = \text{number of categories} - 1 \)

Example

A genetics researcher expects flower colors in a ratio of red:white:pink = 1:2:1. In a sample of 100 flowers, she observes 20 red, 50 white, and 30 pink. Is this consistent with the expected distribution?
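Worked by hand, the 1:2:1 ratio gives expected proportions 0.25, 0.50, and 0.25, so the expected counts are 25, 50, and 25 (all at least 5). Then \( \chi^2 = \frac{(20-25)^2}{25} + \frac{(50-50)^2}{50} + \frac{(30-25)^2}{25} = 1 + 0 + 1 = 2 \) with \( df = 3 - 1 = 2 \). The Python sketch below reproduces this and estimates the p-value without a calculator. The helper `chi2_sf` is hypothetical and written here (it sums the standard power series for the regularized incomplete gamma function); it is not a library call and is intended only for textbook-sized values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).

    Sums the power series for the regularized lower incomplete gamma
    function with a = df/2 and z = x/2, then takes the complement.
    Adequate for textbook-sized statistics; not tuned for extreme values.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a          # n = 0 term of the series
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


observed = [20, 50, 30]   # red, white, pink
ratio = [1, 2, 1]         # hypothesized red:white:pink
n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]  # [25.0, 50.0, 25.0]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_sq, df)

print(f"expected counts: {expected}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For df = 2 the upper tail has the closed form \( e^{-\chi^2/2} \), so the p-value is \( e^{-1} \approx 0.37 \). That is well above 0.05, so we fail to reject \( H_0 \): we do not have convincing evidence that the flower colors depart from a 1:2:1 ratio.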

Calculator Tip

Use STAT > TESTS > χ²GOF-Test on a TI-84.
Enter observed counts in L1 and expected counts (not proportions) in L2.

Chi-Square Test for Homogeneity

Used to compare the distribution of a categorical variable across two or more populations or treatments.

Key Conditions

Random: Each group should be a random sample from its population, or the groups should come from a randomized experiment.
Independent Groups: The samples must be independent.
Expected Counts: All expected counts ≥ 5.

Hypotheses

\( H_0 \): The distribution of the categorical variable is the same for all groups.
\( H_a \): The distribution of the categorical variable is not the same for all groups.

Test Statistic

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Degrees of Freedom

\( df = (r - 1)(c - 1) \), where \( r \) = number of rows, \( c \) = number of columns.
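For a two-way table, the statistic sums \( (O - E)^2 / E \) over every cell, with each expected count equal to (row total × column total) / grand total. The sketch below assumes the table is a list of rows of counts; the function name `two_way_chi_square` is hypothetical, and the counts are invented for illustration because the soda example in this section gives no data. It returns the statistic, the degrees of freedom, and the expected counts; the p-value then comes from the chi-square distribution with that df (for example, with χ²cdf on a TI-84).

```python
def two_way_chi_square(table):
    """Chi-square statistic, df, and expected counts for an r x c table.

    table is a list of rows; each row lists the observed counts for one
    group (homogeneity) or one level of the row variable (independence).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count for each cell: (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected


# Hypothetical counts: rows = 3 regions, columns = 3 soda brands.
observed = [
    [30, 20, 10],
    [20, 30, 10],
    [10, 10, 40],
]
chi_sq, df, expected = two_way_chi_square(observed)
print(chi_sq, df)
print(expected)
```

The same function applies unchanged to a test for independence: only the study design and the wording of the conclusion differ, not the arithmetic.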

Example

A researcher compares soda preferences across 3 regions. Do the distributions of preferences differ?

Calculator Tip

Use STAT > TESTS > χ²-Test.
Enter observed data in a matrix. Expected counts are calculated automatically.

Chi-Square Test for Independence

This test assesses whether two categorical variables are associated in a single population.

Key Conditions

Random: Data must come from a random sample.
Expected Counts: All expected counts ≥ 5.

Hypotheses

\( H_0 \): The two variables are independent.
\( H_a \): The two variables are not independent (i.e., they are associated).

Test Statistic

Same as other chi-square tests: \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Degrees of Freedom

\( df = (r - 1)(c - 1) \)

Example

A school collects data on gender and club membership. Is there a relationship between the two?

Calculator Tip

Use STAT > TESTS > χ²-Test with a two-way table.

Selecting an Appropriate Inference Procedure

To choose the right test:

Goodness of Fit: One categorical variable from one sample, compared with a hypothesized distribution.
Homogeneity: Comparing the distribution of one categorical variable across two or more groups.
Independence: Testing for an association between two categorical variables measured on one sample.
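The selection rules above amount to a small decision rule. The hypothetical function below encodes it, assuming you have already counted how many categorical variables are recorded for each individual and how many independent samples (or treatment groups) were drawn. It only mirrors the summary in this unit and is not a substitute for reading the study design carefully.

```python
def choose_chi_square_test(num_variables, num_samples):
    """Suggest which chi-square procedure fits a study design.

    num_variables: number of categorical variables recorded on each individual.
    num_samples: number of independent random samples or treatment groups.
    """
    if num_variables == 1 and num_samples == 1:
        return "goodness of fit"
    if num_variables == 1 and num_samples >= 2:
        return "homogeneity"
    if num_variables == 2 and num_samples == 1:
        return "independence"
    raise ValueError("design does not match one of the three chi-square tests")


print(choose_chi_square_test(1, 1))  # flower colors vs. a 1:2:1 ratio
print(choose_chi_square_test(1, 3))  # soda preference across 3 regions
print(choose_chi_square_test(2, 1))  # gender and club membership in one school sample
```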

Tips

If your data come from two or more independent random samples (or treatment groups), use the chi-square test for homogeneity.
If your data come from one sample with two categorical variables recorded for each individual, use the chi-square test for independence.
Always check conditions before conducting the test.

Summary Table

| Test | When to Use | Question Tested | Degrees of Freedom |
|---|---|---|---|
| Goodness of Fit | 1 variable, 1 sample | Does the distribution match the expected distribution? | \( k - 1 \), where \( k \) = number of categories |
| Homogeneity | 1 variable, 2+ groups | Is the distribution the same across groups? | \( (r - 1)(c - 1) \) |
| Independence | 2 variables, 1 sample | Are the variables associated? | \( (r - 1)(c - 1) \) |

Expected Counts

Before running any chi-square test, you must check that all expected counts are at least 5. This ensures the sampling distribution of the test statistic is approximately chi-square.

How to Calculate Expected Counts:

Goodness of Fit: Use the expected proportions provided in the hypothesis. Multiply the proportion by the total sample size.
Example: If expecting 25% of 120 students to prefer chocolate, then \( E = 0.25 \times 120 = 30 \).

Homogeneity or Independence: Use the formula: \[ E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}} \]
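For instance, with hypothetical totals (not taken from any example in this unit), a cell in a row totaling 40 and a column totaling 60, in a table with a grand total of 200, has expected count

\[ E = \frac{40 \times 60}{200} = 12 \]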

Expected Count Condition:

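All expected counts must be ≥ 5. If this condition is not met, the chi-square approximation is unreliable and the test is not valid; another method may be needed (for example, combining sparse categories when that makes sense, or a simulation-based test).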
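A small helper for the condition check, sketched under the assumption that the expected counts have already been computed. The function name `failing_expected_counts` and its `minimum` parameter are hypothetical. It reports which cells fail rather than just True/False, which shows where the problem is.

```python
def failing_expected_counts(expected, minimum=5):
    """Return the positions of expected counts below `minimum`.

    `expected` may be a flat list (goodness of fit) or a list of rows
    (two-way table); positions are returned as indices or (row, col) pairs.
    """
    failures = []
    for i, item in enumerate(expected):
        if isinstance(item, (list, tuple)):
            failures.extend((i, j) for j, e in enumerate(item) if e < minimum)
        elif item < minimum:
            failures.append(i)
    return failures


# Hypothetical expected counts, for illustration only.
print(failing_expected_counts([30, 4.5, 12]))          # -> [1]
print(failing_expected_counts([[6, 3.2], [8, 10.4]]))  # -> [(0, 1)]
```

An empty list means the expected count condition is satisfied.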

Calculator Hacks (TI-84):

Goodness of Fit Test

- Press STAT → EDIT and enter the observed counts in L1 and the expected counts in L2.
- Go to STAT → TESTS and select χ²GOF-Test.
- Set degrees of freedom: \( df = \text{number of categories} - 1 \).
- Run the test and interpret the test statistic and p-value.

Homogeneity or Independence Test (Using a Matrix)

- Press 2nd x⁻¹ to open the MATRIX menu.
- Use EDIT to enter your observed values in a matrix (e.g., [A]).
- Go to STAT → TESTS and select χ²-Test.
- Set Observed to your matrix (e.g., [A]) and Expected to another matrix (the default is [B]); the calculator computes the expected counts and stores them there.

Run the test. The calculator gives:
- \( \chi^2 \) statistic
- p-value
- Degrees of freedom (automatically calculated as \( (r - 1)(c - 1) \))

To see the expected counts after running the test, press 2nd x⁻¹ and view matrix [B] (or whichever matrix you chose for Expected).

Tip:

If the p-value is less than the significance level \( \alpha \) (commonly 0.05), reject the null hypothesis; otherwise, fail to reject it. Always state your conclusion in context!

Practice Tip

Students often confuse the three tests. Ask yourself:

- How many categorical variables?
- One sample or multiple?
- Are you testing a distribution or an association?

Make sure you use the right test and interpret the p-value in context of your hypotheses!