Unit 3: Collecting Data

Unit 3 of AP Statistics focuses on how to properly collect data to answer a statistical question of interest. This includes designing studies, understanding sampling methods, identifying sources of bias, and planning and interpreting experiments.

Planning a Study

Before collecting data, it's important to clearly define the objective and the population of interest. Key components of planning a study include:

Identify the population: The group you want to draw conclusions about.

Define the sample: The subset of the population you will collect data from.

Determine how data will be collected: Will it be through a survey, experiment, or observational study?

Tip: Always distinguish between a parameter (a value that describes the population) and a statistic (a value that describes the sample).
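
The parameter/statistic distinction can be illustrated with a short Python sketch. The population here is made up (500 simulated heights), and the names `population`, `mu`, and `x_bar` are only illustrative. In a real study the parameter is usually unknown, which is why a statistic is computed from a sample to estimate it.

```python
import random
import statistics

# Hypothetical population: simulated heights (cm) for 500 students.
rng = random.Random(1)
population = [rng.gauss(165, 10) for _ in range(500)]

# Parameter: a value that describes the whole population.
mu = statistics.mean(population)

# Statistic: a value that describes a sample taken from that population.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.1f}")
print(f"statistic (sample mean):     {x_bar:.1f}")
```

The two values will generally be close but not equal; the gap is sampling variability.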

Sampling Methods

Choosing a good sample is critical to obtaining reliable results. A random sample helps reduce bias and allows for generalization to the population.

Simple Random Sample (SRS): Every group of individuals of the chosen sample size has an equal chance of being selected, so every individual has an equal chance as well.
Pros: Unbiased, easy to analyze.
Cons: Can be impractical for large populations without a list of all members.

Stratified Random Sample: Divide the population into strata (groups of similar individuals), then take an SRS from each stratum.
Pros: Reduces variability, ensures representation of key subgroups.
Cons: Requires knowledge of strata beforehand.

Cluster Sample: Divide the population into clusters, randomly select some clusters, and include all individuals in the selected clusters.
Pros: Cost-effective, especially with geographically spread populations.
Cons: Higher variability if clusters differ significantly.

Systematic Sample: Select every \( k^{\text{th}} \) individual after a random starting point.
Pros: Quick and easy.
Cons: Can introduce bias if there's a pattern in the list.

Example: To survey students in a school, you could:

Use SRS: Randomly pick 50 students from the whole school.
Use stratified: Randomly pick 10 students from each grade level.
Use cluster: Randomly pick 2 classes and survey all students in them.

How to Describe Sampling Methods (with Examples)

In free-response questions, you’ll often be asked to describe how to carry out a sampling method. To earn full credit, you must explain how the method is carried out in sufficient detail, not just name it. Here are the four most common methods with correct descriptions and examples.

1. Simple Random Sample (SRS)

How to describe: Assign a number to each individual in the population (e.g., 001 to 100), then use a random number generator to select individuals without replacement until the desired sample size is reached.

Example: Suppose a school has 500 students and you want to select a random sample of 50 students.

- Label each student with a unique number from 001 to 500.

- Use a random number generator to generate numbers from 001 to 500, ignoring any repeated numbers, until you have 50 unique numbers.

- Select the students who correspond to those 50 numbers to form your sample.
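
The SRS steps above can be sketched in Python. This is a minimal illustration, not an official procedure: the function name `simple_random_sample` and the `seed` argument are assumptions added for the example. The standard-library `random.sample` draws without replacement, so repeated numbers never appear.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted ID numbers for an SRS drawn without replacement."""
    rng = random.Random(seed)
    ids = range(1, population_size + 1)  # students labeled 1..population_size
    return sorted(rng.sample(ids, sample_size))

# 500 students, sample of 50 (the seed only makes the run repeatable).
chosen = simple_random_sample(500, 50, seed=42)
print(chosen)
```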

2. Systematic Sample

How to describe: Randomly select a starting position among the first \( k \) individuals on the list, then sample every \( k^{\text{th}} \) individual thereafter.

Example: You want to sample 20 households from a list of 200 households.

- Divide 200 by 20 to get \( k = 10 \).

- Use a random number generator to pick a number between 1 and 10 (say it’s 7).

- Select every 10th household starting with the 7th (e.g., 7, 17, 27, 37, … until you have 20 households).

Important Tip: Make sure there is no periodic pattern in the list that could bias the result.
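
A minimal Python sketch of the systematic procedure follows. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size (as with 200 and 20), so that \( k \) is a whole number.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start in 1..k, then take every k-th position."""
    rng = random.Random(seed)
    k = population_size // sample_size   # sampling interval (200 // 20 = 10)
    start = rng.randint(1, k)            # random start among the first k
    return [start + i * k for i in range(sample_size)]

households = systematic_sample(200, 20, seed=7)
print(households)
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.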

3. Stratified Random Sample

How to describe: Divide the population into non-overlapping subgroups (strata) based on a characteristic. Then take a separate SRS from each stratum.

Example: A researcher wants to sample students from a school, ensuring equal representation from each grade level (9th–12th).

- Divide the population into four strata based on grade: 9th, 10th, 11th, and 12th.

- From each grade, use a random number generator to select 25 students.

- Combine the selected students from all four grades into one sample of 100 students.

Why it works: It reduces variability and ensures representation from all important subgroups.
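
The stratified procedure can be sketched in Python as below. The roster is invented for illustration (125 made-up student IDs per grade), and `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate SRS of size per_stratum from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical roster: 125 student IDs in each of grades 9-12.
strata = {
    grade: [f"G{grade}-{i:03d}" for i in range(1, 126)]
    for grade in (9, 10, 11, 12)
}

sample = stratified_sample(strata, 25, seed=3)
print(len(sample))  # 4 grades x 25 students each
```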

4. Cluster Sample

How to describe: Divide the population into clusters (usually based on geography or groups that already exist), randomly select a few clusters, and sample all individuals within the chosen clusters.

Example: A school has 100 homerooms, each with about 25 students. You want to sample around 100 students.

- List all 100 homerooms as clusters.

- Use a random number generator to select 4 homerooms.

- Survey every student in those 4 selected homerooms.

Key Difference from Stratified: In stratified, you randomly sample within every group. In cluster, you randomly select entire groups.
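
For comparison, here is a minimal Python sketch of the cluster procedure. The homeroom rosters are made up (exactly 25 invented IDs per room), and `cluster_sample` is a hypothetical helper name. Notice that the random selection happens at the level of homerooms, not students.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for c in chosen for person in clusters[c]]
    return chosen, members

# Hypothetical school: 100 homerooms with 25 students each.
homerooms = {
    room: [f"HR{room:03d}-S{i:02d}" for i in range(1, 26)]
    for room in range(1, 101)
}

chosen_rooms, students = cluster_sample(homerooms, 4, seed=5)
print(chosen_rooms, len(students))
```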

Sources of Bias in Sampling

Bias occurs when a sampling method systematically favors certain outcomes. Common sources of bias include:

Undercoverage: Some groups are left out of the sampling frame.

Nonresponse: Selected individuals cannot be contacted or refuse to participate.

Response bias: Individuals give inaccurate responses due to wording, interviewer influence, or social desirability.

Voluntary response bias: People with strong opinions are more likely to respond (e.g., online polls).

Convenience sampling: Choosing individuals who are easiest to reach.

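Tip: Random sampling avoids the bias of voluntary response and convenience samples, but it does not prevent undercoverage, nonresponse, or response bias. A biased sampling method leads to misleading conclusions no matter how large the sample is.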
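
A small simulation can make voluntary response bias concrete. Everything in this Python sketch is invented for illustration: the population of ratings, and especially the response rates (unhappy people are assumed to respond 80% of the time, happy people 5% of the time).

```python
import random

rng = random.Random(2024)

# Made-up population: 10,000 satisfaction ratings (1 = unhappy, 4 = happy).
population = [1] * 2000 + [4] * 8000
true_mean = sum(population) / len(population)

# Voluntary response: each person decides whether to respond.
# The response rates below are assumptions chosen for the example.
respond_prob = {1: 0.80, 4: 0.05}
volunteers = [x for x in population if rng.random() < respond_prob[x]]
voluntary_mean = sum(volunteers) / len(volunteers)

# Simple random sample of 200 from the same population.
srs = rng.sample(population, 200)
srs_mean = sum(srs) / len(srs)

print(f"true mean:            {true_mean:.2f}")
print(f"voluntary response:   {voluntary_mean:.2f}")
print(f"simple random sample: {srs_mean:.2f}")
```

Under these assumptions the voluntary response mean lands far below the true mean even though it uses many more responses than the SRS, which is the sense in which the method systematically favors certain outcomes.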

Types of Studies

Observational Study: Observes individuals and measures variables of interest but does not attempt to influence responses.

Experiment: Deliberately imposes a treatment to measure a response.

Key Difference: Only a well-designed experiment with random assignment can establish causation; an observational study can show an association but not cause and effect.

Designing an Experiment

Experiments are used to determine cause-and-effect relationships. Key principles of experimental design:

Comparison: Compare two or more treatments.

Random Assignment: Use chance to assign subjects to treatments to create roughly equivalent groups.

Control: Keep other variables constant to isolate treatment effects.

Replication: Use enough subjects in each treatment group so that chance variation is less likely to hide or imitate a treatment effect.

Types of experimental designs:

Completely randomized design: All subjects are randomly assigned to treatment groups.

Randomized block design: Subjects are first grouped into blocks (e.g., by age or gender), then randomly assigned to treatments within each block.

Matched pairs design: Each subject receives both treatments (in random order), or similar subjects are paired and one member of each pair is randomly assigned to each treatment.

Tip: Random assignment allows for causal conclusions. Random sampling allows for generalizing to a population.
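
Random assignment for the first two designs can be sketched in Python. This is a minimal illustration with hypothetical names (`completely_randomized`, `randomized_block`), 40 invented subject IDs, and an invented age-based blocking variable.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    n = len(treatments)
    return {t: shuffled[i::n] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    groups = {t: [] for t in treatments}
    n = len(treatments)
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, t in enumerate(treatments):
            groups[t].extend(shuffled[i::n])
    return groups

# Hypothetical subjects S01..S40.
subjects = [f"S{i:02d}" for i in range(1, 41)]
crd = completely_randomized(subjects, ["drug", "placebo"], seed=11)

# Hypothetical blocks: the first 20 subjects are under 40, the rest are 40 or over.
blocks = {"under 40": subjects[:20], "40 and over": subjects[20:]}
rbd = randomized_block(blocks, ["drug", "placebo"], seed=11)

print(len(crd["drug"]), len(crd["placebo"]))
print(len(rbd["drug"]), len(rbd["placebo"]))
```

In the block design each treatment group is guaranteed 10 subjects from each age block, whereas the completely randomized design leaves that balance to chance.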

Blinding and Placebos

Blinding reduces bias due to expectations. Types include:

Single-Blind: Either the subject or the experimenter does not know which treatment is being given.

Double-Blind: Neither the subject nor the experimenter knows the treatment assignment.

Placebo: A fake treatment that mimics the real one; giving it to the control group lets researchers separate the real treatment effect from the placebo effect (a response due to belief, not the treatment).

Purpose: Prevent conscious or unconscious influence on the subject’s behavior or the evaluator’s measurement.

Interpreting the Results of an Experiment

After conducting an experiment, analyze whether differences in outcomes are statistically significant and determine if causality can be established.

Statistical significance: An observed effect so large that it would rarely occur by chance alone if the treatment had no real effect.

Causation: Can only be inferred when random assignment is used.

Confounding variables: Variables that are related to the explanatory variable and also influence the response, making it unclear which variable caused the effect.

Example: In a drug trial, subjects are randomly assigned to receive either Drug A or a placebo. If the treatment group significantly improves compared to the control group, and random assignment was used, then it’s valid to infer causation.
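
One way to judge whether a difference "would rarely occur by chance" is to simulate the random assignment many times. The Python sketch below is a simplified illustration: the improvement scores are made up, and `randomization_p_value` is a hypothetical helper. It reshuffles the pooled scores into two groups over and over and counts how often the shuffled difference in means is at least as large as the observed one.

```python
import random

def randomization_p_value(treatment, control, reps=5000, seed=None):
    """Estimate how often chance alone gives a difference >= the observed one."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-do the random assignment
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps

# Made-up improvement scores, for illustration only.
drug = [12, 9, 14, 11, 10, 13, 15, 8]
placebo = [7, 6, 9, 5, 8, 10, 4, 7]

observed, p = randomization_p_value(drug, placebo, reps=5000, seed=0)
print(f"observed difference in means: {observed}")
print(f"estimated p-value: {p}")
```

A small estimated p-value means that a difference this large rarely arises from the shuffling alone, which is the evidence used to call the result statistically significant.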

Summary: Key Differences

Random Sampling: Allows generalizing to the population but not causation.

Random Assignment: Allows conclusions about causation, but not generalization to the population unless the subjects were also randomly sampled.

Pro Tip: Use this table to remember the difference:

Study Design	Can Generalize to the Population?	Can Establish Causation?
Observational Study with Random Sample	Yes	No
Experiment with Random Assignment (No Random Sample)	No	Yes
Experiment with Random Assignment and Random Sample	Yes	Yes