Lesson 11a: In-Class Activity: Hypothesis Testing Simulation

Activity Overview

In this activity, you will use simulation to understand hypothesis testing and p-values.

Scenario:
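A researcher wants to know whether a new intervention changes the mean outcome compared with a control group. The null hypothesis is that the two group means are equal; the alternative is that they differ.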

Step 1: Set Up the Data

library(tidyverse)

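set.seed(123)  # fix the random seed so the results are reproducible

# 50 participants per group
group <- rep(c("Control", "Intervention"), each = 50)
# Both groups are drawn from the same Normal(50, 10) distribution,
# so the null hypothesis is true by construction
outcome <- rnorm(100, mean = 50, sd = 10)

df <- tibble(group, outcome)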

df

# A tibble: 100 × 2
   group   outcome
   <chr>     <dbl>
 1 Control    44.4
 2 Control    47.7
 3 Control    65.6
 4 Control    50.7
 5 Control    51.3
 6 Control    67.2
 7 Control    54.6
 8 Control    37.3
 9 Control    43.1
10 Control    45.5
# ℹ 90 more rows

Step 2: Calculate Observed Difference

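# Difference in group means: Intervention minus Control,
# because group_by() orders the groups alphabetically
obs_diff <- df |>
  group_by(group) |>
  summarise(mean_val = mean(outcome)) |>
  summarise(diff = diff(mean_val)) |>
  pull(diff)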

obs_diff

[1] 1.120047
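
As a cross-check on the direction of the subtraction, the same difference can be computed in base R by naming the groups explicitly. This is a minimal sketch that rebuilds the Step 1 data so it runs on its own; the name obs_diff_check is introduced here for illustration and is not part of the activity code.

```r
# Rebuild the Step 1 data so this block is self-contained.
set.seed(123)
group <- rep(c("Control", "Intervention"), each = 50)
outcome <- rnorm(100, mean = 50, sd = 10)

# Subtract the group means by name: Intervention minus Control.
obs_diff_check <- mean(outcome[group == "Intervention"]) -
  mean(outcome[group == "Control"])

obs_diff_check
```

A positive value means the Intervention group had the higher sample mean; it should agree with obs_diff above.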

Step 3: Simulate Under the Null

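set.seed(123)

# Shuffle the group labels 1,000 times. Each shuffle breaks any link
# between group and outcome, which is what the null hypothesis assumes.
null_dist <- replicate(1000, {
  shuffled <- df |>
    mutate(group = sample(group))

  # Recompute the difference in means for the shuffled labels
  shuffled |>
    group_by(group) |>
    summarise(mean_val = mean(outcome)) |>
    summarise(diff = diff(mean_val)) |>
    pull(diff)
})

# Store the 1,000 simulated differences in a tibble for plotting
null_df <- tibble(diff = null_dist)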
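
The shuffling step can also be written as a small helper function, which keeps the replicate() call short. The sketch below uses base R only and assumes the same group labels as Step 1; perm_diff and null_sketch are hypothetical names, not part of the activity code.

```r
# Rebuild the Step 1 data so this block is self-contained.
set.seed(123)
group <- rep(c("Control", "Intervention"), each = 50)
outcome <- rnorm(100, mean = 50, sd = 10)

# One permutation: shuffle the labels, then recompute
# the difference in means (Intervention minus Control).
perm_diff <- function(outcome, group) {
  shuffled <- sample(group)
  mean(outcome[shuffled == "Intervention"]) -
    mean(outcome[shuffled == "Control"])
}

set.seed(123)
null_sketch <- replicate(1000, perm_diff(outcome, group))

# The shuffled differences should be centered near zero.
summary(null_sketch)
```

Because every shuffle keeps 50 labels in each group, the only thing that changes from one replicate to the next is which outcomes fall under which label.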

Step 4: Visualize

# Histogram of the shuffled differences; the dashed line marks the observed difference
ggplot(null_df, aes(x = diff)) +
  geom_histogram(bins = 30) +
  geom_vline(xintercept = obs_diff, linetype = "dashed") +
  labs(
    title = "Null Distribution",
    x = "Difference in Means",
    y = "Frequency"
  )

Step 5: Compute p-value

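# Two-sided p-value: proportion of shuffled differences at least as far from zero as the observed one
p_value <- mean(abs(null_df$diff) >= abs(obs_diff))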
p_value

[1] 0.56
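
The expression in Step 5 works because the mean of a logical vector is the proportion of TRUE values. The toy example below uses six made-up null differences so the count can be checked by hand; toy_null and toy_obs are illustrative values, not results from the simulation.

```r
# Six made-up "null" differences and a made-up observed difference.
toy_null <- c(-3, -1, 0, 1, 2, 4)
toy_obs <- 2

# TRUE wherever a null difference is at least as far from zero as toy_obs.
is_extreme <- abs(toy_null) >= abs(toy_obs)
is_extreme
# → TRUE FALSE FALSE FALSE TRUE TRUE

# Proportion of TRUE values: 3 of the 6 differences are as extreme or more extreme.
toy_p <- mean(is_extreme)
toy_p
# → 0.5
```

Taking absolute values on both sides is what makes the test two-sided: large negative and large positive differences both count as extreme.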

Step 6: Interpretation Questions

Is the observed difference extreme compared to the null distribution?
Is the p-value small or large?
What does this say about the strength of evidence against the null hypothesis?
Would you reject the null hypothesis? Why or why not?

Key Takeaway

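A p-value tells us how surprising our data are under the null hypothesis: it is the probability, if the null hypothesis were true, of seeing a difference at least as extreme as the one observed.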

Small p-value → strong evidence against the null
Large p-value → weak evidence against the null (which is not the same as evidence that the null is true)
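
To see the contrast between small and large p-values, the sketch below repeats the permutation test on data where the intervention does have an effect. The 5-point shift is an assumption chosen purely for illustration, and the names outcome_shifted, mean_diff, obs_shifted, null_shifted, and p_shifted are hypothetical.

```r
# Rebuild the Step 1 data so this block is self-contained.
set.seed(123)
group <- rep(c("Control", "Intervention"), each = 50)
outcome <- rnorm(100, mean = 50, sd = 10)

# Assumed effect: add 5 points to every Intervention outcome.
outcome_shifted <- outcome + ifelse(group == "Intervention", 5, 0)

# Difference in means (Intervention minus Control) for a given labeling.
mean_diff <- function(y, g) {
  mean(y[g == "Intervention"]) - mean(y[g == "Control"])
}

obs_shifted <- mean_diff(outcome_shifted, group)

# Null distribution from 1,000 label shuffles.
null_shifted <- replicate(1000, mean_diff(outcome_shifted, sample(group)))

# Two-sided p-value, computed the same way as in Step 5.
p_shifted <- mean(abs(null_shifted) >= abs(obs_shifted))
p_shifted
```

With a true shift built into the data, far fewer shuffles should produce a difference as large as the observed one, so the p-value should come out much smaller than in Step 5.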