Lesson 10: Confidence Intervals & Uncertainty

Interval Logic, Interpretation, and Communication

Overview

Confidence intervals are one of the most important tools in statistical inference. They allow us to move beyond a single estimate and instead communicate a range of plausible values for a population parameter.

In this lesson, we focus less on calculation and more on interpretation, logic, and communication, which are essential skills in public health and data science.

Learning Objectives

By the end of this lesson, students will be able to:

Explain the logic behind confidence intervals
Interpret confidence intervals in plain language
Distinguish correct vs incorrect interpretations
Communicate uncertainty effectively in public health contexts
Connect confidence intervals to hypothesis testing concepts

Assigned Readings

OpenIntro Biostatistics, Chapter 4
Statistical Inference via Data Science, Chapter 9: Hypothesis Testing

What Is a Confidence Interval?

A confidence interval (CI) provides a range of plausible values for a population parameter.

Instead of reporting:

“The mean BMI is 27.3”

we report:

“The mean BMI is 27.3 (95% CI: 26.5 to 28.1)”

Interval Logic

Confidence intervals are built on three key components:

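Estimate (the sample statistic, such as a mean or proportion)
Standard Error (the sampling variability of that estimate)
Critical Value (set by the confidence level, e.g., about 1.96 for 95%)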

Mathematically:

\[ \text{Estimate} \pm \text{Critical Value} \times SE \]
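
As a rough worked example, the formula can be applied to the BMI figures quoted earlier. The standard error of 0.41 is an assumed value, chosen only so the result lines up with the reported interval; the lesson does not state it. The variable names are likewise illustrative.

```r
# Worked example of: estimate ± critical value × SE
estimate <- 27.3        # sample mean BMI (from the example above)
se <- 0.41              # ASSUMED standard error, for illustration only
conf_level <- 0.95

# Critical value from the standard normal distribution (about 1.96 for 95%)
crit <- qnorm(1 - (1 - conf_level) / 2)

margin_of_error <- crit * se
ci <- c(lower = estimate - margin_of_error,
        upper = estimate + margin_of_error)

round(ci, 1)
```

Under that assumed standard error, the bounds round to 26.5 and 28.1, matching the interval reported above.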

Why Intervals Matter

A single number does not show uncertainty.

Confidence intervals:

show precision of estimates
reflect sampling variability
provide context for decision-making

Visualizing Uncertainty

library(tidyverse)

set.seed(123)

# Simulate many confidence intervals
true_mean <- 50
population <- rnorm(10000, mean = true_mean, sd = 10)

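ci_sim <- replicate(100, {
  samp <- sample(population, 50)   # draw a sample of n = 50
  m <- mean(samp)                  # point estimate
  se <- sd(samp)/sqrt(50)          # estimated standard error
  lower <- m - 1.96 * se           # 1.96 is the normal critical value for 95%
  upper <- m + 1.96 * se
  c(lower, upper)
})

# One row per simulated interval, flagged by whether it covers the true mean
ci_df <- as.data.frame(t(ci_sim))
names(ci_df) <- c("lower", "upper")
ci_df$id <- seq_len(nrow(ci_df))
ci_df$contains_true <- ci_df$lower <= true_mean & ci_df$upper >= true_mean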

head(ci_df)

     lower    upper id contains_true
1 47.29454 53.13681  1          TRUE
2 46.60280 52.71860  2          TRUE
3 47.78533 52.12986  3          TRUE
4 45.76855 51.46971  4          TRUE
5 49.58307 55.63141  5          TRUE
6 46.82376 52.55915  6          TRUE

Plotting Confidence Intervals

ggplot(ci_df, aes(x = id, ymin = lower, ymax = upper, color = contains_true)) +
  geom_linerange() +
  geom_hline(yintercept = true_mean, linetype = "dashed") +
  labs(
    title = "Simulated 95% Confidence Intervals",
    x = "Sample",
    y = "Confidence Interval"
  ) +
  theme_minimal()

Key Insight

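Most intervals contain the true mean
Some do not (about 5% for a 95% CI)

This is the meaning of “95% confidence”: the 95% describes how often the interval-building procedure succeeds over many samples, not the chance that any one interval is right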

Correct Interpretation

A correct interpretation is:

We are 95% confident that the true population parameter lies within this interval.

What “95% Confidence” Really Means

If we repeated the sampling process many times:

95% of constructed intervals would contain the true parameter
5% would miss it
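
The long-run claim above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the lesson's own code: it draws samples directly from a normal distribution rather than from a finite population, uses a t critical value instead of 1.96, and the seed, the number of replicates, and all variable names are arbitrary choices.

```r
set.seed(2024)  # arbitrary seed so the run is reproducible

true_mean <- 50
n <- 50
n_reps <- 2000

# For each replicate: draw a sample, build a 95% t-interval,
# and record whether the interval covers the true mean
covers <- replicate(n_reps, {
  samp <- rnorm(n, mean = true_mean, sd = 10)
  se <- sd(samp) / sqrt(n)
  crit <- qt(0.975, df = n - 1)
  lower <- mean(samp) - crit * se
  upper <- mean(samp) + crit * se
  lower <= true_mean && true_mean <= upper
})

# Proportion of intervals that captured the true mean
coverage <- mean(covers)
coverage
```

The resulting proportion should land close to 0.95, and it drifts closer as `n_reps` grows.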

Common Incorrect Interpretations

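❌ “There is a 95% probability the true value is in this interval” (a computed interval either contains the parameter or it does not; the 95% describes the method)
❌ “95% of individuals fall within this range” (a CI describes uncertainty about a parameter, not the spread of individual values)
❌ “The parameter changes from sample to sample” (the parameter is fixed; the interval is what changes)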

Interpreting Instead of Calculating

In practice, you will often be given confidence intervals rather than calculating them.

Example:

Odds Ratio = 1.5 (95% CI: 1.2 to 1.9)

Interpretation:

The exposure is associated with higher odds of the outcome
The interval does not include 1 → statistically significant at the 5% level
The plausible effect size ranges from 1.2 to 1.9
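
When only the estimate and interval are reported, an approximate standard error and p-value can be recovered from them. The sketch below assumes the interval is a Wald interval, symmetric on the log-odds scale; the example does not say how the interval was computed, so treat the output as a rough check rather than the analysis behind it. The variable names are illustrative.

```r
# Reported values from the example above
or_est <- 1.5
or_lower <- 1.2
or_upper <- 1.9

z_crit <- qnorm(0.975)  # critical value for a 95% interval

# ASSUMPTION: Wald interval, symmetric on the log scale, so the
# interval width on that scale equals 2 × z_crit × SE
se_log_or <- (log(or_upper) - log(or_lower)) / (2 * z_crit)

# Test statistic and two-sided p-value against the null value OR = 1 (log OR = 0)
z_stat <- log(or_est) / se_log_or
p_value <- 2 * pnorm(-abs(z_stat))

c(se_log_or = se_log_or, z_stat = z_stat, p_value = p_value)
```

The p-value falls below 0.05, which agrees with the interval excluding 1.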

Width of Confidence Intervals

The width of a confidence interval depends on:

Sample size (larger → narrower)
Variability (higher → wider)
Confidence level (higher → wider)
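
Each of the three factors can be varied on its own using the width formula for a normal-based interval for a mean, 2 × critical value × sd / √n. `ci_width()` is a hypothetical helper written for this illustration, and the standard deviations and sample sizes are arbitrary.

```r
# Width of a normal-based interval for a mean: 2 × critical value × sd / sqrt(n)
# ci_width() is a hypothetical helper, not part of any package
ci_width <- function(sd, n, conf_level = 0.95) {
  crit <- qnorm(1 - (1 - conf_level) / 2)
  2 * crit * sd / sqrt(n)
}

# Vary one factor at a time, holding the others fixed
by_n <- sapply(c(50, 200, 500), function(n) ci_width(sd = 10, n = n))
by_sd <- sapply(c(5, 10, 20), function(s) ci_width(sd = s, n = 100))
by_level <- sapply(c(0.90, 0.95, 0.99),
                   function(cl) ci_width(sd = 10, n = 100, conf_level = cl))

round(by_n, 2)      # narrower as n grows
round(by_sd, 2)     # wider as variability grows
round(by_level, 2)  # wider as the confidence level grows
```

Because the width scales with 1/√n, quadrupling the sample size halves the width.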

Demonstration: Sample Size Effect

set.seed(456)

get_ci_width <- function(n) {
  samp <- sample(population, n)
  se <- sd(samp)/sqrt(n)
  width <- (1.96 * se) * 2
  width
}

tibble(
  sample_size = c(50, 200, 500),
  ci_width = c(get_ci_width(50), get_ci_width(200), get_ci_width(500))
)

# A tibble: 3 × 2
  sample_size ci_width
        <dbl>    <dbl>
1          50     5.99
2         200     2.81
3         500     1.83

Communicating Uncertainty

In public health, it is essential to communicate uncertainty clearly.

Instead of saying:

“The intervention works”

Say:

“The intervention is associated with improved outcomes (95% CI: X to Y), suggesting a likely positive effect, though uncertainty remains.”

Public Health Example

Suppose:

Screening rate = 72%
95% CI = 68% to 76%

Interpretation:

We are 95% confident that the true screening rate is between 68% and 76%
The estimate is reasonably precise (margin of error of about ±4 percentage points)
Policy decisions should consider this range
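
To show where an interval like this could come from, the sketch below computes a Wald interval for a proportion. The counts (360 screened out of 500) are hypothetical, chosen only because they give a 72% rate and reproduce the reported bounds; the example does not state the actual sample size or method.

```r
# HYPOTHETICAL counts chosen to give a 72% screening rate
n_screened <- 360
n_total <- 500

p_hat <- n_screened / n_total
se <- sqrt(p_hat * (1 - p_hat) / n_total)  # Wald standard error of a proportion
crit <- qnorm(0.975)                       # critical value for 95% confidence

lower <- p_hat - crit * se
upper <- p_hat + crit * se

# Format the result the way it would be reported
msg <- sprintf("Screening rate = %.0f%% (95%% CI: %.0f%% to %.0f%%)",
               100 * p_hat, 100 * lower, 100 * upper)
msg
```

With these assumed counts the message reads "Screening rate = 72% (95% CI: 68% to 76%)". A smaller sample with the same rate would give a wider interval.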

Confidence Intervals & Hypothesis Testing

Confidence intervals are closely related to hypothesis testing.

Key connection:

If the CI excludes the null value → statistically significant at the matching level (α = 0.05 for a 95% CI)
If the CI includes the null value → not statistically significant at that level

Examples:

Measure	Null Value
Mean Difference	0
Risk Difference	0
Odds Ratio	1
Risk Ratio	1

Example Interpretation

If:

Risk Ratio = 1.3 (95% CI: 0.9 to 1.8)

Then:

CI includes 1 → not statistically significant at the 5% level
Evidence is inconclusive: the data are compatible with anything from a 10% lower risk to an 80% higher risk
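
The decision rule can be sketched as a small check applied to the two ratio examples from this lesson. `excludes_null()` is a hypothetical helper written for this illustration, not a function from any package.

```r
# excludes_null() is a hypothetical helper for this illustration:
# TRUE when the null value lies outside the interval
excludes_null <- function(lower, upper, null_value) {
  null_value < lower | null_value > upper
}

# The odds ratio and risk ratio examples from this lesson
results <- data.frame(
  measure = c("Odds Ratio", "Risk Ratio"),
  lower = c(1.2, 0.9),
  upper = c(1.9, 1.8),
  null_value = c(1, 1)
)

results$significant <- excludes_null(results$lower, results$upper,
                                     results$null_value)
results
```

The odds ratio interval excludes 1 and is flagged as significant; the risk ratio interval includes 1 and is not.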

Key Terms

Term	Definition
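Confidence Interval	Range of plausible values for a population parameter
Margin of Error	Distance from the estimate to either bound (half the interval width)
Precision	Narrowness of the interval (narrower means more precise)
Statistical Significance	CI excludes the null value at the matching level (95% CI corresponds to α = 0.05)
Null Value	Value representing no effect (0 for differences, 1 for ratios)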

Practice Activity

Given a confidence interval, interpret it in plain language
Determine whether it is statistically significant
Identify whether the interval is wide or narrow
Explain what that implies about precision

Reflection Questions

Why are confidence intervals more informative than single estimates?
What does it mean when a confidence interval is wide?
How does sample size affect confidence intervals?
Why is correct interpretation important in public health?

Common Mistakes to Avoid

Treating the confidence level as the probability that the parameter lies in one specific interval
Ignoring width of interval
Overstating certainty
Misinterpreting statistical significance

Conclusion

Confidence intervals provide a powerful way to communicate uncertainty.

They allow us to:

quantify variability
interpret results meaningfully
connect estimation to decision-making

Understanding how to interpret and communicate confidence intervals is essential for evidence-based public health practice.

Looking Ahead

Next lesson:

Hypothesis testing in depth
p-values
Type I and Type II errors
Decision-making frameworks