Project 3: Modeling Health Outcomes Using Regression & Simulation
NHANES
Project Overview
In this project, you will move from descriptive analysis into statistical modeling and inference using the NHANES dataset. You will select a health outcome and model its association with key predictors using regression tools from ModernDive and interpretation frameworks from OpenIntro Biostatistics.
Your goal is to answer a meaningful public-health question such as:
- How is hypertension risk patterned by age, BMI, smoking, and income?
- Do depression symptoms vary by socioeconomic status?
- How does BMI differ across demographic groups?
- Which factors predict fair/poor self-rated health?
Your focus is not only on building models, but also on explaining what they mean in accessible and responsible language.
This project prepares you for your final manuscript, where modeling results will form the core of the Results section.
Dataset
You will continue working with the NHANES dataset from the {NHANES} R package. You may build on your Project 2 analytic dataset or refine your variables further.
Learning Objectives
By completing this project, you will be able to:
- fit regression models in R
- interpret model coefficients in context
- evaluate model fit and limitations
- perform simulation-based inference
- communicate uncertainty responsibly
- present results in a manuscript-ready format
Project Tasks
Your Quarto report must include the sections below.
1. Research Question
Write 1 to 2 paragraphs that clearly state:
- your outcome variable
- your primary predictors of interest
- your public-health motivation
- your hypothesis, if applicable
Example
We examine whether hypertension prevalence differs by age, BMI, and income. We hypothesize that hypertension risk increases with age and BMI, and that risk may differ by socioeconomic status.
2. Data and Methods
Briefly summarize:
- your analytic sample
- inclusion and exclusion criteria
- variables and coding decisions
- missing-data handling
- analysis approach
Fit at least one of the following, depending on your outcome:
- a linear regression model, for a continuous outcome
- a logistic regression model, for a binary outcome
Optional additions are encouraged:
- multiple regression
- an interaction term
- model comparison
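As a rough illustration of the required and optional models above, the sketch below fits a logistic regression, a linear regression, and an interaction model in base R. The data are simulated and every variable name (`dat`, `age`, `bmi`, `hypertension`) is a hypothetical stand-in for your own NHANES analytic dataset. The coefficients used to simulate the outcome are arbitrary and are not NHANES estimates.

```r
# Simulated stand-in for an NHANES analytic dataset (all names hypothetical).
set.seed(2024)
n <- 500
dat <- data.frame(
  age = round(runif(n, 20, 80)),
  bmi = round(rnorm(n, mean = 28, sd = 5), 1)
)
# Simulate a binary outcome whose log-odds rise with age and BMI.
# These coefficients are arbitrary choices, not real estimates.
log_odds <- -7 + 0.06 * dat$age + 0.08 * dat$bmi
dat$hypertension <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression for a binary outcome.
fit_logit <- glm(hypertension ~ age + bmi, data = dat, family = binomial)

# Odds ratios with 95% Wald confidence intervals.
or_table <- exp(cbind(OR = coef(fit_logit), confint.default(fit_logit)))
print(round(or_table, 3))

# Linear regression for a continuous outcome (here, BMI on age).
fit_lm <- lm(bmi ~ age, data = dat)
print(round(cbind(estimate = coef(fit_lm), confint(fit_lm)), 3))

# Optional: add an interaction term and compare the two logistic models.
fit_int <- glm(hypertension ~ age * bmi, data = dat, family = binomial)
aic_table <- AIC(fit_logit, fit_int)                       # lower AIC is preferred
lrt <- anova(fit_logit, fit_int, test = "Chisq")           # likelihood-ratio test
print(aic_table)
print(lrt)
```

With your own data, replace the simulation step with your analytic dataset and keep the modeling calls. An odds ratio above 1 for `age` would read as higher odds of the outcome per additional year of age, holding BMI constant.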
3. Model Results
Include:
- model summary tables generated in R
- key coefficients interpreted in plain language
- confidence intervals or bootstrap-based intervals
- at least one visualization of modeled relationships
Examples of acceptable visualizations
- predicted probabilities across age
- a regression line with a confidence band
- grouped means with confidence interval bars
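One of the visualizations listed above, predicted probabilities across age, might be drawn along these lines. This is a minimal sketch using base R graphics and simulated data; the names `dat`, `age`, and `hypertension` are hypothetical, and a ggplot2 version would work equally well. The interval is built on the log-odds scale and then back-transformed, so the band stays between 0 and 1.

```r
# Simulated stand-in data (names and coefficients are hypothetical).
set.seed(2024)
n <- 500
dat <- data.frame(age = round(runif(n, 20, 80)))
dat$hypertension <- rbinom(n, size = 1, prob = plogis(-4 + 0.06 * dat$age))

fit <- glm(hypertension ~ age, data = dat, family = binomial)

# Predict on the link (log-odds) scale across a grid of ages.
grid <- data.frame(age = 20:80)
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build an approximate 95% interval on the link scale, then back-transform.
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

# Plot the predicted probability curve with a shaded confidence band.
plot(grid$age, grid$prob, type = "l", ylim = c(0, 1),
     xlab = "Age (years)",
     ylab = "Predicted probability of hypertension")
polygon(c(grid$age, rev(grid$age)), c(grid$lower, rev(grid$upper)),
        col = adjustcolor("steelblue", alpha.f = 0.3), border = NA)
lines(grid$age, grid$prob, lwd = 2)
```

For a model with several predictors, the same approach applies if the other predictors in `grid` are held at chosen reference values, which should be stated in the figure caption.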
4. Inference and Uncertainty
Discuss:
- what your results suggest
- what they do not prove
- uncertainty and sampling variability
- potential confounding
- causal limits
Write this section in the register of applied epidemiology: describe associations rather than causes, and state limitations plainly.
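Sampling variability, one of the discussion points above, can be made concrete with a bootstrap. The sketch below resamples rows with replacement, refits a linear model each time, and takes the 2.5th and 97.5th percentiles of the resampled slopes as a 95% percentile interval. It uses base R and simulated data; `dat`, `age`, and `sbp` are hypothetical names, and the choice of 1,000 resamples is an assumption rather than a course requirement. The infer workflow taught in ModernDive expresses the same idea.

```r
# Simulated stand-in data (names and coefficients are hypothetical).
set.seed(2024)
n <- 300
dat <- data.frame(age = round(runif(n, 20, 80)))
dat$sbp <- 100 + 0.5 * dat$age + rnorm(n, sd = 15)

# Slope estimated from the observed sample.
obs_slope <- coef(lm(sbp ~ age, data = dat))[["age"]]

# Bootstrap: resample rows with replacement and refit the model each time.
boot_slopes <- replicate(1000, {
  idx <- sample(nrow(dat), replace = TRUE)
  coef(lm(sbp ~ age, data = dat[idx, , drop = FALSE]))[["age"]]
})

# 95% percentile interval from the bootstrap distribution.
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
print(round(c(estimate = obs_slope, ci), 3))
```

The width of the interval shows how much the slope could plausibly vary from sample to sample. It says nothing about confounding or causation, which need to be addressed separately in the discussion.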
5. Public-Health Interpretation
Explain:
- equity implications
- population-health meaning
- which findings matter most
- reasonable next steps
Avoid overstating conclusions.
6. Reflection
Briefly discuss:
- what was challenging
- what you learned
- how modeling changed your understanding of the question
Formatting Requirements
Your submission must:
- be written fully in Quarto
- use inline R code to report computed values wherever possible, rather than typing numbers by hand
- knit successfully
- follow professional writing standards
- avoid jargon where possible
- be approximately 1,200 to 1,800 words, excluding code
Submission
Submit:
- the knitted report as a Word document
- the .qmd source file
- the 3MT slide
Ethics Note
NHANES includes variables tied to inequity and lived experience. Please write with sensitivity and respect.
Project 3 Rubric
1. Model Specification and Execution — 25 points
Assesses correctness and appropriateness of modeling.
- 22–25: Model(s) correctly specified; predictors appropriate; code accurate; workflow clear
- 17–21: Minor mistakes, but overall valid
- 12–16: Major misunderstandings or mis-specification
- 0–11: Modeling incomplete or incorrect
2. Statistical Interpretation — 20 points
Evaluates how well results are explained.
- 17–20: Coefficients interpreted accurately in context; uncertainty clearly described
- 13–16: Mostly correct with small errors
- 8–12: Limited or partially incorrect interpretation
- 0–7: Interpretation largely incorrect or missing
3. Presentation of Results — 15 points
Assesses tables, figures, and clarity.
- 13–15: Professional, clear tables and figures; helpful formatting; at least one modeled visualization
- 10–12: Mostly clear with minor weaknesses
- 6–9: Limited clarity or missing elements
- 0–5: Poor or missing presentation
4. Public-Health Insight and Critical Thinking — 15 points
Evaluates meaning-making.
- 13–15: Findings thoughtfully connected to health outcomes, inequity, and context
- 10–12: Some meaningful interpretation
- 6–9: Minimal depth
- 0–5: No meaningful interpretation
5. Transparency and Documentation — 10 points
Focuses on a reproducible workflow.
- 9–10: Coding choices clearly explained; methods understandable; rationale provided
- 7–8: Mostly clear
- 4–6: Some ambiguity
- 0–3: Opaque workflow
6. Writing Quality and Organization — 10 points
Evaluates readability and structure.
- 9–10: Clear, polished, professional writing
- 7–8: Mostly clear with minor issues
- 4–6: Some disorganization or unclear writing
- 0–3: Difficult to follow
7. Quarto and Reproducibility — 5 points
Rewards strong scientific reporting.
- 5: Fully reproducible, cleanly structured, knits without error
- 3–4: Minor issues
- 1–2: Reproducibility inconsistent
- 0: Not reproducible