Birth Data Analysis

Building an R × tidyverse × ggplot2 pipeline to model U.S. births (2016 – 2021)

Data Analysis
Data Modeling
Authors

Raameen Ahmed, Muskan Ahmad, Malik Mustafa

Published

April 10, 2025

How maternal education shapes fertility patterns and infant health across the United States (2016 – 2021).

We stitched together six years of provisional NCHS records (≈ 20 M births), cleaned 1.2 GB of raw CSVs into a single tidy table, and ran it through a fully reproducible R × tidyverse workflow. The result is an interactive report that quantifies regional disparities, isolates key predictors, and pinpoints policy levers for healthier outcomes.


♡ Research questions

  1. Education → birth outcomes
    How does maternal schooling predict birth weight and maternal age across states and years?

  2. Education → family size
    Are mothers with less formal education more likely to have multiple children than highly educated mothers?


♡ Functions & methods

Stage              | Key tools & notes
Cleaning / prep    | readr::read_csv() · dplyr::mutate() / as_tibble() · na.omit()
Descriptive stats  | group_by() + summarise() for year-by-state aggregates
Inferential stats  | One-way ANOVA (aov) + linear / interaction models (lm)
Classification     | Logistic glm predicting High_EDU (Bachelor’s +) from age & birth-weight
Visualisation      | ggplot2 bar / violin / scatter + facet_wrap() and geom_text_repel()
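
As a rough illustration of how the stages above fit together, here is a minimal sketch on simulated data. It is not the project's actual code: the column names (state, year, edu_level, mom_age, weight_g), the education categories, and the simulated values are all assumptions made for the example. Only High_EDU and the listed functions come from the table.

```r
library(dplyr)

set.seed(42)

# Simulated stand-in for the cleaned births table.
# Every column name and value here is an assumption for illustration.
n <- 600
edu_levels <- c("HS", "Some college", "Bachelor's", "Doctorate")
births <- tibble(
  state     = sample(c("MA", "CA", "MS"), n, replace = TRUE),
  year      = sample(2016:2021, n, replace = TRUE),
  edu_level = sample(edu_levels, n, replace = TRUE),
  mom_age   = round(rnorm(n, mean = 29, sd = 5)),
  weight_g  = rnorm(n, mean = 3300, sd = 450)
)
births$weight_g[sample(n, 20)] <- NA  # inject 20 missing values to exercise na.omit()

# Cleaning / prep: drop incomplete rows, then derive the binary High_EDU flag
# (Bachelor's or above) used as the classification target.
clean <- births %>%
  na.omit() %>%
  mutate(
    edu_level = factor(edu_level, levels = edu_levels),
    High_EDU  = as.integer(edu_level %in% c("Bachelor's", "Doctorate"))
  )

# Descriptive stats: one row per state and year.
by_state_year <- clean %>%
  group_by(state, year) %>%
  summarise(
    mean_weight = mean(weight_g),
    mean_age    = mean(mom_age),
    births      = n(),
    .groups     = "drop"
  )

# Inferential stats: one-way ANOVA of birth weight across education levels.
anova_fit <- aov(weight_g ~ edu_level, data = clean)

# Classification: logistic regression predicting High_EDU from age and birth weight.
logit_fit <- glm(High_EDU ~ mom_age + weight_g, data = clean, family = binomial)
```

Calling summary(anova_fit) or summary(logit_fit) prints the usual test tables. Because the data here are random noise, no meaningful effect should be expected from this sketch.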

♡ Code & data


♡ Headline findings

  • Education inversely tracks fertility.
    High-school-only and some-college mothers account for most births, while PhD / MD holders delay childbirth and have fewer children overall.

  • Higher education ⇒ healthier babies.
    Each step up in schooling is associated with ≈ 85 g higher mean birth weight (p < 0.001).

  • Age amplifies the effect.
    In the interaction model, each additional year of maternal age adds 41 g to birth weight for PhD mothers but only 1 g for high-school dropouts.

  • Regional split.
    New England & the West host the most college-educated, older mothers; the Deep South clusters at younger ages & lower schooling.
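
To show how per-group age slopes like those in the "Age amplifies the effect" finding are read off an interaction model, here is a small sketch on simulated data. The variable names, the two education labels, and the data-generating slopes of 1 g and 41 g per year are assumptions plugged in to mirror the quoted figures. Nothing here is estimated from the NCHS records, and the report's actual model specification may differ.

```r
set.seed(1)

# Simulated data only: the slopes below are chosen to mirror the quoted
# figures, not derived from the real birth records.
n <- 2000
edu <- factor(
  sample(c("Below HS", "Doctorate"), n, replace = TRUE),
  levels = c("Below HS", "Doctorate")  # "Below HS" is the reference level
)
mom_age    <- runif(n, min = 18, max = 45)
true_slope <- ifelse(edu == "Doctorate", 41, 1)
weight_g   <- 3000 + true_slope * (mom_age - 30) + rnorm(n, sd = 50)
toy <- data.frame(edu, mom_age, weight_g)

# mom_age * edu expands to mom_age + edu + mom_age:edu
fit <- lm(weight_g ~ mom_age * edu, data = toy)
b <- coef(fit)

# The age slope for the reference level is the main effect of mom_age;
# every other level adds its own interaction coefficient on top.
slope_below_hs  <- unname(b["mom_age"])
slope_doctorate <- unname(b["mom_age"] + b["mom_age:eduDoctorate"])

round(c(below_hs = slope_below_hs, doctorate = slope_doctorate), 1)
```

The design point is that the interaction coefficient is a difference in slopes, not a slope itself, so the per-group figure for any non-reference level is the sum of the main effect and that level's interaction term.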


♡ Policy take-aways

  • Education is a lever.
    Keeping women in school correlates with later, healthier births.

  • Target by geography.
    Southern states stand to gain most from prenatal-care subsidies and health-literacy programs.

  • Next steps.
    Add paternal education and local healthcare spending to explain the remaining variance.


♡ Full report (PDF)