Causal Inference

Propensity Score Analysis

Does antihypertensive treatment reduce cardiovascular mortality? A comparative effectiveness study using propensity score matching, IPTW, and Cox models on 20 years of NHANES data.

NHANES 1999-2018 + NCHS Linked Mortality Files

18,129 hypertensive adults
3,481 matched pairs
1,641 CV deaths observed
7.8 yr median follow-up
THE PROBLEM

Observational data lies without adjustment

In claims and survey data, patients who receive treatment differ systematically from those who don't: they tend to be older and sicker, with more comorbidities. A naive comparison of outcomes therefore conflates treatment assignment with disease severity. In this cohort, the unadjusted hazard ratio is 2.86, which taken at face value suggests that antihypertensives increase cardiovascular death. That's confounding by indication, not a treatment effect.

Why this matters for HEOR
Every observational CER submission to payers, HTA bodies, or the FDA must address this exact problem. Propensity scores are a standard toolkit for doing so. This project demonstrates the full workflow from scratch.
APPROACH

Three methods, one question

I built the entire propensity score pipeline from raw NHANES data to Cox models, implementing three complementary approaches to isolate the treatment effect.

PS Matching
1:1 nearest-neighbor
Caliper = 0.2 SD of logit PS. 3,481 matched pairs from 13,649 treated + 4,480 untreated. Creates a balanced pseudo-randomized sample.
IPTW
Inverse probability weighting
ATE weights (1/PS for treated, 1/(1-PS) for untreated). Trimmed at 99th percentile. Preserves the full sample, reweights to balance.
Python, lifelines, scikit-learn, Cox PH, Kaplan-Meier, Propensity Scores, IPTW, Schoenfeld Residuals
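The ATE weighting and trimming rule from the IPTW card can be sketched in plain Python. This is a minimal illustration rather than the project's pipeline: the function name `ate_weights`, the toy propensity scores, and the nearest-rank percentile rule are all assumptions made for the example.

```python
import math

def ate_weights(treated, ps, trim_pct=99.0):
    """ATE weights: 1/PS for treated, 1/(1-PS) for untreated, capped at a percentile."""
    raw = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]
    # Nearest-rank percentile (an assumption; other interpolation rules exist).
    ranked = sorted(raw)
    k = max(1, math.ceil(trim_pct / 100.0 * len(ranked)))
    cap = ranked[k - 1]
    # Trimming here means capping large weights at the percentile value.
    return [min(w, cap) for w in raw], cap

# Toy data (hypothetical): 1 = treated, 0 = untreated.
treated = [1, 1, 1, 0, 0, 0]
ps = [0.9, 0.8, 0.05, 0.6, 0.3, 0.2]
# A low percentile is used only because the toy sample is tiny.
weights, cap = ate_weights(treated, ps, trim_pct=80.0)
```

The treated patient with a propensity score of 0.05 would otherwise carry a weight of 20, which is the kind of extreme value that trimming is meant to contain.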
FINDING 1

Strong confounding signal confirmed

The propensity score model predicts treatment status with 81% accuracy, reflecting the systematic differences between treated and untreated groups. The two distributions overlap, but treated patients cluster at higher PS values (median 0.87 vs 0.59).

Figure: Propensity score distribution by treatment group
FINDING 2

Covariate balance achieved

The Love plot is the diagnostic that regulators and journal reviewers look for. Age, the strongest confounder (SMD 0.70 unadjusted), drops below 0.1 after both matching and IPTW. After adjustment, all covariates fall below the conventional 0.1 threshold.

Figure: Covariate balance Love plot. Standardized mean differences before and after adjustment
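A standardized mean difference is the difference in group means divided by a pooled standard deviation. The sketch below shows one way to compute it with and without weights. The function names, the toy ages, and the toy weights are hypothetical, and keeping the denominator fixed at the unweighted pooled SD is one common convention, assumed here rather than taken from the project.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def smd(x, treated, weights=None):
    """Standardized mean difference of covariate x between treated and untreated."""
    if weights is None:
        weights = [1.0] * len(x)
    xt = [v for v, t in zip(x, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    wc = [w for w, t in zip(weights, treated) if t == 0]
    # Denominator: pooled SD of the unweighted groups, held fixed so that
    # before/after values share the same scale (an assumed convention).
    pooled_sd = math.sqrt((variance(xt) + variance(xc)) / 2.0)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd

# Toy example (hypothetical ages and weights).
age = [70, 65, 60, 55, 50, 45]
treated = [1, 1, 1, 0, 0, 0]
smd_raw = smd(age, treated)
smd_weighted = smd(age, treated, weights=[0.5, 1.0, 3.0, 3.0, 1.0, 0.5])
```

Each point on a Love plot is one such value, computed once on the raw cohort and once on the matched or weighted cohort.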
FINDING 3

Survival curves converge after matching

The unadjusted Kaplan-Meier shows dramatic separation (treated patients dying faster because they are sicker). After PS matching, the curves converge substantially. The remaining gap is small and may reflect residual confounding by unmeasured variables like medication adherence and specific drug classes.

Figure: Kaplan-Meier survival curves for CV mortality: unadjusted (left) and PS-matched (right)
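For readers unfamiliar with the estimator behind these curves, here is a minimal Kaplan-Meier sketch in plain Python. It is not the lifelines implementation used in the project; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Toy follow-up in years (hypothetical).
durations = [2, 3, 3, 5, 8, 8]
events = [1, 0, 1, 1, 0, 1]
km = kaplan_meier(durations, events)
```

Running the same function separately on treated and untreated patients, before and after matching, gives the four curves that the figure compares.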
FINDING 4

HR 2.86 collapses to 1.02

Four Cox PH models show how the estimated treatment effect changes under increasing confounding control. The unadjusted HR of 2.86 is driven largely by confounding. After IPTW, the HR is 1.02 with a confidence interval (0.87-1.21) that spans 1.0.

Unadjusted: 2.86 (2.47-3.32)
PS-Matched: 1.24 (1.03-1.50)
IPTW: 1.02 (0.87-1.21)
Doubly Robust: 1.15 (0.98-1.36)
Figure: HR forest plot. Treatment effect on CV mortality across PS methods
The variation across methods is the finding
PS matching finds HR 1.24 (p=0.027). IPTW finds HR 1.02 (p=0.81). For a regulatory or HTA submission, presenting this range with transparent discussion is standard practice. No single observational method is definitive.
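As a consistency check, a hazard ratio and its confidence interval are enough to approximate the Wald p-value. The sketch below assumes the reported intervals are 95% intervals that are symmetric on the log scale; that assumption and the function name are illustrative and not taken from the project code.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate two-sided Wald p-value from a hazard ratio and its CI."""
    # On the log scale the interval is log(HR) +/- z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

p_matched = wald_p_from_ci(1.24, 1.03, 1.50)
p_iptw = wald_p_from_ci(1.02, 0.87, 1.21)
```

Both values land close to the reported p-values. Small differences are expected because the inputs are rounded to two decimals.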
FINDING 5

Treatment effect varies over time

The proportional hazards assumption is borderline violated for treatment (p=0.035). The HR is higher in early follow-up and attenuates over time, consistent with the clinical expectation: sicker patients start treatment, but the protective effect of BP control accrues gradually over years.

Figure: PH assumption test. Treatment HR over follow-up time (rolling window)
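One crude way to see a time-varying effect without fitting a Cox model is to compare incidence rates within follow-up windows. The sketch below computes a treated-to-untreated rate ratio per window. It is a simplified stand-in for the rolling-window HR and the Schoenfeld residual test, not the method used in the project, and the function name, cut point, and toy data are hypothetical.

```python
def rate_ratio_by_window(durations, events, treated, cuts):
    """Incidence-rate ratio (treated / untreated) within follow-up windows."""
    bounds = [0.0] + list(cuts) + [float("inf")]
    ratios = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        deaths = {0: 0, 1: 0}
        person_time = {0: 0.0, 1: 0.0}
        for d, e, t in zip(durations, events, treated):
            if d <= start:
                continue  # left the risk set before this window opened
            person_time[t] += min(d, end) - start
            if e == 1 and d <= end:
                deaths[t] += 1  # death occurred inside (start, end]
        if deaths[0] == 0 or person_time[1] == 0.0:
            ratios.append(None)  # ratio undefined in this window
            continue
        rate_treated = deaths[1] / person_time[1]
        rate_untreated = deaths[0] / person_time[0]
        ratios.append(rate_treated / rate_untreated)
    return ratios

# Toy data in years (hypothetical): first four treated, last four untreated.
durations = [1, 2, 6, 8, 3, 4, 7, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
ratios = rate_ratio_by_window(durations, events, treated, cuts=[5])
```

In this toy data the ratio is higher in the first five years than afterwards, which is the same qualitative pattern described above.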
FINDING 6

Consistent across subgroups

The subgroup forest plot tests whether the treatment effect varies by age, sex, diabetes, and smoking status. A consistent effect direction across subgroups supports the primary analysis, while any heterogeneity would inform targeted treatment strategies.

Figure: Subgroup forest plot. Subgroup analysis in PS-matched cohort
SO WHAT

What this means for HEOR

Finding | Implication
Unadjusted HR attenuates toward the null after PS adjustment | Observational analyses require rigorous confounding control for formulary and coverage decisions
HR varies 1.02-1.24 across methods | Sensitivity analysis across PS approaches is non-negotiable for HEOR submissions
PH assumption borderline violated | Time-varying treatment effects should be explored; standard Cox may mask delayed benefit
3,481 pairs from 18,129 | Matching discards data; IPTW preserves the full sample, at the cost of large weights where overlap is limited
Self-reported treatment | Claims-based identification (NDC codes, pharmacy fills) would strengthen real-world analyses
METHODS

How I built this

Merged 7 NHANES survey components per cycle across 10 cycles (1999-2018), then linked the fixed-width mortality files using a custom parser. The cohort is restricted to adults aged 20+ with a self-reported hypertension diagnosis. PS model: logistic regression on 12 covariates. Matching: greedy 1:1 nearest-neighbor on logit PS with a 0.2 SD caliper. IPTW: ATE weights trimmed at the 99th percentile (cap 15.4). Cox models were fit with lifelines, using robust standard errors for the weighted analyses. All code was written from scratch, with no black-box PS packages.
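The matching step can be illustrated with a short sketch of greedy 1:1 nearest-neighbor matching on the logit of the propensity score. It is a simplified illustration, not the project's code: the function names and toy scores are hypothetical, the caliper uses the SD of the logit PS pooled over both groups, and treated patients are processed in input order. Both of those choices are assumptions.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1.0 - p))

def greedy_match(ps_treated, ps_control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching on logit PS, without replacement.

    Returns a list of (treated_index, control_index) pairs.
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Caliper width: a fraction of the SD of logit PS over the whole sample.
    caliper = caliper_sd * statistics.stdev(lt + lc)
    available = list(range(len(lc)))
    pairs = []
    for i, score in enumerate(lt):
        if not available:
            break
        # Closest remaining control on the logit scale.
        j = min(available, key=lambda k: abs(lc[k] - score))
        if abs(lc[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # each control is used at most once
    return pairs

# Toy propensity scores (hypothetical).
ps_treated = [0.80, 0.70, 0.95]
ps_control = [0.72, 0.30, 0.78]
pairs = greedy_match(ps_treated, ps_control)
```

The treated patient with a score of 0.95 has no control within the caliper and is dropped, which is how matching ends up discarding part of the cohort.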

Propensity Scores, 1:1 Matching, IPTW, Cox Regression, Kaplan-Meier, Schoenfeld Test, Subgroup Analysis, NHANES Survey Data
Tech Stack

Built with

Propensity score methods in R. Matching, IPTW, and doubly‑robust IPTW with sensitivity analyses across balance diagnostics, PH tests, and subgroup effects.

R, MatchIt, WeightIt, survey, survival, cobalt, tidyverse, Quarto
