Prospective registries have emerged as a feasible way to capture real-world care across large patient populations. However, the proven validity of more robust but cumbersome patient-reported outcomes instruments (PROis) must be balanced against what is feasible to apply in large-scale registry efforts.

To determine the relative validity and responsiveness of common PROis in accurately determining effectiveness of lumbar fusion for degenerative lumbar spondylolisthesis in registry efforts.

Prospective cohort study.

Fifty-eight patients undergoing transforaminal lumbar interbody fusion (TLIF) for degenerative lumbar spondylolisthesis.

Patient-reported outcome measures for pain (numeric rating scale for back and leg pain [NRS-BP, NRS-LP]), disability (Oswestry Disability Index [ODI]), general health (Short Form [SF]-12), quality of life (QOL) (EuroQol five dimensions [EQ-5D]), and depression (Zung depression scale [ZDS]) were assessed.

Fifty-eight patients undergoing primary TLIF for lumbar spondylolisthesis were entered into an institutional registry and prospectively followed for 2 years. Baseline and 2-year patient-reported outcomes were assessed. To assess the validity of PROis to discriminate between effective and noneffective improvements, receiver operating characteristic curves were generated for each outcomes instrument. An area under the curve (AUC) of ≥0.80 was considered an accurate discriminator. The difference between standardized response means (SRMs) in patients reporting meaningful improvement versus those not reporting it was calculated to determine the relative responsiveness of each instrument.
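
The two statistics described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the AUC is estimated with the rank-based (Mann-Whitney) formulation, the SRM is taken as the mean change score divided by the standard deviation of the change scores, and change scores are oriented so that larger values mean more improvement. The function names and the sample data are hypothetical and are not study data.

```python
from statistics import mean, stdev

AUC_THRESHOLD = 0.80  # cutoff for an "accurate discriminator", as stated in the methods


def auc(improved, not_improved):
    """Rank-based (Mann-Whitney) AUC estimate.

    Probability that a randomly chosen patient who reported meaningful
    improvement has a larger change score than a randomly chosen patient
    who did not; ties count as one half.
    """
    wins = 0.0
    for x in improved:
        for y in not_improved:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


def srm(changes):
    """Standardized response mean (assumed definition): mean change / SD of change."""
    return mean(changes) / stdev(changes)


def srm_difference(improved, not_improved):
    """SRM in the improved group minus SRM in the not-improved group."""
    return srm(improved) - srm(not_improved)


# Hypothetical change scores (baseline minus 2-year score, so higher = better).
improved_changes = [20, 30, 25, 35]
not_improved_changes = [5, 10, 0, 20]

instrument_auc = auc(improved_changes, not_improved_changes)
is_accurate = instrument_auc >= AUC_THRESHOLD
responsiveness = srm_difference(improved_changes, not_improved_changes)

print(f"AUC = {instrument_auc:.3f}, accurate discriminator: {is_accurate}")
print(f"SRM difference = {responsiveness:.2f}")
```

Under these assumptions, an instrument with a higher AUC separates the two patient groups more cleanly, and a larger SRM difference indicates greater responsiveness to the improvement patients report.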

For pain and disability, ODI had an AUC of 0.94, indicating that it is an accurate discriminator of meaningful improvement. Oswestry Disability Index was most responsive to postoperative improvement (SRM difference: 2.18), followed by NRS-BP and NRS-LP. For general health and QOL, SF-12 physical component score (AUC: 0.90), ZDS (AUC: 0.89), and SF-12 mental component score (AUC: 0.85) were all accurate discriminators of meaningful improvement; however, EQ-5D was most accurate (AUC: 0.97). EuroQol five dimensions was also most responsive (SRM difference: 2.83).

For pain and disability, ODI was the most valid and responsive measure of effectiveness of lumbar fusion. Numeric rating scale-BP and NRS-LP should not be used as substitutes for ODI in measuring effectiveness of care in registry efforts. For health-related QOL, EQ-5D was the most valid and responsive measure of improvement; however, SF-12 and ZDS are valid, though less responsive, alternatives.