Clinical ML: Why Interpretability Often Beats Accuracy
From the Heart Failure Readmission project
Context
In healthcare ML, the most accurate model is not always the most valuable. During my heart failure readmission work with the MIMIC-IV clinical database, I had to choose between a higher-scoring black-box model and a more interpretable, clinically actionable one. I selected Logistic Regression over XGBoost, despite a 6% accuracy gap, because interpretability, clinical utility, and regulatory compliance (HIPAA/FDA guidance) outweighed the raw performance difference.
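A key practical difference behind that choice: logistic regression coefficients translate directly into odds ratios a clinician can read. A minimal sketch of that translation; the feature names and coefficient values below are hypothetical placeholders, not the project's fitted model:

```python
import math

# Hypothetical fitted coefficients from a readmission-risk logistic
# regression (illustrative values only, not the project's actual model).
coefficients = {
    "prior_admissions_12mo": 0.45,
    "bnp_elevated": 0.62,
    "egfr_low": 0.38,
    "beta_blocker_prescribed": -0.29,
}

for feature, beta in coefficients.items():
    odds_ratio = math.exp(beta)  # e^beta = multiplicative change in readmission odds
    direction = "raises" if beta > 0 else "lowers"
    print(f"{feature}: {direction} odds by a factor of {odds_ratio:.2f}")
```

This per-feature readability is what a tree ensemble like XGBoost cannot provide directly; it needs post-hoc explainers, which complicates the regulatory story.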
Decision Framework
- Statistical Performance (25%): Accuracy, precision, recall, F1, ROC-AUC
- Interpretability (30%): Clinicians must understand drivers of risk
- Regulatory Compliance (25%): Explainability for healthcare adoption
- Clinical Utility (20%): Actionable features aligned with medical knowledge
Weighted scoring favored Logistic Regression for clinical deployment, even though XGBoost led on pure metrics.
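The trade-off can be sketched as a simple weighted score. Only the criterion weights come from the framework above; the per-model 0-10 ratings here are hypothetical placeholders for illustration:

```python
# Criterion weights from the decision framework above.
weights = {
    "statistical_performance": 0.25,
    "interpretability": 0.30,
    "regulatory_compliance": 0.25,
    "clinical_utility": 0.20,
}

# Hypothetical 0-10 ratings per model (illustrative, not the project's actual scores).
ratings = {
    "logistic_regression": {"statistical_performance": 7, "interpretability": 9,
                            "regulatory_compliance": 9, "clinical_utility": 8},
    "xgboost":             {"statistical_performance": 9, "interpretability": 4,
                            "regulatory_compliance": 5, "clinical_utility": 6},
}

def weighted_score(model_ratings, weights):
    """Sum of rating * weight over all criteria."""
    return sum(model_ratings[c] * w for c, w in weights.items())

for model, r in ratings.items():
    print(f"{model}: {weighted_score(r, weights):.2f}")
```

With these placeholder ratings, logistic regression scores 8.30 against 5.90 for XGBoost: a large raw-metrics lead cannot overcome low interpretability and compliance scores once the weights reflect clinical priorities.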
Technical Highlights
- Data: MIMIC-IV; ICD-10 comorbidities, vitals, labs, medications, procedures
- Feature Engineering: Comorbidity extraction, temporal vital trends, lab normalization
- Evaluation: Healthcare-specific metrics emphasizing sensitivity and interpretability
- Economics: Readmission cost avoidance, ROI for intervention programs, staffing optimization
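Two of the feature-engineering steps above can be sketched with stdlib tools; the helper names and example readings are assumptions for illustration, not the project's implementation:

```python
from statistics import mean, stdev

def vital_trend(timestamps_hr, values):
    """Least-squares slope of a vital sign over time (units per hour),
    a simple temporal-trend feature."""
    t_bar, v_bar = mean(timestamps_hr), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(timestamps_hr, values))
    den = sum((t - t_bar) ** 2 for t in timestamps_hr)
    return num / den

def zscore_normalize(lab_values):
    """Z-score normalization of a lab series against its own mean/SD."""
    mu, sd = mean(lab_values), stdev(lab_values)
    return [(v - mu) / sd for v in lab_values]

# Illustrative readings: heart rate drifting upward over 12 hours.
hr_times = [0, 4, 8, 12]
hr_vals = [72, 78, 85, 93]
print(f"HR trend: {vital_trend(hr_times, hr_vals):+.2f} bpm/hour")  # → +1.75 bpm/hour

creatinine = [1.0, 1.2, 1.5, 1.9]
print([round(z, 2) for z in zscore_normalize(creatinine)])
```

A rising-vitals slope feeds the model a clinically legible signal ("heart rate trending up") rather than a raw snapshot, which is exactly the kind of feature clinicians can act on.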
Outcomes
- 15–20% improvement in identifying high-risk patients
- 12% reduction in nursing-hour allocation errors
- $2–3M annual savings for a mid-size hospital; ~300% ROI over 3 years
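The ROI figure can be sanity-checked with simple arithmetic. Only the savings range and the ~300% figure come from the results above; the total program cost below is a hypothetical assumption chosen to illustrate the calculation:

```python
# Reported figures: $2-3M annual savings, ~300% ROI over 3 years.
annual_savings = 2.5e6   # midpoint of the reported $2-3M range
horizon_years = 3
program_cost = 1.875e6   # hypothetical total intervention-program cost (assumption)

total_savings = annual_savings * horizon_years
roi_pct = (total_savings - program_cost) / program_cost * 100
print(f"3-year ROI: {roi_pct:.0f}%")  # → 300% with these assumed inputs
```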
Takeaways
- Clinical adoption requires interpretable models whose feature importance aligns with clinical reasoning
- Regulatory readiness must be built-in, not retrofitted
- Economic validation is as critical as statistical performance
Read the full project and implementation details in the repository: heart_failure_readmission.