Emergency departments (EDs) serve as the critical interface between acute patient needs and hospital resources. In the United States alone, over 130 million ED visits occur annually, with EDs functioning as the entry point for more than half of all hospital admissions. The ability to accurately predict patient outcomes at the time of ED presentation has profound implications for clinical decision-making, resource allocation, and patient safety.
Three key outcomes are particularly relevant in the ED setting:
Hospitalization: Determining which patients require inpatient admission versus safe discharge is a fundamental ED decision. Approximately 15-20% of ED patients are admitted, but this decision is often made under time pressure with incomplete information. Accurate prediction models could support disposition decisions and improve patient flow.
Critical outcomes: A subset of ED patients will deteriorate rapidly, requiring intensive care unit (ICU) transfer or experiencing in-hospital mortality. Early identification of these high-risk patients—often within the first hours of presentation—enables proactive interventions and appropriate monitoring.
ED revisits: Unplanned return visits within 72 hours reflect potential gaps in initial care, premature discharge, or disease progression. This outcome serves as both a quality indicator and a patient safety metric.
The temporal nature of these outcomes—when events occur, not just whether they occur—makes survival analysis a natural methodological framework. Unlike classification approaches that predict binary outcomes at fixed time points, survival methods explicitly model time-to-event data, handle censoring appropriately, and can incorporate time-varying risk assessment.
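As a concrete illustration of how censoring is handled, below is a minimal hand-rolled Kaplan-Meier estimator on synthetic data (all times and labels are illustrative, not drawn from MIMIC-IV-ED; in practice one would use a library such as lifelines or scikit-survival):

```python
# Minimal sketch: Kaplan-Meier estimate of an event-free ("survival") curve
# with right-censoring. Censored stays (observed == 0) remain in the risk set
# until their censoring time but never count as events.
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct observed event time t."""
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        d = at = 0
        while i < len(pairs) and pairs[i][0] == t:
            at += 1
            d += pairs[i][1]          # censored cases contribute 0 events
            i += 1
        if d:                         # curve steps down only at event times
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= at               # events and censorings both leave the risk set
    return curve

# Hypothetical times (hours) to admission; observed=0 marks censored stays
# (e.g., left against medical advice before disposition).
durations = [2, 3, 3, 5, 6, 8, 8, 12]
observed  = [1, 1, 0, 1, 1, 0, 1, 1]
print(kaplan_meier(durations, observed))
```

Note how the two censored stays shrink the risk set without forcing the curve downward; a binary classifier at a fixed horizon would instead have to drop or mislabel them.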
The goal of this exam is to develop and evaluate a survival prediction framework for ED patient outcomes using the MIMIC-IV-ED database.
Xie et al. (2022) established benchmark models for ED outcome prediction using machine learning classification methods (logistic regression, random forest, gradient boosting, neural networks). Their work treated outcomes as binary classification tasks—predicting whether hospitalization, critical outcomes, or 72-hour revisits would occur.
However, a survival analysis framework offers several potential advantages:
Time-to-event modeling: Rather than predicting “will this patient be admitted?”, survival models can answer “when is this patient likely to require admission?” This temporal granularity supports operational planning.
Censoring: Patients who leave against medical advice, transfer to other facilities, or have incomplete follow-up are naturally handled through censoring mechanisms.
Competing risks: ED outcomes are not independent—a patient who dies cannot be readmitted; a patient transferred to ICU has a different risk trajectory than one discharged home. Survival frameworks explicitly model these competing events.
Dynamic prediction: Survival models can update risk estimates as new information (vital signs, lab results) becomes available during the ED stay.
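The competing-risks point can be made concrete with a nonparametric cumulative incidence estimator (the Aalen-Johansen estimator restricted to a single state). The sketch below uses synthetic data with hypothetical event codes; it is not the method any particular paper prescribes:

```python
# Minimal sketch: cumulative incidence under competing risks. Here event=1
# might mean "ICU transfer" and event=2 "discharge home" (labels hypothetical);
# event=0 is censoring. A patient discharged home can no longer be transferred,
# so naively censoring the competing event would overestimate incidence.
def cumulative_incidence(durations, events, event_of_interest):
    """Return [(t, CIF(t))] at each distinct time with any observed event."""
    pairs = sorted(zip(durations, events))
    n_at_risk, surv, cif, curve, i = len(pairs), 1.0, 0.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        d_int = d_all = at = 0
        while i < len(pairs) and pairs[i][0] == t:
            at += 1
            d_all += int(pairs[i][1] != 0)     # any event ends the ED stay
            d_int += int(pairs[i][1] == event_of_interest)
            i += 1
        if d_all:
            cif += surv * d_int / n_at_risk    # hazard of the event of interest,
            surv *= 1 - d_all / n_at_risk      # weighted by all-cause event-free survival
            curve.append((t, cif))
        n_at_risk -= at
    return curve

durations = [1, 2, 2, 4, 5, 7, 9]
events    = [1, 2, 1, 0, 2, 1, 0]   # 0 = censored
print(cumulative_incidence(durations, events, event_of_interest=1))
```

Because the curve is flat wherever only the competing event occurs, the incidences of all event types sum to at most one, which 1 - KM computed per event type does not guarantee.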
Your task: Select one or more outcomes from the MIMIC-IV-ED dataset and develop an appropriate survival analysis approach. You should justify whether a survival framework provides advantages over the classification approach used by Xie et al., and demonstrate this empirically where possible.
Data are derived from the MIMIC-IV-ED database (Medical Information Mart for Intensive Care - Emergency Department), which contains de-identified health records from the Emergency Department at Beth Israel Deaconess Medical Center, Boston, Massachusetts.
Population: The database includes 425,087 ED stays corresponding to 216,878 unique adult patients (≥18 years). Patients may have multiple ED visits, creating a natural clustering structure.
Full dataset: MIMIC-IV-ED requires credentialed access through PhysioNet. Credentialing involves completing the required human-subjects research training and signing the PhysioNet data use agreement.
Access: https://physionet.org/content/mimic-iv-ed/
Demo dataset: For immediate access without credentialing, a demonstration subset (100 patients) is freely available:
https://physionet.org/content/mimic-iv-ed-demo/
The demo dataset preserves the database structure and can be used for code development and methodology testing, though statistical conclusions will be limited.
Benchmark code: data processing pipelines accompanying Xie et al. (2022) are publicly available.
In your analysis, justify the following choices:
The choice of endpoint(s): Which outcome(s) will you model? How do you define the time origin, event, and censoring? Are there competing events to consider?
The survival model(s): What survival framework will you use? Why is this approach appropriate for your chosen endpoint?
Comparison with classification: How does your survival approach compare to the binary classification framework used by Xie et al.? What additional insights does the survival framework provide?
Model evaluation and validation: How will you assess model performance?
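One candidate discrimination metric for the evaluation question is Harrell's concordance index, which checks whether higher predicted risk pairs with earlier observed events while skipping pairs made incomparable by censoring. A stdlib-only sketch on synthetic data (all values illustrative; library implementations such as lifelines' are preferable in practice):

```python
# Minimal sketch: Harrell's C-index for right-censored data. A pair (i, j) is
# comparable when i has an observed event strictly before j's recorded time;
# concordant when the model assigns i the higher risk score.
def concordance_index(durations, observed, risk_scores):
    concordant = tied = comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if observed[i] and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1              # ties count half, like AUC
    return (concordant + 0.5 * tied) / comparable

# Scores perfectly anti-ordered with event times give C = 1.0
durations = [2, 4, 6, 8]
observed  = [1, 1, 0, 1]
scores    = [0.9, 0.7, 0.5, 0.3]
print(concordance_index(durations, observed, scores))  # -> 1.0
```

Discrimination alone is not sufficient; calibration (e.g., time-dependent Brier scores) would assess whether predicted event probabilities match observed incidence.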
Please submit the following:
A slide deck focusing on methodology, experiments, and results. Do not include data description in your presentation; focus on the rationale behind your methodological choices.
10-minute oral presentation (+5 minutes for questions). The focus should be on the rationale for your methodological approach, the experimental process, and key findings.
Code (Rmd or Jupyter notebook format) to be submitted by the morning of the exam date.
Johnson, A., et al. (2023). MIMIC-IV-ED (version 2.2). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/5ntk-km72
Xie, F., Zhou, J., Lee, J. W., Tan, M., Li, S., Rajnthern, L. S. O., … & Liu, N. (2022). Benchmarking emergency department prediction models with machine learning and public electronic health records. Scientific Data, 9(1), 658.