Context

Emergency departments (EDs) serve as the critical interface between acute patient needs and hospital resources. In the United States alone, over 130 million ED visits occur annually, with EDs functioning as the entry point for more than half of all hospital admissions. The ability to accurately predict patient outcomes at the time of ED presentation has profound implications for clinical decision-making, resource allocation, and patient safety.

Three key outcomes are particularly relevant in the ED setting: hospitalization, critical outcomes, and 72-hour ED revisits.

The temporal nature of these outcomes (when events occur, not just whether they occur) makes survival analysis a natural methodological framework. Unlike classification approaches that predict binary outcomes at fixed time points, survival methods explicitly model time-to-event data, handle censoring appropriately, and support time-varying risk assessment.
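To make the role of censoring concrete, the sketch below computes a Kaplan-Meier survival curve in plain Python. It is only an illustration on made-up numbers: the function name `kaplan_meier`, the toy durations, and the convention that censored observations tied with an event time remain in the risk set at that time are assumptions, not part of the exam material.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up durations (e.g. hours since ED arrival)
    events : 1 if the event was observed at that time, 0 if censored
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_exits = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(n_exits):
        d = n_events.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored subjects at t both leave the risk set afterwards.
        at_risk -= n_exits[t]
    return curve

# Toy data (made up): hours to event; events == 0 marks a censored stay.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"S({t}h) = {s:.3f}")
```

A binary classifier at a fixed horizon would have to either drop the censored stays or label them as non-events; the estimator above keeps them in the risk set for as long as they were actually observed.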

Objectives

The goal of this exam is to develop and evaluate a survival prediction framework for ED patient outcomes using the MIMIC-IV-ED database.

Xie et al. (2022) established benchmark models for ED outcome prediction using machine learning classification methods (logistic regression, random forest, gradient boosting, neural networks). Their work treated outcomes as binary classification tasks: predicting whether hospitalization, critical outcomes, or 72-hour revisits would occur.

However, a survival analysis framework offers several potential advantages over this binary formulation, as outlined in the Context section.

Your task: Select one or more outcomes from the MIMIC-IV-ED dataset and develop an appropriate survival analysis approach. You should justify whether a survival framework provides advantages over the classification approach used by Xie et al., and demonstrate this empirically where possible.

Data

Data are derived from the MIMIC-IV-ED database (Medical Information Mart for Intensive Care - Emergency Department), which contains de-identified health records from the Emergency Department at Beth Israel Deaconess Medical Center, Boston, Massachusetts.

Population: The database includes 425,087 ED stays corresponding to 216,878 unique adult patients (≥18 years). Patients may have multiple ED visits, creating a natural clustering structure.
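Because one patient can contribute several stays, a time-to-revisit endpoint can be derived by ordering each patient's stays and measuring the gap to the next arrival. The sketch below does this on made-up rows in plain Python. The tuple layout (subject_id, stay_id, intime, outtime) is assumed to mirror the stay-level table; the helper `revisit_table`, the choice of discharge time as the time origin, and administrative censoring at 72 hours are illustrative assumptions, not a prescribed definition.

```python
from collections import defaultdict
from datetime import datetime

HORIZON_H = 72.0  # administrative censoring horizon in hours (assumption)

# Made-up rows mimicking (subject_id, stay_id, intime, outtime).
rows = [
    (1, 101, "2180-01-01 10:00:00", "2180-01-01 14:00:00"),
    (1, 102, "2180-01-02 14:00:00", "2180-01-02 20:00:00"),
    (2, 201, "2181-05-05 08:00:00", "2181-05-05 09:30:00"),
    (1, 103, "2180-03-01 09:00:00", "2180-03-01 11:00:00"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def revisit_table(rows, horizon_h=HORIZON_H):
    """One record per stay: hours from discharge to next arrival, censored at horizon_h."""
    by_patient = defaultdict(list)
    for subject_id, stay_id, intime, outtime in rows:
        by_patient[subject_id].append((parse(intime), parse(outtime), stay_id))

    records = []
    for subject_id, stays in by_patient.items():
        stays.sort()  # chronological order by arrival time
        for i, (_, outtime, stay_id) in enumerate(stays):
            gap_h = None
            if i + 1 < len(stays):
                next_intime = stays[i + 1][0]
                gap_h = (next_intime - outtime).total_seconds() / 3600.0
            event = int(gap_h is not None and gap_h <= horizon_h)
            records.append({
                "subject_id": subject_id,  # keep the cluster identifier
                "stay_id": stay_id,
                "duration_h": gap_h if event else horizon_h,
                "event": event,
            })
    return records

table = revisit_table(rows)
for rec in table:
    print(rec)
```

Keeping `subject_id` on every record matters downstream: stays from the same patient are not independent, so train/test splits and variance estimates should be made at the patient level.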

Data Access

Full dataset: MIMIC-IV-ED requires credentialed access through PhysioNet. The credentialing process involves:

  1. Complete CITI “Data or Specimens Only Research” training
  2. Create a PhysioNet account with institutional email
  3. Submit credentialing application (supervisor information required)
  4. Sign the Data Use Agreement
  5. Request access to MIMIC-IV-ED

Access: https://physionet.org/content/mimic-iv-ed/

Demo dataset: For immediate access without credentialing, a demonstration subset (100 patients) is freely available:

https://physionet.org/content/mimic-iv-ed-demo/

The demo dataset preserves the database structure and can be used for code development and methodology testing, though statistical conclusions will be limited.

Benchmark code: Data processing pipelines are available at:

https://github.com/nliulab/mimic4ed-benchmark

Methodology

Requirements

In your analysis, justify the following choices:

  • The choice of endpoint(s): Which outcome(s) will you model? How do you define the time origin, event, and censoring? Are there competing events to consider?

  • The survival model(s): What survival framework will you use? Why is this approach appropriate for your chosen endpoint?

  • Comparison with classification: How does your survival approach compare to the binary classification framework used by Xie et al.? What additional insights does the survival framework provide?

  • Model evaluation and validation: How will you assess model performance?
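As one possible starting point for the evaluation question, the sketch below implements a simplified version of Harrell's concordance index in plain Python. It is an illustration under stated assumptions: the function name `concordance_index` and the toy data are made up, pairs with tied event times are skipped, and a higher `risk` score is taken to mean an earlier expected event. Other metrics (for example time-dependent AUC or the Brier score) may suit your endpoint better.

```python
def concordance_index(times, events, risk):
    """Simplified Harrell's C: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event
    and subject j was still under observation strictly after that time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event has higher predicted risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (made up): durations, event indicators, and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(f"C-index = {c_index:.2f}")
```

Because the index only needs a risk ranking, the same function can score a classifier's predicted probabilities alongside a survival model's risk scores, which gives one common yardstick for the comparison with Xie et al.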

Deliverables

Please submit the following:

  • A slide deck focusing on methodology, experiments, and results. Do not include data description in your presentation; focus on the rationale behind your methodological choices.

  • A 10-minute oral presentation (plus 5 minutes for questions). The focus should be on the rationale for your methodological approach, the experimental process, and key findings.

  • Code (Rmd or Jupyter notebook format) to be submitted by the morning of the exam date.

References

Johnson, A., et al. (2023). MIMIC-IV-ED (version 2.2). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/5ntk-km72

Xie, F., Zhou, J., Lee, J. W., Tan, M., Li, S., Rajnthern, L. S. O., … & Liu, N. (2022). Benchmarking emergency department prediction models with machine learning and public electronic health records. Scientific Data, 9(1), 658.