Article, Critical Care

Prognostic utilization of models based on the APACHE II, APACHE IV, and SAPS II scores for predicting in-hospital mortality in emergency department

Abstract

Background: This study was designed to evaluate and compare the prognostic value of the APACHE II, APACHE IV, and SAPS II scores for predicting in-hospital mortality in the ED in a large sample of patients. Earlier studies in the ED setting either used small samples or focused on specific diagnoses.

Methods: A prospective study was conducted to include patients with higher risk of mortality from March 2016 to March 2017 in the ED of Emam Reza Hospital, northeast of Iran. Logistic regression was used to develop three models. Evaluation was performed in terms of the overall performance (Brier Score, BS, and Brier Skill Score, BSS), discrimination (Area Under the Curve, AUC), and calibration (calibration graph).

Results: A total of 2205 patients met the study criteria (53% male and median age of 64, IQR: 50-77). In-hospital mortality amounted to 19%. For APACHE II, APACHE IV, and SAPS II the BS was 0.132, 0.125 and 0.133 and the BSS was 0.156, 0.2, and 0.144, respectively. The AUC was 0.755 (0.74 to 0.779) for APACHE II, 0.794 (0.775 to 0.818) for APACHE IV, and 0.751 (0.727 to 0.776) for SAPS II. The APACHE IV showed significantly greater AUC in comparison to the APACHE II and SAPS II. The graphical evaluation revealed good calibration of the APACHE IV model.

Conclusion: APACHE IV outperformed APACHE II and SAPS II in terms of discrimination and calibration. More validation is needed for using these models for decision-making about individual patients, although they would perform best at a cohort level.

(C) 2020

  1. Introduction

Emergency departments (EDs) face significant challenges in delivering timely, complete, and high-quality patient care because of the increasing number of patients, overcrowding, and the inability to flex limited hospital resources to meet demand [1-3]. Identification of critically ill patients could prompt effective treatment initiation as well as appropriate matching of patients' needs with available facilities in the time-sensitive conditions of the ED [4,5]. Severity-of-illness scoring systems may provide useful objective information for stratifying and prioritizing patients with poor outcome. Different case mixes and admission policies may affect the predictive behavior of models based on severity-of-illness scores, and it is important to understand their predictive ability before they can be used for e.g. triage.

* Corresponding author at: Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.

E-mail address: [email protected] (S. Eslami).

1 Both contributed equally to this study as the first author.

Scores which are obtained based on physiological assessment of patients provide a common understanding of risk for physicians and make prioritization of patients more reasonable [6]. The Acute Physiology and Chronic Health Evaluation (APACHE) is a scoring system which is used to prognosticate in-hospital mortality using a single assessment of physiologic and laboratory parameters at admission [7]. The second version of this scoring system (APACHE II) was the most widely used prognostic index in intensive care units for years after its introduction [8]. Over the last 30 years, the scientific and technological developments in intensive care medicine caused APACHE II to overestimate the mortality rate in many scenarios. In 2006, an updated version (APACHE IV) was introduced to fix this problem [9]. The Simplified Acute Physiology Score (SAPS II) is another common risk stratification score which includes 17 variables (physiologic, age, underlying disease, and type of admission) to estimate the probability of in-hospital mortality in the ICU [10].

https://doi.org/10.1016/j.ajem.2020.05.053

0735-6757/(C) 2020

While evaluation studies of prediction models are routinely performed in critical care settings, far less attention has been paid to the application of such scores for outcome prediction in patients admitted to the ED with various diagnoses [11]. This study was designed to evaluate and compare the predictive performance of the APACHE II, APACHE IV, and SAPS II scoring systems among patients presenting to the ED of a tertiary care center in Iran.

  2. Methods
    2.1. Setting and study population

This was a prospective observational cohort study performed from March 2016 to March 2017, aiming to include all types of patients admitted to the ED across the seasons of a one-year period [30]. The study was performed in the Emam Reza Hospital in Mashhad, which was the largest university hospital in the eastern and northeastern part of Iran at the time of the study. The hospital has a general 60-bed ED (including observation and holding units) which admits all types of acute patients. A two-stage triage process is implemented in our ED: primary triage and secondary triage. Primary triage refers patients to subspecialty EDs such as pediatric emergency medicine, obstetrics and gynecology, toxicology, and burns. Secondary triage assigns patients within the general ED. The annual patient volume of our ED was about 220,000. About 33% (72,600) of these patients are referred to the subspecialty EDs by primary triage. The rest undergo secondary triage and present to the general ED. Approximately 55% (121,000) of patients are categorized as triage level 4 or 5. A further 8% (17,600) are categorized as triage level 2 or 3 but either do not need a complete work-up and long stay (e.g. renal colic, anaphylaxis) or need to be transferred rapidly (e.g. STEMI, stroke, emergent and urgent operations). The remaining patients undergo diagnostic and laboratory evaluation and medical interventions for stabilization in the ED, and stay long enough for the results to become available and be acted upon. Once a patient needed laboratory evaluation and met the inclusion and exclusion criteria, his or her clinical variables were included for further analysis.

2.2. Inclusion and exclusion criteria

All adult patients (older than 18 years of age) with a high acuity level at presentation (Emergency Severity Index of 1 to 3) were included. The following patients were excluded from the analyses: patients who returned to the ED because of a previously recorded diagnosis, were discharged before 4 h, or died upon arrival. Moreover, since the Emam Reza Hospital includes special emergency departments (i.e. burns, trauma, poisoning, obstetrics, and surgery), patients were referred there directly as soon as the relevant diagnosis was made. Thus, the patients who had no clinical evaluation in the Emam Reza emergency department were excluded from the study. Hence our model is not meant to be applicable to these groups of patients. Furthermore, a patient's record was excluded from the data set when the prediction score could not be calculated because at least one assessment was missing or the personal identification was inaccessible. Patients without a medical indication for arterial blood gas analysis were also excluded because of the invasive nature of the measuring procedure.

2.3. Study variables

Patients' basic and clinical characteristics were extracted from the medical record. The Glasgow Coma Scale (GCS) and vital signs (temperature, blood pressure, heart rate, respiratory rate, and oxygen saturation) were measured upon arrival. A blood sample was also obtained to measure complete blood count parameters, electrolytes (sodium and potassium), urea, creatinine, blood sugar, and arterial blood gas parameters (i.e. pH, bicarbonate level, and oxygen and carbon dioxide pressure). Since we additionally included albumin and bilirubin in the model, a 1 ml tube containing the respective serum was transferred to the central laboratory. Although these measurements were obtained within the first hour of presentation to the ED, in our setting the results were returned after about 6 h. Moreover, mechanical ventilation support, mode of arrival (with or without ambulance), and in-hospital outcome (dead or alive) were recorded for each patient. The APACHE II, APACHE IV, and SAPS II scores were calculated using the variables measured based on the patient's condition within the first hour after presentation. The length of hospital stay (the time from presentation to the ED to discharge for patients who survived, and the time from presentation to death for patients who died) was recorded in days.

2.4. Statistical analyses

The predictive performance of the models was assessed in terms of the overall accuracy, discrimination, and calibration. The overall accuracy was measured by the Brier Score (BS) and the Brier Skill Score (BSS). The BS measures the mean squared error of the predictions using the formula:

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(\text{predicted probability}_i - \text{actual outcome}_i\right)^2$$

The value can range from 0 to 1 (0 for a model with correct predictions for all individuals, 0.25 for a model that assigns a probability of 0.5 to every individual, and 1 for a model with completely incorrect predictions) [12]. The BSS is defined as $\mathrm{BSS} = 1 - \mathrm{BS}/\mathrm{BS}_{\mathrm{ref}}$, where $\mathrm{BS}_{\mathrm{ref}}$ is the BS of a reference model which simply provides the mean probability of mortality for each patient. The BSS can range from $-\infty$ to 1, where negative values indicate that the predictions are less accurate than the reference forecast, 0 indicates that the predictions are as accurate as the reference forecast, and 1 shows perfect skill compared to the reference forecast [13]. The BSS can be understood as the proportional improvement in accuracy over the accuracy of the reference model.

The discrimination ability of the model between the survivors and non-survivors was assessed using the Area Under the Receiver Operating Characteristic Curve (AUC). The AUC can range from 0 to 1, where larger values indicate better prognostic power. A value of 0.5 indicates no discrimination. The bootstrapping method with 1000 replicated samples was used to calculate the bias-corrected estimate of the AUC and its Confidence Interval (CI) for each model, as well as the CI for the difference between the AUCs of each pair of models. The comparison of the AUCs was also quantified using the DeLong method [14]. The Youden Index method was used to find the best threshold and calculate the sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), and accuracy.

Logistic regression analysis was used to predict in-hospital mortality (dependent variable) using the APACHE II, APACHE IV, or SAPS II score as the independent variable, separately. The probability of death was calculated using the logit formula:

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

($\beta_0$: intercept, $\beta_1$: coefficient, and $X$: score). The level of agreement between the observed and predicted outcomes was evaluated using a calibration graph with the predictions on the x-axis and the proportion of observed events on the y-axis. Perfect calibration implies that the points lie on the line x = y [15].

Data were reported as mean ± standard deviation (SD) for continuous variables and as the frequency (%) for categorical variables. For normally distributed variables (graphically assessed and/or via the Shapiro-Wilk test), the t-test was used for the comparisons of the continuous variables between survivors and non-survivors; otherwise the Mann-Whitney U test was used. Categorical data were compared using the Chi-square test or Fisher's exact test. A p-value smaller than 0.05 or a CI not including zero for differences in performance was considered statistically significant in comparisons. All statistical analyses were performed in the R statistical environment [16] using the pROC, ROCR, rms, ResourceSelection, and Hmisc packages.
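The two overall-accuracy measures defined in this section can be illustrated with a minimal Python sketch. It is not the authors' R code; the function names and the toy data are hypothetical, and the reference model is assumed to predict the observed event rate for every patient.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """1 - BS / BS_ref, where the reference predicts the observed event rate for everyone."""
    event_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([event_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero (all outcomes identical)")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Toy data (hypothetical, not from the study): 1 = died, 0 = survived.
outcomes = [1, 0, 0, 1, 0]
probs = [0.8, 0.1, 0.3, 0.6, 0.2]

print(round(brier_score(probs, outcomes), 3))
print(round(brier_skill_score(probs, outcomes), 3))
```

For a reference model that predicts a constant event rate p, the reference Brier score equals p(1 - p), which is why a model that always predicts 0.5 in a balanced sample scores 0.25.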

  3. Results

A total of 3064 patients met the inclusion criteria. After applying the exclusion criteria, 2205 patients remained for further analyses (Fig. 1). About 53% of cases were male. The median age of the patients was 64 years (IQR: 50-77; range: 18-98). The median age of survivors was 63 (48-76) compared with 70 (57-80) for non-survivors (P < 0.001). The median follow-up time was 6 (2-10) days. The distribution of the ESI levels I, II, and III differed between the two groups (61%, 76%, and 87% for survivors vs. 39%, 24%, and 13% for non-survivors). The patients were mostly admitted due to infectious (45%), neoplastic (32%), and respiratory (27%) disorders. Further clinical, hematologic, biochemistry, and gasometry parameters are listed in Table 1.

The linear predictor (LP) formulas of the logistic regression models were as follows:

  • -3.527 + 0.186 × APACHE II;
  • -4.873 + 0.063 × APACHE IV; and
  • -4.411 + 0.094 × SAPS II.
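As an illustration of how these linear predictors translate into predicted mortality, the following Python sketch applies the logit formula from the Methods to the reported intercepts and coefficients. The dictionary and function names are hypothetical, and the example scores are simply the median scores of non-survivors from Table 1.

```python
import math

# Intercepts and slopes of the fitted models as reported in the text.
MODELS = {
    "APACHE II": (-3.527, 0.186),
    "APACHE IV": (-4.873, 0.063),
    "SAPS II": (-4.411, 0.094),
}


def predicted_mortality(model: str, score: float) -> float:
    """Apply P = 1 / (1 + exp(-(b0 + b1 * score))) for the chosen model."""
    intercept, slope = MODELS[model]
    linear_predictor = intercept + slope * score
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Evaluate each model at the median score of non-survivors (Table 1).
for name, score in [("APACHE II", 13), ("APACHE IV", 63), ("SAPS II", 37)]:
    print(f"{name} = {score}: P(death) = {predicted_mortality(name, score):.3f}")
```

Because all three coefficients are positive, the predicted probability increases monotonically with the score in each model.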

As shown in Table 2, the BS for the APACHE II, APACHE IV, and SAPS II scores was 0.132, 0.125 and 0.133 and the BSS was 0.156, 0.2, and 0.144, respectively. The AUC was 0.755 (95% CI: 0.74 to 0.779) for APACHE II, 0.794 (95% CI: 0.775 to 0.818) for APACHE IV, and 0.751 (95% CI: 0.727 to 0.776) for SAPS II (Fig. 2). Significant differences were observed between the AUC of the APACHE IV and APACHE II (95% CI for the difference: 0.021 to 0.057) and of the APACHE IV and SAPS II (95% CI for the difference: 0.024 to 0.061). The estimated AUCs for the APACHE II and SAPS II were not statistically different (95% CI for the difference: -0.021 to 0.018). The graphical evaluation revealed good calibration of the APACHE IV model over the entire range of predicted probabilities, which was better than the calibration of the other two models (Fig. 3). The sensitivity, specificity, PPV, and NPV values are reported in Table 3.
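The idea of judging a difference between two AUCs by whether its bootstrap CI excludes zero can be sketched as follows. This is a simplified percentile bootstrap written in plain Python, not the bias-corrected procedure or the pROC implementation used in the study; all names and the synthetic data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random non-survivor scores higher than a random survivor (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(
    score_a: Sequence[float],
    score_b: Sequence[float],
    outcomes: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUC(score_a) - AUC(score_b) on paired data."""
    rng = random.Random(seed)
    n = len(outcomes)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        y = [outcomes[i] for i in idx]
        if sum(y) in (0, n):  # resample lacks one class; draw again
            continue
        a = [score_a[i] for i in idx]
        b = [score_b[i] for i in idx]
        diffs.append(auc(a, y) - auc(b, y))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic paired scores (hypothetical): score_a separates the classes more strongly.
rng = random.Random(42)
outcomes = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
score_a = [rng.gauss(1.2 * y, 1.0) for y in outcomes]
score_b = [rng.gauss(0.6 * y, 1.0) for y in outcomes]
print(bootstrap_auc_difference(score_a, score_b, outcomes, n_boot=200))
```

Resampling whole patients keeps the two scores paired, which matters because both AUCs are computed on the same individuals and are therefore correlated.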

  4. Discussion

Optimal resource allocation in EDs is relevant against the ever-present background of understaffing and a growing number of patients, especially in developing countries. Employing objective and accurate prognostic scoring systems can help distinguish and manage complicated patients as well as improve benchmarking indices. This study investigated the predictive performance of the most popular risk stratification scoring systems in intensive care (i.e. APACHE II, APACHE IV, and SAPS II) to predict in-hospital mortality when applied to the emergency setting.

Quantitative discrimination measures revealed that APACHE II, APACHE IV, and SAPS II had fair discrimination, with the APACHE IV bordering on what is usually considered good discrimination between survivors and non-survivors (AUC 0.755, 0.794, and 0.751, respectively). With respect to discrimination power, the APACHE IV prediction model was significantly better than the other two models. Its lowest BS value also showed that the model based on the APACHE IV score was associated with the minimum prediction error in comparison to the actual outcomes, and its highest BSS showed that it capitalized the most on the room for improvement when compared to the reference model. According to Table 3, the sensitivity of the APACHE IV model was 10 and 9 percentage points higher than that of the APACHE II and SAPS II models, respectively, while its specificity was only 3 and 4 percentage points lower.

We note that in terms of discrimination ability, our AUC results are equivalent to the results that would have been obtained by an external validation of the original scores, because a logistic regression model including a single score variable with a positive coefficient preserves the ranking of patients and will therefore have the same AUC value regardless of the values of the coefficients.
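This invariance can be checked numerically. The Python sketch below computes the AUC of a raw score and of two logistic transformations of it, assuming positive slopes; the toy scores and outcomes are hypothetical, and the second pair of coefficients is arbitrary.

```python
import math
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Fraction of (non-survivor, survivor) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def logistic(score: float, intercept: float, slope: float) -> float:
    """Predicted probability from a single-score logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Toy scores and outcomes (hypothetical): 1 = died, 0 = survived.
scores = [5, 8, 8, 12, 13, 18, 21, 9]
outcomes = [0, 0, 1, 0, 1, 1, 1, 0]

auc_raw = auc(scores, outcomes)
auc_fitted = auc([logistic(s, -3.527, 0.186) for s in scores], outcomes)  # reported APACHE II coefficients
auc_other = auc([logistic(s, -1.0, 0.5) for s in scores], outcomes)       # arbitrary positive slope
print(auc_raw, auc_fitted, auc_other)
```

The AUC depends only on how patients are ranked, so any strictly increasing transformation of the score leaves it unchanged. Calibration and the Brier score, in contrast, do depend on the fitted coefficients.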

Graphical assessment of calibration plots demonstrated reasonable agreement between predictions and observed outcomes for the APACHE II and SAPS II models and a very good and visibly better calibration for the APACHE IV model.

Over the last decade, different studies have been performed around the world to evaluate the predictive performance of the APACHE II, APACHE IV, and SAPS II in the ED (see Table 4). They mostly included disease-specific patients (esp. sepsis-related problems [17-21]) to evaluate the models' behavior. The median sample size was only 152 (IQR: 84-361; min-max: 48-8871), with a higher frequency of male patients, and the mean age of the included samples was 62 years. The most used model was APACHE II. The mortality rate varied between 3.7% for the largest sample in Australia and 56.3% for the study which included highly acute patients with hepatic portal venous gas. It should be noted that, in the current study, since the patients with a low acuity level (triage levels 4 and 5) were excluded, the mortality rate was higher than in previous reports from Iran [22]. The related studies varied markedly with respect to age, gender, diagnosis, and acuity level, which might affect the comparability of the performance measures. The mean APACHE II score varied between 6 and 23.75 in the related studies, which is comparable to the current study. Moreover, most of the related studies limited the performance appraisal to the AUC value and the Hosmer-Lemeshow test, which has its limitations (large sensitivity to sample size and to the cut-off points defining risk groups) and should not be used as the only method for calibration evaluation [15]. The AUC values in these studies were poor to excellent for APACHE II (0.62 to 0.94) and SAPS II (0.61 to 0.91). The results of our study fit in the middle of this range. However, most of the other studies had a sample size smaller than 500 patients, resulting in wide CIs, except for the study performed by Williams et al., which used a dataset with 8871 patients [19].

Fig. 1. Flowchart of patient selection process and reasons for exclusion.

Table 1. Baseline characteristics of patients admitted to the emergency department with an ESI of 1 to 3.

| Characteristics | Dead (N = 426) | Alive (N = 1779) | P-value |
|---|---|---|---|
| Age (year) | 70 (57-80) | 63 (48-76) | <0.001a |
| Male gender | 232 (20%) | 944 (80%) | 0.62b |
| Clinical parameters | | | |
| Temperature | 37 (36.8-37.3) | 37 (37-37.5) | 0.272a |
| MAP | 90 (74-102) | 93 (83-103) | <0.001a |
| Heart rate | 100 (86-115) | 90 (80-105) | <0.001a |
| Respiratory rate | 20 (18-25) | 18 (17-20) | <0.001a |
| GCS | 15 (13-15) | 15 (15-15) | <0.001a |
| Urine output | 1500 (1500-1500) | 1500 (1500-1500) | <0.001a |
| Hematology | | | |
| Platelet | 180 (105-272) | 215 (147-289) | <0.001a |
| Hematocrit | 35 (29-41) | 36 (29-40) | 0.619a |
| White blood cell | 11.8 (8-17.6) | 9.2 (6.6-12.9) | <0.001a |
| Biochemistry | | | |
| Sodium | 136 (131-140) | 137 (134-140) | 0.008a |
| Potassium | 4.4 (3.7-5.3) | 4.2 (3.7-4.7) | <0.001a |
| Blood sugar | 126 (97-203) | 118 (99-159) | 0.014a |
| Urea | 85 (52-149) | 46 (29-79) | <0.001a |
| Serum creatinine | 1.7 (1.1-2.9) | 1.2 (0.9-1.8) | <0.001a |
| Albumin | 3.5 (2.8-3.7) | 3.7 (3.5-4) | <0.001a |
| Bilirubin | 1 (0.7-2) | 1 (0.6-1.3) | <0.001a |
| Gasometry | | | |
| pH | 7.4 (7.3-7.4) | 7.4 (7.3-7.4) | <0.001a |
| PCO2 | 36.7 (27.9-44.8) | 37.7 (31.7-43.1) | 0.04a |
| HCO3 | 19.5 (14.4-24.4) | 22.9 (19.3-26) | <0.001a |
| PO2 | 95 (90-95) | 95 (94-96) | <0.001a |
| FiO2 | 21 (21-40) | 21 (21-21) | <0.001a |
| Risk scores | | | |
| APACHE II | 13 (10-18) | 8 (5-12) | <0.001a |
| APACHE IV | 63 (53-78) | 44 (32-56) | <0.001a |
| SAPS II | 37 (29-44) | 27 (20-33) | <0.001a |
| Triage level (ESI) | | | 0.001c |
| Level 1 | 77 (39%) | 121 (61%) | |
| Level 2 | 199 (24%) | 632 (76%) | |
| Level 3 | 150 (13%) | 1026 (87%) | |
| Ventilation support | 93 (76%) | 30 (24%) | <0.001b |
| ATM | 244 (24%) | 775 (64%) | <0.001b |
| Diagnostic group | | | <0.001c |
| Certain infectious and parasitic diseases | 59 (14%) | 117 (7%) | |
| Neoplasms, diseases of the blood | 90 (21%) | 193 (11%) | |
| Diseases of the circulatory system | 58 (14%) | 221 (12%) | |
| Diseases of the respiratory system | 60 (14%) | 236 (13%) | |
| Diseases of the digestive system | 79 (19%) | 470 (26%) | |
| Diseases of the genitourinary system | 41 (10%) | 194 (11%) | |
| Other reasons | 39 (9%) | 348 (19%) | |
| LOS (day) | 6 (3-13) | 6 (2-10) | 0.002a |

Values are presented as Median (IQR) or N (%). Abbreviations: ESI, Emergency Severity Index; PaO2, Partial pressure of arterial oxygen; FiO2, Fraction of inspired oxygen; PCO2, Partial pressure of carbon dioxide; HCO3, Bicarbonate; MAP, Mean arterial pressure; GCS, Glasgow Coma Scale; ATM, Ambulance transferring mode; APACHE, Acute Physiology and Chronic Health Evaluation (version II and IV); SAPS, Simplified Acute Physiology Score (version II); LOS, Length of Hospital Stay.

a Analysis by Mann-Whitney U test.

b Analysis by Fisher's exact test.

c Analysis by Chi-square test.

Table 2. Performance measures of the APACHE II, APACHE IV, and SAPS II models to predict in-hospital mortality in the emergency department.

| Model | Overall accuracy: Brier score | Overall accuracy: Brier skill score | Discrimination: AUC | Discrimination: 95% CI |
|---|---|---|---|---|
| APACHE II | 0.132 | 0.156 | 0.755 | 0.74 to 0.779 |
| APACHE IV | 0.125 | 0.2 | 0.794 | 0.775 to 0.818 |
| SAPS II | 0.133 | 0.144 | 0.751 | 0.727 to 0.776 |

Abbreviations: AUC, Area Under the receiver operating characteristic Curve; CI, Confidence Interval; APACHE, Acute Physiology and Chronic Health Evaluation (version II and IV); SAPS, Simplified Acute Physiology Score (version II).

Fig. 2. Receiver operating characteristic curves for APACHE II (0.755), APACHE IV (0.794), and SAPS II (0.751) in the emergency department.

Fig. 3. Calibration plots of the APACHE II, APACHE IV, and SAPS II models in the emergency department.
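The points of a calibration plot such as Fig. 3 pair a predicted probability with an observed event proportion. One common way to obtain such points, shown here as a hedged Python sketch, is to sort patients by predicted risk and summarize equally sized groups; whether the study used grouping or a smoothed curve is not stated in the text, and the function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_groups(
    probs: Sequence[float], outcomes: Sequence[int], n_groups: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event proportion, group size) per risk group."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    pairs = sorted(zip(probs, outcomes))  # order patients by predicted risk
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:  # more groups than patients
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        groups.append((mean_pred, observed, len(chunk)))
    return groups


# Toy predictions and outcomes (hypothetical): 1 = died, 0 = survived.
probs = [0.05, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

for mean_pred, observed, size in calibration_groups(probs, outcomes, n_groups=2):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n = {size}")
```

Plotting the mean predicted probability on the x-axis against the observed proportion on the y-axis gives points that lie on the line x = y for a perfectly calibrated model.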

Table 3. Sensitivity, specificity, PPV, NPV, and accuracy of the APACHE II, APACHE IV, and SAPS II models to predict in-hospital mortality in the emergency department.

| Model | Threshold | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|
| APACHE II | 11.5 | 0.65 | 0.72 | 0.36 | 0.9 | 0.7 |
| APACHE IV | 52.5 | 0.75 | 0.69 | 0.37 | 0.92 | 0.7 |
| SAPS II | 32.5 | 0.66 | 0.73 | 0.37 | 0.9 | 0.72 |

Abbreviations: PPV, Positive Predictive Value; NPV, Negative Predictive Value; APACHE, Acute Physiology and Chronic Health Evaluation (version II and IV); SAPS, Simplified Acute Physiology Score (version II).
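The predictive values in Table 3 follow from sensitivity, specificity, and the mortality prevalence. The Python sketch below recomputes them from the rounded sensitivities and specificities in Table 3 and the prevalence implied by Table 1 (426 deaths among 2205 patients); because the inputs are rounded, the results should only be expected to land close to the tabulated values. The function and variable names are hypothetical.

```python
from typing import Dict


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Dict[str, float]:
    """Derive PPV, NPV, accuracy, and the Youden index from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction of the cohort
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
        "youden_j": sensitivity + specificity - 1,
    }


PREVALENCE = 426 / 2205  # deaths / patients, from Table 1

# (sensitivity, specificity) as reported in Table 3.
TABLE_3 = {
    "APACHE II": (0.65, 0.72),
    "APACHE IV": (0.75, 0.69),
    "SAPS II": (0.66, 0.73),
}

for name, (sens, spec) in TABLE_3.items():
    values = predictive_values(sens, spec, PREVALENCE)
    print(name, {key: round(value, 2) for key, value in values.items()})
```

The Youden index (sensitivity + specificity - 1) is the quantity that the threshold selection maximizes; among the three tabulated operating points it is largest for APACHE IV.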

Strengths and limitations

Important strengths of this study include its prospective design with a large number of patients who were included during a whole year period. In addition, the study comprehensively inspects the internal performance of three popular models in terms of overall accuracy, discrimination, and calibration; and it compares these performances among the three models and tests the differences statistically.

This study has the following limitations. First, since this is a single-center study, the generalizability of the results might be limited. However, using the largest referral center in the eastern part of the country provided us with the opportunity to include a wide range of acute disorders comparable to most EDs in Iran. Second, the original scoring systems were based on the worst physiologic score recorded during the initial 24-hour time period, but in the present study our measurements were obtained within the first hour of admission. This means that repeated measurements over time might have changed the scores (based on the worst value of the measurement series), but on the other hand, our models can be used once all measurements are obtained. In our setting we needed to wait for 6 h to obtain the albumin

Table 4

Published evaluation studies of the APACHE II, APACHE IV, and SAPS II models in the emergency department.

Study Year Country Patients

(N)

Male gender

Agea Mortality rate (%)

Diagnosis Mean +- SD or median (IQR)

score

AUC (95% CI)

(%)

APACHE II

APACHE

SAPS II

APACHE II

APACHE IV

SAPS II

IV

NA 0.94 (0.92 to

[24]

2010

Iran

389

61%

61 +- 19

12.2%

sepsis

Case-mix

15.4 +- 5

NA

[25]

2014

Brazil

163

80%

38 +- 18

10.4%

Trauma

7.3 +- 6.2

NA

[26]

2017

China

123

50%

59 +- 12

25.2%

SFTS

20.8 +- 6.4

NA

[18]

2017

Turkey

200

55%

74 +- 15

26.5%

Sepsis

NA

NA

0.96)

NA 0.78 (0.71 to

0.84)

NA 0.75 (0.63 to

0.86)

NA 0.75 (0.64 to

0.86)

[23]

2017

Iran

82

66%

53 +- 20

48%

Case-mix

19.7 +- 8.9

NA 42.9 0.72 (0.60 to

+- 19.7 0.83)

[17]

2006

Taiwan

276

45%

72 +- 16

32.6%

Severe

22.9 +- 6.8

NA NA 0.63 (NA)

NA NA

NA NA

NA NA

NA NA

[21] 2000 United States

81 50% 64 +- 18 30.9% Septic

shock

41.9

+- 12.9

21.4

NA

NA

0.66 (NA)

NA

0.61 (NA)

6 (2-11)

NA

17

0.9 (0.88 to

NA

0.90 (0.89 to

NA NA 0.71 (0.62 to

0.78)

[19] 2016 Australia 8871 51% 49

(30-69)

3.7% Severe sepsis

(10-26)

0.91)

0.92)

[20]

2014

Italy

140

49%

74 +- 13

29%

Severe 19 +- 6 NA 50 +- 11 0.72 (0.61 to NA 0.76

sepsis 0.82) (0.66-0.85)

[27]

2014

Taiwan

48

40%

69 +- 16

56.3%

HPVG 23.8 NA 54.2 0.88 (NA) NA 0.91 (NA)

+- 10.3 +- 22

Alive: 9

[28]

2003

Sweden

1143

50%

70 +- 18

11.3%

Case-mix +- 5 NA NA 0.85 NA NA

Dead: 22

+- 8

[29]

2005

USA

91

58%

56 +- 16

20.9%

Case-mix NA NA 40 +- 14 NA NA 0.72 (0.57 to

0.87)

Current study

2018

Iran

2205

53%

62 +- 18

19%

Case-mix 9.8 +- 5.5 48.5 29.1 0.76 (0.74 to 0.79 (0.78 to 0.75 (0.73 to

+- 19.3 +- 10.5 0.78) 0.82) 0.78)

Abbreviations: NA, Not Available; AUC, Area Under the receiver operating characteristic Curve; CI, Confidence Interval; SFTS, Severe Fever with Thrombocytopenia Syndrome; APACHE, Acute Physiologic and Chronic Health Evaluation (version II and IV); SAPS, Simplified Acute Physiology Score (version II); HPVG, Hepatic Portal Venous Gas.

a Age is represented as mean +- SD or median (IQR).

and bilirubin lab results that were measured in the first hour of admission, but in other settings the model can be operational much sooner when these lab results are obtained earlier. Third, excluding the patients who were referred to special emergency departments limits the scope of the applicability of the model.

In conclusion, the APACHE II, APACHE IV, and SAPS II models had fair discrimination, with the APACHE IV bordering on good discrimination. Moreover, APACHE IV had excellent calibration for predicting in-hospital mortality in our sample, which included all-cause acute disorders. Further external validation and impact studies are needed before such models are considered for daily clinical use at the individual patient level.

Ethical issues

The permission was obtained from the Ethics Committee of the Mashhad University of Medical Sciences.

Source of funding

This study was part of the first author's MSc thesis, and the authors would like to acknowledge Mashhad University of Medical Sciences, Mashhad, Iran, for financial support (grant ID: 941594).

CRediT authorship contribution statement

Zahra Rahmatinejad: Conceptualization, Methodology, Investigation, Writing - review & editing. Fariba Tohidinezhad: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Hamidreza Reihani: Conceptualization, Methodology, Investigation, Writing - review & editing. Fatemeh Rahmatinejad: Conceptualization, Methodology, Investigation, Writing - review & editing. Ali Pourmand: Conceptualization, Methodology, Investigation, Writing - review & editing. Ameen Abu-Hanna: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Saeid Eslami: Conceptualization, Methodology, Formal analysis, Writing - review & editing.

Declaration of competing interest

None.

References

  1. Yarmohammadian MH, Rezaei F, Haghshenas A, Tavakoli N. Overcrowding in emergency departments: a review of strategies to decrease future challenges. J Res Med Sci. 2017;22:23.
  2. Chan SS, Cheung NK, Graham CA, Rainer TH. Strategies and solutions to alleviate access block and overcrowding in emergency departments. Hong Kong Med J. 2015;21:345-52.
  3. Oredsson S, Jonsson H, Rognes J, et al. A systematic review of triage-related interventions to improve patient flow in emergency departments. Scand J Trauma Resusc Emerg Med. 2011;19:43.
  4. Jin B, Zhao Y, Hao S, et al. Prospective stratification of patients at risk for emergency department revisit: resource utilization and population management strategy implications. BMC Emerg Med. 2016;16:10.
  5. Berge KH, Maiers DR, Schreiner DP, et al. Resource utilization and outcome in gravely ill intensive care unit patients with predicted in-hospital mortality rates of 95% or higher by APACHE III scores: the relationship with physician and family expectations. Mayo Clin Proc. 2005;80:166-73.
  6. Gao H, McDonnell A, Harrison DA, et al. Systematic review and evaluation of physiological track and trigger warning systems for identifying at-risk patients on the ward. Intensive Care Med. 2007;33:667-79.
  7. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13:818-29.
  8. Baltussen A, Kindler CH. Citation classics in critical care medicine. Intensive Care Med. 2004;30:902-10.
  9. Zimmerman JE, Kramer AA, McNair DS, Malila FM. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today's critically ill patients. Crit Care Med. 2006;34:1297-310.
  10. Le Gall JR, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA. 1993;270:2957-63.
  11. Goldstein RS. Management of the critically ill patient in the emergency department: focus on safety issues. Crit Care Clin. 2005;21:81-9 viii-ix.
  12. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78:1-3.
  13. Murphy AH. A new vector partition of the probability score. J Appl Meteorol Climatol. 1973;12:595-600.
  14. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837-45.
  15. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128-38.
  16. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2018 URL https://www.r-project.org/.
  17. Chen C, Chong C, Liu Y, Chen K, Wang T. Risk stratification of severe sepsis patients in the emergency department. Emerg Med J. 2006;23:281-5.
  18. Ozaydin MG, Guneysel O, Saridogan F, Ozaydin V. Are scoring systems sufficient for predicting mortality due to sepsis in the emergency department? Turk J Emerg Med. 2017;17:25-8.
  19. Williams JM, Greenslade JH, Chu K, Brown AF, Lipman J. Severity scores in emergency department patients with presumed infection: a prospective validation study. Crit Care Med. 2016;44:539-47.
  20. Innocenti F, Bianchi S, Guerrini E, et al. Prognostic scores for early stratification of septic patients admitted to an emergency department-high dependency unit. Eur J Emerg Med. 2014;21:254-9.
  21. Nguyen HB, Rivers EP, Havstad S, et al. Critical care in the emergency department: a physiologic assessment and outcome evaluation. Acad Emerg Med. 2000;7:1354-61.
  22. Saadat S, Yousefifard M, Asady H, Moghadas Jafari A, Fayaz M, Hosseini M. The most important causes of death in Iranian population; a retrospective cohort study. Emergency (Tehran, Iran). 2015;3:16-21.
  23. Aminiahidashti H, Bozorgi F, Montazer SH, Baboli M, Firouzian A. Comparison of APACHE II and SAPS II scoring systems in prediction of critically ill patients' outcome. Emergency. 2017;5.
  24. Labaf A, Zarei M, Jalili M, Talebian M, Hoseyni H, Mahmodi M. Evaluation of the Modified Acute Physiology and Chronic Health Evaluation scoring system for prediction of mortality in patients admitted to an emergency department. Hong Kong J Emerg Med. 2010;17:464.
  25. Polita JR, Gomez J, Friedman G, Ribeiro SP. Comparison of APACHE II and three abbreviated APACHE II scores for predicting outcome among emergency trauma patients. Rev Assoc Med Bras. 2014;60:381-6.
  26. Yang B, Wang X, Li Y, et al. A newly established severity scoring system in predicting the prognosis of patients with severe fever with thrombocytopenia syndrome. Tohoku J Exp Med. 2017;242:19-25.
  27. Seak CJ, Ng CJ, Yen DH, Wong YC, Hsu KH, Seak JC, Seak CK. Performance assessment of the Simplified Acute Physiology Score II, the Acute Physiology and Chronic Health Evaluation II score. Am J Emerg Med. 2014;32(12):1481-4.
  28. Olsson T, Lind L. Comparison of the rapid emergency medicine score and APACHE II in nonsurgical emergency department patients. Acad Emerg Med. 2003;10:1040-8.
  29. Jones AE, Fitch MT, Kline JA. Operational performance of validated physiologic scoring systems for predicting in-hospital mortality among critically ill emergency department patients. Crit Care Med. 2005;33:974-8.
  30. Rahmatinejad Z, Reihani H, Tohidinezhad F, Rahmatinejad F, Peyravi S, Pourmand A, Abu-Hanna A, Eslami S. Predictive performance of the SOFA and mSOFA scoring systems for predicting in-hospital mortality in the emergency department. Am J Emerg Med. 2019;37(7):1237-41.