Year: 2021, Month: November, Volume: 10, Issue: 44, Pages: 3736-3741

Associated Factors with the Mortality Rate in Patients with COVID-19 - Decision Trees Vs. Logistic Regression

Soraya Siabani1, Leila Solouki2, Mehdi Moradinazar3, Farid Najafi4, Ebrahim Shakiba5

1 Department of Health Promotion and Health Education and Cardiovascular Research Centre, Kermanshah University of Medical Sciences, Kermanshah, Iran.
2 Department of Biostatistics, School of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran.
3,4,5 Behavioural Disease Research Centre, Kermanshah University of Medical Sciences, Kermanshah, Iran.

CORRESPONDING AUTHOR

Leila Solouki, Department of Biostatistics, School of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran.
Email : l_soloki68@yahoo.com

ABSTRACT

Background

Given the global burden of COVID-19 mortality, this study aimed to determine the factors affecting mortality in patients with COVID-19 in Kermanshah province in 2020, using decision tree analysis and a logistic regression model.

Methods

This cross-sectional study was conducted on 7799 patients with COVID-19 admitted to the hospitals of Kermanshah province. Data collected from February 18 to July 9, 2020, were obtained from the vice-chancellor for health of Kermanshah University of Medical Sciences. A decision tree and a logistic regression model were fitted, and their performance was compared in terms of sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve.

Results

According to the decision tree model, the most important risk factors for death due to COVID-19 were age, body temperature, admission to the intensive care unit (ICU), a prior hospital visit within the last 14 days, and cardiovascular disease. The multivariate logistic regression model showed that age [OR = 4.47, 95 % CI: (3.16 - 6.32)], shortness of breath [OR = 1.42, 95 % CI: (1.0 - 2.01)], ICU admission [OR = 3.75, 95 % CI: (2.47 - 5.68)], abnormal chest X-ray [OR = 1.93, 95 % CI: (1.06 - 3.41)], liver disease [OR = 5.05, 95 % CI: (1.020 - 25.2)], body temperature [OR = 4.93, 95 % CI: (2.17 - 6.25)], and cardiovascular disease [OR = 2.15, 95 % CI: (1.27 - 3.06)] were significantly associated with higher mortality in patients with COVID-19. The areas under the ROC curve for the decision tree and logistic regression models were 0.77 and 0.75, respectively.

Conclusions

Identifying risk factors for mortality in patients with COVID-19 can enable more effective interventions in the early stages of treatment and improve the care provided by medical staff.

Keywords

COVID-19, Decision Tree, Logistic Regression, Mortality, Risk Factor

BACKGROUND

In December 2019, the first outbreak of a novel coronavirus was reported in Wuhan (Hubei Province, China), with clinical presentations closely resembling viral pneumonia. Many of the early patients had reportedly worked or lived in or near a seafood market where live animals were sold, so scientists initially assumed that the virus had been transmitted from those animals to humans. However, human-to-human transmission was soon confirmed, and the disease spread rapidly from China to the rest of the world.1 In March 2020, following the rapid spread of the novel coronavirus, the World Health Organization (WHO) declared COVID-19 (coronavirus disease 2019) a pandemic.2 Iran, the leading country in the Eastern Mediterranean region in terms of COVID-19 cases and deaths, has been facing major challenges since January 2020.3

The worldwide numbers of COVID-19 cases and deaths have been alarming, particularly in Iran. The global burden of COVID-19 threatens not only public health but also national economies, and the impact is worse in low- and middle-income countries. COVID-19 mostly affects the elderly and people with underlying diseases. In Iran, we are facing an unprecedented epidemic. Our knowledge about the outcomes of COVID-19 is incomplete.3 Predicting the course of the disease is also difficult because of its rapidly evolving nature.4,5 Research on the epidemiology of the disease, its potential treatments, and appropriate vaccination is therefore ongoing and is likely to continue for years.

The most common symptoms of COVID-19 include nausea, fatigue, body aches, fever, headache, vomiting, evidence of pneumonia, and dry cough. However, symptom severity varies from mild to severe among patients, and a limited number of patients may be asymptomatic.6 In some patients, the disease causes severe complications such as pneumonia, respiratory distress syndrome, arrhythmia, acute cardiac and kidney injury, long-term hospitalization, coma, and even death. In many others, COVID-19 causes only mild pneumonia requiring short-term hospitalization, and many of these patients recover after treatment.7-9 Given that no definitive antiviral drug for COVID-19 is yet available and that comprehensive information on the epidemiology and clinical features of the disease is limited in many countries, identifying the risk factors and clinical features associated with mortality in patients with COVID-19 is vital. Research is needed that can help prevent COVID-19 or, at least, support its appropriate treatment.

Decision tree models, including the classification and regression tree (CART) algorithm, are data mining methods for classifying individuals and identifying the variables that affect an outcome.

Therefore, the purpose of the present study was to determine a set of independent variables for classifying the mortality status of COVID-19 patients admitted to the two main centres, the Farabi and Imam-Reza hospitals. These two hospitals are designated for managing patients with COVID-19 in Kermanshah, a province in western Iran with a population of over two million. We also compared the decision tree and logistic regression models in terms of goodness-of-fit indices. All analyses were performed using SPSS software version 16, with the significance level set at 0.05.

METHODS

This cross-sectional study was conducted on 7799 patients with COVID-19 admitted to all hospitals of Kermanshah province in 2020. Kermanshah is the most populous province in western Iran and the centre of the region. Data collected from February 18 to July 9, 2020, were obtained from the vice-chancellor for health of Kermanshah University of Medical Sciences.

Data were collected according to the national protocol of the Ministry of Health and Medical Education of Iran, which is nearly identical across hospitals and consistent with the latest WHO guidelines.

The physical and medical variables recorded and analysed in the present study were body temperature, coughing, shortness of breath, weakness, fever and chills, contusion pain, sore throat, runny nose, diarrhoea, nausea and vomiting, headache, abdominal pain, chest pain, joint pain, cardiovascular disease, diabetes, liver disease, kidney disease, and chronic pulmonary disease. The demographic variables were gender and age. Other variables were ICU admission, close contact with COVID-19 patients during the previous 14 days, a visit to a hospital or medical centre during the previous 14 days, and working as laboratory or medical staff.

Age was dichotomized into less than 65 years and 65 years and above, and body temperature into less than 38 °C and 38 °C and above. Mortality (died vs. recovered) was the response variable.

Declarations/Ethics Approval and Consent to Participate

The Research Ethics Committee of Kermanshah University of Medical Sciences (IR.KUMS.REC.1399.214) approved the study, which was performed in accordance with the seventh (current) revision of the Declaration of Helsinki. Informed consent for the use of medical record data was obtained from patients or their companions/guardians at the time of hospitalization.

Statistical Analysis

Decision Trees

Decision trees are commonly used in data mining because they are a useful and powerful method for multivariate analysis, are easy to understand and interpret, and present classification results as comprehensible graphs that visualize the results well.10,11 Decision trees can process both numerical and categorical data and use a set of rules to classify samples into categories. The CART algorithm is one of the most popular and simplest decision tree algorithms. It starts by finding the most important independent variable and placing it at the root node, and it then splits each node into two groups that are as homogeneous as possible internally and as heterogeneous as possible from each other.12 To find the best split at a node, CART examines all variables, using the Gini index both to select the variable to split on and to find the best cut point within that variable.13 In recent years, decision trees have been increasingly used in medical studies.14 In the present study, we therefore applied the CART algorithm to 26 variables to identify the factors affecting the mortality of patients with COVID-19 in Kermanshah province. The data were divided into a training group (70.0 %) and a test group (30.0 %). The conceptual model was first created using the training group, the final model was then created using the test group, and finally model accuracy was calculated.
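The Gini-based splitting step and the 70/30 partition described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names (`gini`, `best_split`, `split_70_30`), the rule of trying midpoints between adjacent distinct values as cut points (the usual way thresholds such as 54.5 years arise), the random seed, and the toy records are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a node with binary labels (1 = died, 0 = recovered)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(
    rows: List[List[float]], labels: List[int]
) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, threshold, weighted child Gini) for the binary
    split "x[j] <= threshold" that minimises the weighted Gini impurity of the
    two child nodes, or None if no feature takes more than one value.
    Assumes `rows` is non-empty and all rows have the same length."""
    n = len(rows)
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate cut points: midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            # Strict "<" keeps the first feature found when scores tie.
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign row indices 0..n-1 to a 70 % training and 30 % test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


# Toy records invented for illustration: [age, body temperature, ICU (0/1)]
rows = [[40, 37.0, 0], [50, 37.5, 0], [59, 38.5, 1], [70, 38.6, 1]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))  # (0, 54.5, 0.0)
```

A full CART implementation would apply `best_split` recursively to each child node and then prune the resulting tree; the sketch only shows how a single node's variable and cut point are chosen.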

 

Logistic Regression

A logistic regression model is a form of regression in which the dependent variable is binary and the independent variables can be quantitative or qualitative.15 Its advantages include predicting each person's probability of belonging to each category of the dependent variable and calculating odds ratios directly from the model coefficients.16 To find the best combination of predictor variables, we used forward selection, a stepwise method.17 Univariate logistic regression models were fitted first, and variables significant at P < 0.05 entered the multivariate model. Odds ratios (ORs) and 95 % CIs were reported, and a P-value less than 0.05 was considered statistically significant.
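In a fitted logistic model, each coefficient β is a log odds ratio, so the reported OR is exp(β), and a Wald 95 % CI is exp(β ± 1.96·SE). The short Python sketch below converts between the two representations. It assumes Wald intervals, which the paper does not state explicitly, and the function names are hypothetical. Applied to the age estimate reported in the Results (OR = 4.47, 95 % CI 3.16 - 6.32), it recovers β ≈ 1.50 and SE ≈ 0.18.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95 % standard-normal quantile


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a logistic coefficient using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower: float, upper: float, z: float = Z_95) -> Tuple[float, float]:
    """Recover (beta, SE) from a reported Wald 95 % CI for an odds ratio.
    Wald intervals are symmetric on the log scale, so beta is their midpoint."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# Age (>= 65 vs < 65) as reported in the Results: OR = 4.47, 95 % CI 3.16 - 6.32
beta, se = coef_from_ci(3.16, 6.32)
print(round(beta, 2), round(se, 2))  # 1.5 0.18
print(round(math.exp(beta), 2))      # 4.47
```

Because a Wald interval is symmetric on the log scale, the reported OR should lie close to the geometric mean of its CI limits, which makes this conversion a quick consistency check on published estimates.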

 

Comparing the Performance of the Models

The decision tree and logistic regression models were compared in terms of the area under the ROC curve (AUC) and accuracy. The AUC shows how well a model distinguishes between classes: the higher the AUC, the better the model separates patients who died from those who recovered. Accuracy is the proportion of patients who are correctly classified.
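Both comparison criteria, together with the sensitivity and specificity reported in the Results, can be computed from predicted risks. The Python sketch below is a hedged illustration rather than the study's SPSS procedure: it computes the AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney formulation, with ties counted as one half), and it classifies at an assumed cut-off of 0.5. The function names and the toy outcomes and risks are invented for the example.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen death (label 1) gets a
    higher predicted risk than a randomly chosen survivor (label 0); ties
    count one half. This equals the Mann-Whitney U statistic / (n1 * n0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classification_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity when death is predicted for
    scores at or above `threshold` (0.5 is an assumed cut-off)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),  # proportion of deaths correctly flagged
        "specificity": tn / (tn + fp),  # proportion of survivors correctly cleared
    }


# Toy outcomes (1 = died) and predicted risks, invented for illustration
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.4, 0.6, 0.2, 0.1]
print(round(auc(y, risk), 3))  # 0.833
print(classification_metrics(y, risk))
```

The pairwise loop in `auc` is quadratic in sample size; for a cohort of this size (874 deaths × 6925 survivors, about six million pairs) it remains feasible, although a rank-based formula would be faster.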

RESULTS

Of the 7799 diagnosed cases, 2558 had a positive polymerase chain reaction (PCR) test and 5241 had a positive CT scan. Of these, 4463 (57.2 %) were men and 3336 (42.8 %) were women. In total, 874 patients (11.2 %) died and 6925 (88.8 %) recovered. The mean (SD) age of the patients was 51.27 (21.79) years. The demographic and clinical characteristics of the patients are shown in Table 1.

Risk Factors of Mortality among Coronavirus Patients

After fitting the univariate logistic regression models, 25 variables were selected to enter the multivariate logistic regression model (Table 2). This model showed that age [OR = 4.47, 95 % CI (3.16 - 6.32)], shortness of breath [OR = 1.42, 95 % CI (1.0 - 2.01)], ICU admission [OR = 3.75, 95 % CI (2.47 - 5.68)], abnormal chest X-ray [OR = 1.93, 95 % CI (1.06 - 3.41)], liver disease [OR = 5.05, 95 % CI (1.020 - 25.2)], body temperature [OR = 4.93, 95 % CI (2.17 - 6.25)], and cardiovascular disease [OR = 2.15, 95 % CI (1.27 - 3.06)] were significantly associated with higher mortality in patients with COVID-19 (P < 0.05).

In fitting the decision tree model, all explanatory variables were treated as quantitative variables. Based on the Gini index, a tree with 4 levels was constructed, and age was placed at the first level (root node) as the most important variable. According to the decision tree model, the most important risk factors for death due to COVID-19 were age, body temperature, ICU admission, a visit to a hospital or medical centre during the last 14 days, and cardiovascular disease. Age was one of the risk factors associated with death: among patients admitted to the ICU with a body temperature higher than 38.25 °C, mortality in those older than 54.5 years was approximately 2.5 times that in those younger than 54.5 years. Because of its importance, age was also split twice in the tree, once at the first level and again at the third level (Figure 2). Overall, age, ICU admission, body temperature, and cardiovascular disease were significant risk factors for mortality in both the decision tree and the logistic regression models.

Performance of the Methods

In the decision tree model, the sensitivity and specificity were 66.0 % and 65.0 % in the test group and 74.0 % and 64.0 % in the training group, respectively. The AUC was 77 % for the decision tree model and 75 % for the logistic regression model (Table 2), and the accuracy was 83 % and 78 %, respectively. The ROC curves of both models are shown in Figure 1.

DISCUSSION

Given the high rate of COVID-19 mortality worldwide, it is important to find methods that can accurately identify the factors associated with mortality. In this study, we compared the performance of a decision tree model built with the CART algorithm and a logistic regression model. Overall, the decision tree model had better performance indicators, including the AUC, than the logistic regression model. It was also more accurate, i.e., it classified a larger proportion of patients correctly.

Feng et al. used LASSO logistic regression, a decision tree, ridge-regularized logistic regression, and AdaBoost to identify suspected cases of COVID-19 in China; LASSO logistic regression performed best, with an AUC of 0.84.18 Das et al. compared five algorithms (logistic regression, support vector machine, K-nearest neighbours, random forest, and gradient boosting) for predicting the mortality risk of patients with COVID-19, and logistic regression performed best, with an AUC of 0.83.19 Ma et al., who identified risk factors for mortality in patients with COVID-19 using machine learning methods (random forest and XGBoost) and logistic regression, likewise found that logistic regression performed best, with an AUC of 0.95.20

Toraih et al. used decision tree analysis to predict mortality factors in patients with COVID-19 and concluded that age and cardiovascular disease were important: individuals older than 60 years and those with cardiovascular disease were at higher risk of death.21 Using multivariate logistic regression, Ni et al. showed that cardiovascular disease at the time of hospital admission was associated with a higher risk of mortality in patients with COVID-19.22 Albitar et al., also using multivariate logistic regression, showed that patients with COVID-19 older than 65 years with a history of cardiovascular disease were more likely to die.23 The CDC COVID-19 Response Team also reported that 80 % of COVID-19 deaths occurred in people over 65 years of age.24 Similarly, in our study, age and cardiovascular disease were associated with mortality in patients with COVID-19 in both models. In the logistic regression model, patients older than 60 years had higher mortality, and in the decision tree model, patients older than 54.5 years had higher mortality. Abnormal chest X-ray and body temperature greater than 38 °C in the multivariate logistic regression model, and body temperature greater than 38.5 °C in the decision tree model, were also associated with mortality. Chang et al., using multivariate logistic regression, likewise showed that patients with a body temperature above 37.5 °C and an abnormal chest X-ray had a higher risk of developing severe COVID-19.25

A meta-analysis showed that the mortality rate of patients admitted to the ICU was 3.7 times higher than that of patients admitted to other wards.26 In our study, the odds of death among patients admitted to the ICU were 3.75 times those of patients not admitted to the ICU.

Several studies have shown that mortality among patients with COVID-19 and a history of chronic liver disease was higher than among other patients.26-28 In the present study, multivariate logistic regression likewise showed that patients with a history of chronic liver disease had higher mortality than other patients.

In the present study, multivariate logistic regression showed that the odds of death were approximately 1.4 times higher (OR = 1.42) in patients with shortness of breath and approximately 4.5 times higher in people over 60 years of age than in other patients. Soares et al., also using multivariate logistic regression, showed that patients with COVID-19 over 60 years of age were approximately 4 times more likely to die, and those with shortness of breath 3.5 times more likely to die.29

CONCLUSIONS

In this study, the accuracy and AUC of the decision tree model were greater than those of the logistic regression model, indicating that the decision tree performed better in identifying the factors affecting mortality in patients with COVID-19. The most important risk factors for death due to COVID-19 in this study were age older than 54.5 years, body temperature higher than 38.5 °C, ICU admission, cardiovascular disease, and a prior hospital visit within the last 14 days. Identifying risk factors for mortality in patients with COVID-19 can enable more effective interventions in the early stages of treatment and improve the care provided by medical staff.

Abbreviations

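AUC: Area Under the ROC Curve, CART: Classification and Regression Tree, CI: Confidence Interval, CT: Computed Tomography, ICU: Intensive Care Unit, OR: Odds Ratio, PCR: Polymerase Chain Reaction, ROC: Receiver Operating Characteristic, SD: Standard Deviation, WHO: World Health Organization.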

REFERENCES

  1. Jiang S, Xia S, Ying T, et al. A novel coronavirus (2019-nCoV) causing pneumonia-associated respiratory syndrome. Cell Mol Immunol 2020;17(5):554.
  2. WHO. WHO MERS global summary and assessment of risk. [published online ahead of print January 21, 2020].
  3. Vannabouathong C, Devji T, Ekhtiari S, et al. Novel coronavirus COVID-19: current evidence and evolving strategies. J Bone Joint Surg Am 2020;102(9):734-44.
  4. Centers for Disease Control and Prevention. Coronavirus disease 2019 (COVID-19) situation summary. 2020.
  5. Paules CI, Marston HD, Fauci AS. Coronavirus infections-more than just the common cold. JAMA 2020;323(8):707-8.
  6. Chan JFW, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet 2020;395(10223):514-23.
  7. Chen N, Zhou M, Dong X, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 2020;395(10223):507-13.
  8. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020;395(10223):497-506.
  9. Wang D, Hu B, Hu C, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. JAMA 2020;323(11):1061-9.
  10. Drazin S, Montag M. Decision tree analysis using weka. Machine Learning-Project II, University of Miami 2012:1-3.
  11. Brims FJH, Meniawy TM, Duffus I, et al. A novel clinical prediction model for prognosis in malignant pleural mesothelioma using decision tree analysis. J Thorac Oncol 2016;11(4):573-82.
  12. Sharma H, Kumar S. A survey on decision tree algorithms of classification in data mining. International Journal of Science and Research (IJSR) 2016;5(4):2094-7.
  13. Timofeev R. Classification and regression trees (CART) theory and applications. Berlin: Humboldt University 2004:1-40.
  14. Mistikoglu G, Gerek IH, Erdis E, et al. Decision tree analysis of construction fall accidents involving roofers. Expert Systems with Applications 2015;42(4):2256-63.
  15. Wijekoon N, Azeez AA. An integrated model to predict corporate failure of listed companies in Sri Lanka. International Journal of Business and Social Research 2015;5(7):1-14.
  16. Sperandei S. Understanding logistic regression analysis. Biochemia Medica 2014;24(1):12-8.
  17. Sumarga E. A Comparison of logistic regression, geostatistics and maxent for distribution modeling of a forest endemic. 2011.
  18. Feng C, Huang Z, Wang L, et al. A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected COVID-19 pneumonia in fever clinics. Annals of Translational Medicine 2020.
  19. Das AK, Mishra S, Gopalan SS. Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool. PeerJ 2020;8:e10083.
  20. Ma X, Ng M, Xu S, et al. Development and validation of prognosis model of mortality risk in patients with COVID-19. Epidemiol Infect 2020;148:e168.
  21. Toraih EA, Elshazli RM, Hussein MH, et al. Association of cardiac biomarkers and comorbidities with increased mortality, severity, and cardiac injury in COVID‐19 patients: a meta‐regression and decision tree analysis. J Med Virol 2020;92(11):2473-88.
  22. Ni W, Yang X, Liu J, et al. Acute myocardial injury at hospital admission is associated with all-cause mortality in COVID-19. J Am Coll Cardiol 2020;76(1):124-5.
  23. Albitar O, Ballouze R, Ooi JP, et al. Risk factors for mortality among COVID-19 patients. Diabetes Res Clin Pract 2020;166:108293.
  24. CDC COVID Response Team. Severe outcomes among patients with coronavirus disease 2019 (COVID-19)-United States, February 12-March 16, 2020. MMWR Morb Mortal Wkly Rep 2020;69(12):343-6.
  25. Chang MC, Park YK, Kim BO, et al. Risk factors for disease progression in COVID-19 patients. BMC Infect Dis 2020;20(1):445.
  26. Noor FM, Islam MM. Prevalence and associated risk factors of mortality among COVID-19 patients: a meta-analysis. J Community Health 2020;45(6):1270-82.
  27. Lee JY, Kim HA, Huh K, et al. Risk factors for mortality and respiratory support in elderly patients hospitalized with COVID-19 in Korea. J Korean Med Sci 2020;35(23):e223.
  28. Zhao Y, Nie HX, Hu K, et al. Abnormal immunity of non-survivors with COVID-19: predictors for mortality. Infect Dis Poverty 2020;9(1):108.
  29. Soares RDCM, Mattos LR, Raposo LM. Risk factors for hospitalization and mortality due to COVID-19 in Espírito Santo State, Brazil. Am J Trop Med Hyg 2020;103(3):1184-90.

DISCLOSURE AND FUNDING

Disclosure forms provided by the authors are available with the full text of this article at jemds.com

ICMJE Forms

Financial or other competing interests: This work is financially supported by Kermanshah University of Medical Sciences (Grant No. 1399.214). The funders played no role in the study design, data collection, data analysis, interpretation or writing of the report.

We thank all the medical staff who care for patients at the risk of their own lives.

DATA SHARING STATEMENT

A data sharing statement provided by the authors is available with the full text of this article at jemds.com

How to cite this article

Siabani S, Solouki L, Moradinazar M, et al. Associated factors with the mortality rate in patients with COVID-19 - decision trees vs. logistic regression. J Evolution Med Dent Sci 2021;10(44):3736-3741, DOI: 10.14260/jemds/2021/756
