Abstract

Antibiotic-resistant bacteria have proliferated at an alarming rate as a result of the extensive use of antibiotics and the paucity of new drug development. One of the major collateral problems for people with an antibiotic-resistant bacterial infection is the possibility that the infection will progress to sepsis. Sepsis accounts for the loss of some 31,000 lives in England, with costs of about two billion pounds annually. This research aims to develop and evaluate several classification approaches to improve sepsis prediction and reduce the tendency toward underdiagnosis in computer-aided predictive tools. It employs medical datasets for patients diagnosed with sepsis and analyses the efficacy of ensemble machine learning techniques compared to nonensemble techniques, as well as the significance of data balancing and conditional tabular generative adversarial nets (CTGAN) for data augmentation in producing reliable diagnoses. The average F score obtained by the nonensemble models trained in this paper is 0.83, compared to an average of 0.94 for the ensemble techniques. Among the nonensemble techniques, the Decision Tree achieved an F score of 0.90, an AUC of 0.90, and an accuracy of 90%. The histogram-based gradient boosting classification tree achieved an F score of 0.96, an AUC of 0.96, and an accuracy of 95%, surpassing the other models tested. Additionally, when compared to current state-of-the-art sepsis prediction models, the models developed in this study demonstrated higher average performance on all metrics, indicating reduced bias and improved robustness through data balancing and CTGAN-based data augmentation. The study revealed that data balancing and augmentation applied to ensemble machine learning algorithms boost the efficacy of clinical predictive models and can help clinics decide which data types are most important when examining patients and diagnosing sepsis early through an intelligent human-machine interface.

1. Introduction

Sepsis is a severe illness that develops when the body's response to infection leads to tissue damage and organ failure. Early detection is essential for prompt and efficient treatment of sepsis, since the mortality rate rises considerably with delayed diagnosis [1]. However, sepsis may be difficult to diagnose because of its broad and often mild symptoms and comorbidities [1]. Traditionally, sepsis has been diagnosed through clinical evaluation, laboratory testing, and imaging investigations. Research has also been conducted on monitoring patients with sepsis using wearable sensor monitors in low- and middle-income countries [2]. Although these techniques may provide useful information, they are not always sufficient for an accurate diagnosis [3]. By examining a larger number of characteristics and using the power of data-driven decision-making, machine learning techniques, such as ensemble classifiers, have the potential to increase the accuracy of sepsis diagnosis [4]. Ensemble classifiers combine the predictions of numerous separate classifiers to provide a more accurate and dependable forecast [5]. Nonetheless, an imbalance in the class distribution of the data can impair the performance of ensemble classifiers [6]. Data balancing strategies, such as oversampling and undersampling [7], modify the number of samples in each class to enhance the classifier's capacity to learn from the data [8]. This research work discusses the preparation of raw data, the generation of training and testing data, and the implementation, training, and visualization of a sepsis prediction model based on various methodologies.

This work is organized as follows: Section 2 reviews the related literature on sepsis, its risk factors, and biomarkers, and examines research on ensemble classifiers in the medical area. Section 3 describes the dataset used, its modifications, and its limitations in more depth, details the machine learning strategies employed to solve the classification problem, and describes the models' architecture. Section 4 dissects and analyzes the results to offer a fuller picture of the findings of the research. Section 5 presents conclusions and recommendations based on the study's results.

2. Literature Review

Several research studies have investigated the use of machine learning techniques, especially ensemble classifiers, in the diagnosis of sepsis. For instance, Fleuren et al. [9] conducted a comprehensive assessment of machine learning algorithms for sepsis detection and discovered that ensemble classifiers performed the best among the methods evaluated. Several variables may influence the efficacy of machine learning approaches for sepsis detection, including the amount of data used for training, the model's complexity, and the presence of noise or missing values in the data. Data balancing strategies, such as oversampling and undersampling, have been suggested as a means of addressing class imbalance and enhancing the performance of machine learning systems for sepsis detection [7].

Mohan et al. [10] examined data from individuals diagnosed with sepsis who were monitored from the time they were admitted until they either passed away or were discharged from the intensive care unit over a two-year period. Their purpose was to aid in the development of improved algorithms by providing observations of cases that resulted in mortality from septic shock. Mao et al. [11] utilized machine learning to develop a prediction model using just six routinely assessed and monitored vital signs in medical institutions.

2.1. Risk Factors of Septic Shock

Studies have not shown that demographic factors have a major role in septic shock diagnosis. Age, gender, and length of stay are the three most significant demographic variables included in the data. In the majority of instances, age may be utilized as a significant predictor of sepsis risk [12].

2.2. Biomarkers of Septic Shock

There have been several studies that have investigated the use of biomarkers for the diagnosis and prognosis of septic shock. For example, Lu et al. [13] developed a predictive model that used a combination of biomarker parameters to predict the risk of death in patients with septic shock. The scientists showed that the model had excellent discrimination and calibration and may be used to identify trauma patients at high risk for sepsis. Dellinger et al. [14] identified several biomarkers that have been proposed as indicators of septic shock, including procalcitonin, interleukin-6, and lactate. These biomarkers have been shown to be associated with the severity and prognosis of septic shock and may be useful for identifying patients at high risk of developing the condition.

Other studies have investigated the use of biomarkers in combination with clinical and laboratory parameters to improve the accuracy of septic shock diagnosis. To aid in the diagnosis of sepsis, researchers have developed a Lateral Flow Solid-Phase RPA assay for sepsis-related pathogen detection [15]. Quantitative identification of lactate using optical spectroscopy has also been investigated to support continuous monitoring of serum lactate levels in sepsis-prone patients requiring intensive care [16].

2.3. Ensemble Classifiers

Ensemble classifiers are classifiers which create a collection of hypotheses before combining them through weighted or unweighted voting [17]. The outcome of merging the separate selections is an improvement in overall performance and a more precise categorization [18].

There are three issues that diminish the performance of single classifiers: statistical, computational, and representational; these issues are handled by merging the findings and obtaining a better approximation [17].

The computational issue arises when the classification algorithm employs local optimization approaches that may get stuck in local optima, preventing the process from discovering the optimal hypothesis [18].

2.4. Ensemble Classifiers in Medicine

Lavanya and Rani [19] created a bagging-based ensemble classifier that was constructed from a collection of decision trees to increase the prediction accuracy of breast cancer detection. For the diagnosis of cardiac autonomic neuropathy, Kelarev et al. [20] utilized ensemble classification, and notably the Random Forest (RF), to produce a model with better abilities in prediction than those built on single classifiers.

For the purpose of predicting cancer survival, Gupta et al. [21] developed three models, each consisting of 400 SVM ensembles. The research found that using ensemble classifiers might improve prediction over traditional techniques [21]. Yao et al. [22] introduced a Random Forests-based ensemble classification method for predicting protein-protein interaction (PPI) networks.

2.5. Conditional Tabular Generative Adversarial Networks

Data generation plays a crucial role in various domains, including computer vision, natural language processing, and healthcare. Traditional approaches often rely on hand-crafted rules or statistical methods, which may not capture the complex underlying patterns of the data. Conditional Generative Adversarial Networks (cGANs) offer a promising solution by utilizing deep learning techniques to generate synthetic data that possesses desired characteristics [23].

Conditional Tabular Generative Adversarial Nets (CTGAN) is a powerful technique in the field of generative adversarial networks (GANs) that specifically focuses on generating synthetic tabular data [24]. GANs have gained significant attention in recent years for their ability to generate realistic data that closely resembles the distribution of the training data. However, traditional GANs are not well-suited for tabular data generation due to the structured nature of such data. CTGAN addresses this limitation by incorporating conditional generation, allowing users to specify the desired attributes or conditions of the synthetic data [25]. This enables CTGAN to generate synthetic tabular data that not only resembles the distribution of the training data but also follows specific attributes or conditions set by the user [25]. This makes CTGAN a more suitable option for generating tabular data compared to traditional GANs. With the ability to generate realistic and customizable synthetic data, CTGAN opens up possibilities for various applications such as data augmentation, privacy preservation, and data analysis.
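As an illustration, a minimal sketch of tabular data synthesis with the open-source `ctgan` Python package is shown below; the file name, column names, and training budget are assumptions rather than the exact configuration used in this study.

```python
# Sketch: generating synthetic sepsis records with CTGAN (names/values assumed).
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

real_data = pd.read_csv("sepsis_records.csv")           # assumed file name
discrete_columns = ["Gender", "SepsisLabel"]            # assumed categorical columns

model = CTGAN(epochs=300)                               # assumed training budget
model.fit(real_data, discrete_columns)

# Draw synthetic rows, e.g. to augment the minority (sepsis) class.
synthetic = model.sample(2068)
```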

3. Materials and Methods

The proposed medical approach for sepsis analysis is illustrated in Figure 1. The acquired datasets go through the cleaning stage, where the missing parameters are identified, and missing data points are rectified. Following the dimensionality reduction, the data are split into training and testing datasets where several approaches will be evaluated. Different experiments have been performed to achieve the best approach structure that can generate the best performance.

3.1. Nonensemble Machine Learning Algorithms
3.1.1. Multinomial Logistic Regression

Multinomial regression is a variant of the binary regression model; both use logit analysis, or logistic regression (LR), to reach their conclusions. Logit analysis complements linear regression and is especially beneficial when the response is a categorical variable.

For a binary target variable $Y$ and an independent variable $X$, let $\pi(x) = P(Y = 1 \mid X = x)$. The logit of this probability may be expressed in linear form using the logistic regression model:

$$\operatorname{logit}(\pi(x)) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x.$$

The value of $\beta_1$ determines the gradient of the S-shaped curve of $\pi(x)$: the curve is rising when $\beta_1$ is positive and descending when $\beta_1$ is negative, and the steepness of the curve is governed by the magnitude of $\beta_1$ [26].
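As an illustration only, the following minimal Python sketch (with hypothetical values for $\beta_0$ and $\beta_1$, not fitted coefficients) evaluates the logistic response described above:

```python
# Illustrative sketch of the logistic (S-shaped) response for one feature.
# beta0 and beta1 are hypothetical values, not fitted coefficients.
import numpy as np

def pi(x, beta0=-2.0, beta1=0.8):
    """P(Y = 1 | X = x) under the logistic regression model."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

print(pi(0.0))  # probability at x = 0
print(pi(5.0))  # probability rises with x because beta1 > 0
```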

3.1.2. Support Vector Machine for Classification

To classify data, SVMs seek the hyperplane in a high-dimensional space that most clearly divides the classes [27]. Support vectors are the locations that are closest to the hyperplane, and the distance between the support vectors and the hyperplane is known as the margin [27].

SVMs are particularly effective in cases where the number of dimensions is greater than the number of samples [27]. By means of a kernel, the data may be projected into a higher-dimensional space, where the SVM can locate a separating boundary that was previously inaccessible [27]. The usage of support vector machines (SVMs) has spread across several fields, from text classification to image classification to bioinformatics [27].

For training points $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$ and a separating hyperplane $w^{\top}x + b = 0$, a point is correctly identified if $y_i(w^{\top}x_i + b) \geq 1$; the knowledge contained in the collection of correctly identified points is summarized by the support vectors, for which this constraint holds with equality.

3.1.3. Multilayer Perceptron

An MLP is a neural network with numerous layers of connected “neurons,” which are computational elements that take in data, process them, and output a result [28]. Because the MLP's layers are fully connected, each neuron receives input from all the neurons in the layer below it and sends its output to all the neurons in the layer above it [28].

We will call $a_j^{(l)} = \sigma\left(\sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}\right)$ the output of neuron $j$ in layer $l$, where $\sigma$ is the activation function, $w_{ij}^{(l)}$ are the connection weights, and $b_j^{(l)}$ is the bias term.

MLPs are often used for supervised learning tasks like classification and regression [29]. As part of their training, MLPs use optimization algorithms like stochastic gradient descent to fine-tune the weights of the connections between neurons in order to reduce the error between the expected and actual output [29]. Multiple-layer perceptrons, or MLPs, have been put to use in several fields, such as computer vision, NLP, and robotics [29].

3.1.4. Quadratic Discriminant Analysis (QDA)

QDA is based on another technique known as Linear Discriminant Analysis (LDA), which is based on the assumptions that the data are normally distributed and that the classes have identical covariance matrices [30]. Different class covariance matrices are acceptable in QDA, which may sometimes lead to better performance [30].

The purpose of QDA is to discover the decision boundary that optimally divides the classes based on their means and covariances [30]. The quadratic discriminant function, which is a function of the sample features and the class means and covariances, determines the quadratic decision boundary, as opposed to the linear decision boundary used in LDA [30]. QDA has been employed in a broad variety of applications, including text classification, picture classification, and predictive modelling [30].

3.1.5. Nearest Neighbor Classification

Consider a collection of $n$ pairs $(x_1, \theta_1), \ldots, (x_n, \theta_n)$, where each $x_i$ takes values in a metric space $X$ on which a metric $d$ is defined, and each $\theta_i$ takes values in a predetermined set of class labels. Each $\theta_i$ is regarded as the index of the class to which the $i$th instance belongs, and each $x_i$ is the outcome of a set of measurements conducted on that instance.

Given a new pair $(x, \theta)$, in which only the measurement $x$ may be observed, it is desired to estimate $\theta$ using the nearest neighbor of $x$: the instance $x'_n \in \{x_1, \ldots, x_n\}$ satisfying $d(x'_n, x) = \min_i d(x_i, x)$. The point $x$ is assigned to the class $\theta'_n$ of its nearest neighbor; if $\theta'_n \neq \theta$, an error has occurred. The NN rule uses only the classification of the nearest neighbor.

The remaining $n - 1$ classifications are disregarded.

3.1.6. Decision Tree

A decision tree is a tree constructed using training data, where each leaf node denotes a label of a class and each internal node denotes a feature of the data. The classification is based on the feature values and the class labels of the training data. Decision trees are a popular machine learning method due to their interpretability and the ease with which they can be implemented [31].

3.2. Nonensemble Model Parameters

Table 1 lists the hyperparameter settings of the nonensemble models; all of them were left at their default values.
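A minimal sketch of how these six nonensemble classifiers can be instantiated with their scikit-learn defaults is shown below; the toy dataset from `make_classification` is used only so the snippet runs end to end and is not the MIMIC-III data.

```python
# Sketch: the six nonensemble classifiers with default hyperparameters (Table 1).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

nonensemble_models = {
    "LR": LogisticRegression(),
    "SVC": SVC(),
    "MLP": MLPClassifier(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(),
}

# Toy data standing in for the preprocessed sepsis features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for name, model in nonensemble_models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))
```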

3.3. Ensemble Machine Learning Algorithms
3.3.1. Random Forest

A random forest is a kind of ensemble machine-learning technique in which numerous decision trees work together to produce an outcome based on the classes produced by the individual trees [32]. The individual decision trees are trained on different parts of the training set and use a random subset of the features to make predictions, resulting in a diverse set of trees that are able to capture different patterns in the data [32]. The use of multiple trees allows the random forest to make more accurate predictions than any individual tree would be able to make on its own [32]. The algorithm's error rate depends on the classification strength of each tree and the correlation between any two trees; reducing the number of randomly selected features decreases both the strength of each tree and the correlation across trees, while increasing it has the opposite effect [32].

3.3.2. Extra Trees Classifier

Extra trees, or extremely randomized trees, are a variant of the random forest algorithm [33]. Like random forests, extra trees are an ensemble method that consists of multiple decision trees. However, the decision trees in an extra trees classifier are trained using random thresholds for each feature, rather than the best split found during training as in a standard decision tree [33]. This results in a greater diversity of trees in the ensemble, which can lead to improved generalization performance [33].

3.3.3. AdaBoost Decision Tree

AdaBoost works by iteratively training weak classifiers and giving more weight to the instances that were misclassified in the previous iterations [34]. The weak classifiers are typically decision trees with a single split, known as decision stumps, and the final strong classifier is the weighted sum of the weak classifiers, with the weight of each weak classifier being proportional to its accuracy [34]. AdaBoost has been shown to be a powerful and effective method for improving the performance of decision trees, especially when dealing with imbalanced or noisy datasets [34].

3.3.4. Bagging Classifier

According to Breiman [35], in the bagging machine learning ensemble approach, many models are trained on various randomly chosen portions of the dataset and the models are then combined to create a prediction. Bagging is intended to lower the model’s variance by training the individual models in parallel and then combining their predictions. This can lead to improved generalization performance, especially when the training data are noisy or has a high variance. Bagging can be applied to any machine learning algorithm, but it is particularly effective for decision tree-based models, which have a tendency to overfit the training data.

3.3.5. Gradient Boosting Classifier

The goal of gradient boosting is to sequentially add weak learners to the ensemble, in a way that corrects the mistakes of the previous models. This is done by fitting the new model to the residual errors of the previous model, rather than to the original response. The final model is the weighted sum of the individual trees, with the weight of each tree being determined by the loss function. Gradient boosting has been shown to be a powerful and effective method for improving decision tree-based model performance, and it has seen extensive usage [36].

3.3.6. Histogram Gradient Boosting Classifier

This classifier uses histograms to approximate the leaf values of the trees in the ensemble, rather than using exact leaf values as in traditional gradient boosting. This allows histogram gradient boosting to handle categorical features and large datasets more efficiently than traditional gradient boosting. In addition, histogram gradient boosting is more resistant to overfitting and can achieve higher predictive accuracy with fewer trees. Histogram gradient boosting has been shown to be a fast and effective method for improving the performance of decision tree-based models and has been used in a wide range of applications [37].

3.3.7. Stacked Classifier

A stacked classifier (SC) is a strategy for reducing the biases of estimators by merging them [38]. Specifically, the estimators’ outputs are stacked and fed into a single estimator to produce a final prediction. Cross-validation is used to train this final estimator [38]. The estimators used in this classifier will be composed of the ensemble classifiers used in this research with its final estimator being the logistic regressor model.

3.3.8. Voting Classifier

Using the results of many base classifiers, a voting classifier makes a combined prediction [18]. The final prediction is produced either by majority vote or by averaging the predictions of the base classifiers, which may be trained using various algorithms and/or on separate subsets of the training data [18]. When the base classifiers are varied and have complementary strengths, a voting classifier offers a straightforward and effective way to improve on the performance of a single classifier [18]. The base estimators used in this classifier are composed of the ensemble classifiers used in this research, and their predictions are combined by voting.

3.4. Ensemble Model Parameters

Table 2 illustrates the hyperparameter information for the ensemble models.
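A minimal sketch, assuming scikit-learn implementations, of the ensemble classifiers described above is given below; the stacking and voting classifiers reuse the other ensembles as base estimators, with logistic regression as the stacking classifier's final estimator. The settings shown are illustrative rather than the exact values in Table 2.

```python
# Sketch of the ensemble classifiers; settings are illustrative, not Table 2's.
from sklearn.ensemble import (
    AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier,
    GradientBoostingClassifier, HistGradientBoostingClassifier,
    RandomForestClassifier, StackingClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression

base_estimators = [
    ("rfc", RandomForestClassifier()),
    ("etc", ExtraTreesClassifier()),
    ("ada", AdaBoostClassifier()),
    ("bc", BaggingClassifier()),
    ("gbc", GradientBoostingClassifier()),
    ("hgbc", HistGradientBoostingClassifier()),
]

# Stacking: base predictions are fed to a logistic regression meta-learner.
sc = StackingClassifier(estimators=base_estimators,
                        final_estimator=LogisticRegression(), cv=10)

# Voting: base predictions are combined by majority vote.
vc = VotingClassifier(estimators=base_estimators, voting="hard")
```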

3.5. Dataset

The MIMIC-III dataset is a large database containing detailed information on patient demographics, vital signs, medications, laboratory test results, and clinical notes, among other things [39]. The MIMIC-III dataset is widely used in research on critical care and has been used to develop machine learning models for a variety of tasks [39].

The sepsis MIMIC-III dataset is a subset of the MIMIC-III dataset that includes only patients with a diagnosis of sepsis [1]. It provides detailed information on the clinical course of the sepsis, including the timing and dosage of interventions, as well as the patients' outcomes [1]. The sepsis MIMIC-III dataset is widely used in sepsis research and has been used to develop machine learning models for predicting patient outcomes and identifying sepsis in real time [1].

Patients were monitored from the moment they entered the ICU, at t = 0, until they were discharged from the ICU or died. The database comprised 4,683 people aged 15 and above who had sepsis or severe sepsis. These patients had 8,696 admissions, 2,585 of which were due to septic shock. Figure 2 illustrates the duration of time the patients examined in this dataset were present, while Table 3 provides a summary of the dataset.

3.5.1. Dataset Limitations

The dataset is imbalanced, with 2,932 patients with a sepsis diagnosis and over 37,000 patients without one. A comprehensive analysis of the dataset revealed that certain attributes are entirely empty; if they are not eliminated, they will mislead the training set or produce an improperly functioning model. An example of this is shown in Figure 3.

3.5.2. Dataset Manipulation and Delimitation

This dataset contains 2,932 diagnosed sepsis patients compared to 37,404 patients without a diagnosis. This is resolved by augmenting the sepsis patient data with 2,068 generated sepsis patients and then taking the first 5,000 nondiagnosed patients and ignoring the remaining 32,404, to prevent the dataset from prioritizing nondiagnosed patients during training.

Researchers often encounter the difficulty of missing data. This dataset includes attributes with real-valued entries as well as missing data, which are filled in using an interpolation function that substitutes NaN values with values that have no influence on the final result while improving the model. The fraction of missing data is calculated over the sum of all attributes, and this parameter is adjusted to generate the most effective models.
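A minimal sketch, with assumed file and column names, of the balancing and missing-value handling described above:

```python
# Sketch of class balancing and missing-value handling (names are assumptions).
import pandas as pd

df = pd.read_csv("mimic_sepsis.csv")                     # assumed file name
sepsis = df[df["SepsisLabel"] == 1]                      # 2,932 diagnosed patients
non_sepsis = df[df["SepsisLabel"] == 0].head(5000)       # first 5,000 nondiagnosed

# `synthetic` is the CTGAN output from the earlier sketch (2,068 generated rows).
balanced = pd.concat([sepsis, synthetic, non_sepsis], ignore_index=True)

# Interpolate remaining NaNs, then fall back to the per-attribute mean.
balanced = balanced.interpolate()
balanced = balanced.fillna(balanced.mean(numeric_only=True))
```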

The possibility of removing attributes from the training process will also be considered based on their correlation to the target variable as well as their frequency of use in current research as shown in Figure 4.

3.5.3. F Score Recall and AUC for Model Selection

The F score is a class-balanced accuracy metric, since it is the harmonic mean of precision and recall, $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. The F1 score is used when false negatives and false positives are both important in the prediction process. Current research shows that most sepsis prediction models for this dataset are more adept at predicting nondiagnosed patients than diagnosed patients [4]. This is due to unbalanced classes: most instances in the data are classified as nonsepsis patients, so the accuracy on nonsepsis cases dominates the overall accuracy measure.

Recall is an important metric for measuring a model’s ability to detect positive samples in which the higher the recall, the more positive samples are detected. For the purpose of machine learning in clinical settings, it can be argued that true positives are more important than true negatives as an undetected true positive can lead to a fatality, whereas an undetected true negative is not fatal.

AUC represents the area under the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate at different classification thresholds [40].
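For reference, these metrics can be computed with scikit-learn as in the following minimal sketch; the label and probability vectors are made-up examples, not study results.

```python
# Sketch: computing F score, recall, accuracy, and AUC with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                       # made-up ground truth
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]                       # made-up class predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.95]      # made-up P(sepsis) scores

print("F1      :", f1_score(y_true, y_pred))
print("Recall  :", recall_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
print("AUC     :", roc_auc_score(y_true, y_prob))
```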

3.5.4. Methodology Comparison

The three works compared in this paper focus on predicting and diagnosing sepsis, but they differ in their approaches, methodologies, and evaluation metrics. This research aims to improve sepsis prediction and reduce underdiagnosis through machine learning: it evaluates ensemble and nonensemble machine learning techniques, employs data balancing and augmentation with CTGAN, and reports F score, AUC, and accuracy as evaluation metrics. El-Rashidy et al. [41] proposed a multistage model for sepsis prediction that combines NSGA-II, artificial neural networks, and deep learning models. It utilizes NSGA-II and neural networks to extract the optimal feature subset from patient data, and it consists of a deep learning classification model and a multitask regression model to predict sepsis, onset time, and blood pressure. It uses the MIMIC-III real-world dataset and reports accuracy, specificity, sensitivity, AUC, and RMSE as evaluation metrics. Darwiche and Mukherjee [4] focus on developing an improved method for predicting septic shock. Their work trains an ensemble classifier using the MIMIC-III database and incorporates the Cox hazard model to obtain a risk score; the Random Forest ensemble classifier is trained using this score and other features. Specific evaluation metrics are not mentioned, but the predictive accuracy of the proposed CERF method is compared to existing methods. Overall, each study presents a unique approach to sepsis prediction and diagnosis, showcasing different techniques and evaluation criteria.

4. Results

4.1. Correlation of Sepsis Factors

After quantitative analysis using the pandas Python library, we analyzed the dataset and produced Table 4, which shows the 15 variables with the highest correlation to a sepsis diagnosis. These correlation values can give more insight into the type of data to be collected for processing in order to aid diagnosis [42]. Table 5 illustrates that selecting only the top 15 correlated attributes for training produces lower performance than using all attributes. Thus, for the training and tuning of the final selected model, we used models trained on all attributes regardless of correlation. The missing values in the data are also filled with the mean value of each attribute so as to make the data more quantitatively meaningful.
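A minimal pandas sketch of this correlation ranking and mean imputation (the file and label column names are assumptions):

```python
# Sketch: mean imputation and ranking attributes by correlation with the diagnosis.
import pandas as pd

df = pd.read_csv("mimic_sepsis.csv")                    # assumed file name
df = df.fillna(df.mean(numeric_only=True))              # mean-fill missing values

corr = df.corr(numeric_only=True)["SepsisLabel"].drop("SepsisLabel")
top15 = corr.abs().sort_values(ascending=False).head(15)
print(top15)                                            # candidates for Table 4
```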

4.1.1. Machine Learning Model Evaluation and Performance Analysis

The code performs the training and testing of machine learning models to predict and evaluate sepsis. It uses a popular library called scikit-learn, which is widely used for machine learning in Python. The dataset is divided into two parts: a training set and a testing set. The training set is utilized in conjunction with 10-fold cross-validation to train the models. This approach enables a more efficient utilization of the available data, as all observations are utilized for both training and validation purposes [43]. Additionally, it is less susceptible to variations in the precise manner in which the data are partitioned, in comparison to alternative methods [44]. The testing set is used to evaluate the model’s performance.

The code follows these steps:
(1) The dataset is prepared and split into input features (such as patient information) and the target variable (whether a patient has sepsis or not).
(2) Using CTGAN, the data are augmented to provide more data for training and testing.
(3) A portion of the dataset is set aside for testing the trained models.
(4) Different machine learning models, such as logistic regression, decision trees, and ensemble models, are trained using the training data. During training, the models are subjected to 10-fold cross-validation in order to mitigate potential sources of unreliability and bias. This approach aims to enhance the models' ability to discern meaningful patterns from the available data and generate dependable predictions.
(5) After training, each model's performance is evaluated using various metrics, including accuracy (how often the model is correct), sensitivity (how well the model detects positive cases), specificity (how well the model detects negative cases), and F score (a combined measure of precision and recall). These metrics help assess how well the models can predict sepsis.
(6) The evaluation results, such as accuracy, sensitivity, and specificity, are recorded for further analysis.
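A minimal sketch of this workflow, assuming the `balanced` frame from the earlier data-preparation sketch, a hypothetical 80/20 split, and one example model, is shown below:

```python
# Sketch: held-out test split, 10-fold cross-validation, and test-set scoring.
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

X = balanced.drop(columns=["SepsisLabel"])              # features (assumed column name)
y = balanced["SepsisLabel"]                             # sepsis / no sepsis target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)    # assumed 80/20 split

model = HistGradientBoostingClassifier()
cv_f1 = cross_val_score(model, X_train, y_train, cv=10, scoring="f1")
print("10-fold CV F1:", cv_f1.mean())

model.fit(X_train, y_train)
print("Test F1:", f1_score(y_test, model.predict(X_test)))
```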

4.2. Model Performance

Table 6 displays that the Decision Tree model, with an accuracy of 90%, an AUC of 0.90, and an F score of 0.90, is the best-performing model among the nonensemble strategies.

With an F score of 0.95, an AUC of 0.95, and an accuracy of 95%, Table 7 demonstrates that the stacking classifier model is the best-performing model among the ensemble strategies.

4.3. Further Testing and Tuning

The results of further testing and tuning of the histogram-based Gradient Boosting Classification Tree model are presented in Tables 8 and 9. The tables show the performance metrics, including F score, accuracy, recall, and AUC, for different values of the learning rate and L2 regularization parameters, respectively.

Table 9 shows that the best-performing model among the ensemble techniques is the histogram-based Gradient Boosting Classification Tree model, with an F score, accuracy, recall, and AUC of 0.96, 95%, 0.96, and 0.96, respectively.
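A sketch of how such a search over the learning rate and L2 regularization might be run with scikit-learn is given below; the grid values are illustrative assumptions, not the exact grid behind Tables 8 and 9.

```python
# Sketch: grid search over HGBC learning rate and L2 regularization (values assumed).
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "l2_regularization": [0.0, 0.1, 1.0, 10.0],
}

search = GridSearchCV(HistGradientBoostingClassifier(), param_grid,
                      cv=10, scoring="f1")
# search.fit(X_train, y_train)                 # training split from the previous sketch
# print(search.best_params_, search.best_score_)
```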

Figure 5 shows the confusion matrix for the selected model in which we can see that the model is accurate at predicting sepsis and nonsepsis patients.

The findings suggest that there is a possibility of enhancing the performance of the model by modifying these hyperparameters. Additionally, it may be beneficial to prioritize minimizing instances of nondetection of sepsis patients, even if this leads to an increase in the number of patients diagnosed with sepsis, as failure to do so could have severe consequences. These findings emphasize the importance of thorough testing and tuning of model hyperparameters to optimize the performance of the histogram-based Gradient Boosting Classification Tree model; further exploration and fine-tuning of these parameters can lead to improved accuracy, F score, recall, and AUC, thus enhancing the model's predictive capabilities and overall effectiveness. The best performance is achieved by the HGBC model, with 95% accuracy, an F score of 0.96, a recall of 0.96, and an AUC of 0.96. Based on these results, the selected model for this paper is the HGBC model.

4.4. Average Performance Comparison

Figure 6 illustrates the average performance of the models created in this paper compared to the CERF models created by Darwiche and Mukherjee [4] and the ensemble DNN models by El-Rashidy et al. [41]. The models in this paper produce higher average F score, AUC, and recall, showing the ability of the machine learning models to produce more robust predictions with a lower risk of bias in prediction. This paper's strengths lie in its robust performance, potential novelty in reintroducing machine learning techniques, and rigorous experimental evaluation. However, potential weaknesses include the need for further generalizability testing on diverse datasets and real-world scenarios, limited comparisons with existing state-of-the-art methods, and a potential lack of interpretability in the proposed models.

The DT and HGBC models are the only models in the comparison in Table 10 that use CTGAN for data augmentation.

5. Conclusion

The developed ensemble machine learning-based algorithm holds substantial importance in the clinical sector. By achieving improved efficacy in predictive models, it addresses the critical need for accurate disease diagnosis and prognosis. This algorithm can potentially revolutionize medical practices by assisting clinicians in making more informed decisions and providing better patient care.

The research study highlights the necessity of employing generative data-balancing techniques such as CTGAN in the training process. Imbalanced datasets can lead to biased models and underdiagnosis of illnesses, which can have severe consequences in certain situations. By demonstrating the effectiveness of data balancing and augmentation, the research emphasizes the need for mitigating bias and ensuring accurate predictions in healthcare applications.

The HGBC model with 95% accuracy, an F score of 0.96, a recall of 0.96, and an AUC of 0.96 had the highest performance on the sepsis data. Based on these results, the selected model for this paper is the HGBC model, which combines multiple base classifiers to improve overall prediction performance. The findings provide valuable insights for researchers and practitioners in selecting the most effective model for sepsis prediction.

We suggest that future work should focus on gathering more data on risk factors to improve disease diagnosis. Additionally, parameter tuning is identified as a crucial step to enhance the effectiveness of the models. By exploring different datasets, processing techniques, and algorithms, the research encourages further validation and fine-tuning of predictive models in order to optimize their performance.

The research holds the potential to significantly impact clinical practice by providing an effective computer-aided medical prediction approach. The developed algorithm, coupled with intelligent human-machine interfaces, can aid clinicians in early disease detection and improve patient outcomes. The research lays the foundation for further advancements in computer-aided diagnostics and personalized medicine.

Nomenclature

β0:Intercept of linear equation
β1:Gradient of linear equation
X:Independent variable
xi:Distance of the ith instance
Y:Binary target variable
θi:Class of the ith instance
ADA:AdaBoost decision tree
AUC:Area under curve
BC:Bagging classifier
BUN:Blood urea nitrogen
CERF:Cox enhanced random forest
DBP:Diastolic blood pressure
DT:Decision tree
ETC:Extra trees classifier
GAN:Generative adversarial network
GBC:Gradient boosting classifier
Hct:Hematocrit
Hgb:Hemoglobin
HGBC:Histogram gradient boosting classifier
HR:Heart rate
ICULOS:Intensive care unit length of stay
KNN:K nearest neighbors
LDA:Linear discriminant analysis
LR:Logistic regression
MLP:Multilayer perceptron
NSGA-II:Non-dominated sorting genetic algorithm II
PTT:Partial thromboplastin time
QDA:Quadratic discriminant analysis
Resp:Respiratory rate
RFC:Random forest classifier
RMSE:Root mean square error
SC:Stacked classifier
SVC:Support vector classifier
SVM:Support vector machine
VC:Voting classifier
WBC:White blood cell count.

Data Availability

The sepsis MIMIC-III dataset was cited.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The author gratefully acknowledges the Deanship of Scientific Research (DSR) for technical and financial support, the Ministry of Education, and King Abdulaziz University, under grant no. (IFPIP: 1018-611-1443).