Performance Benchmarking of Classic
Classification Algorithms on Structured Heart
Disease Data
1st Mustafa Muwafak Alobaedy
Faculty of Computing & Informatics
Multimedia University
Cyberjaya, Malaysia
mustafa.alobaedy@mmu.edu.my
2nd Xiaolin Wang
Hebi Automobile Engineering Professional College
Hebi City, China
wangxldl@foxmail.com
3rd M. Kazem Chamran
Faculty of Information Technology
City University Malaysia
Cyberjaya, Malaysia
ORCID: 0000-0003-3836-4443
4th Mohd Nurul Hafiz Ibrahim
Faculty of Information Technology
City University Malaysia
Cyberjaya, Malaysia
hafizibrahim313@gmail.com
Abstract—Cardiovascular disease is one of the major causes
of death in the world, and early prediction with the help of
machine learning is important. In this paper, four algorithms,
namely K-nearest neighbors (KNN), Support Vector Machine
(SVM), decision tree and random forest, are used to construct
prediction models on a structured heart disease dataset, and
their performance is compared. Standardized preprocessing,
feature screening and SMOTE oversampling are applied to
improve data quality and model generalization. Experiments
show that Random Forest performs best in terms of accuracy
(92.48%) and AUC (0.9731), while KNN has the highest recall
(0.95), which makes it suitable for high-sensitivity tasks.
Drawing on 19 related studies, this paper assesses the
suitability of each model in medical scenarios and suggests
differentiated applications. This study provides a theoretical
basis and empirical support for the selection and deployment
of intelligent prediction models for heart disease.
Keywords—Heart Disease Prediction, Machine Learning,
Model Comparison, Random Forest, Medical AI
I. INTRODUCTION
Cardiovascular diseases, represented by heart disease,
have become a major challenge to the world's health systems.
According to the World Health Organization (WHO),
17.9 million people die from heart disease each year,
accounting for about one third of all deaths worldwide. Heart
disease is not only the leading cause of death in both developed
and developing countries but also places a heavy economic
burden on society. Because it results from the combined effects
of multiple factors such as genetics, lifestyle, metabolic
indicators and environment, early screening and accurate risk
prediction for high-risk groups are crucial to reducing
mortality and optimizing resource allocation [1].
In the traditional medical system, the diagnosis of heart
disease relies mostly on the experience of physicians and a
small number of key indicators such as blood pressure,
cholesterol and diabetes history. However, when the data scale
is large and the variables interact with each other in nonlinear
ways, this approach often suffers from low accuracy, low
efficiency and strong subjectivity, and it struggles to meet the
needs of modern precision medicine. Therefore, applying
machine learning to medical big data is a key technical route to
"early detection, early diagnosis and early intervention" [2].
Nevertheless, while machine learning models can
significantly improve predictive accuracy, their 'black box'
nature often limits clinical adoption. For medical applications,
interpretability is as important as accuracy. Physicians need to
understand why a model makes a particular prediction to trust
and apply it. Therefore, methods such as feature importance
ranking in tree-based models and case-level explanations with
LIME have become essential directions in medical AI
research.
With the rapid development of artificial intelligence
technology and the widespread application of electronic
medical records, the scale and dimension of medical big data
are growing rapidly, bringing huge opportunities for auxiliary
diagnosis models based on big data. Such models have
important application prospects in cancer, diabetes, kidney
disease and heart disease diagnosis. Among the available
methods, K-nearest neighbor (KNN), support vector machine
(SVM), decision tree and ensemble learning methods (such as
random forest and gradient boosting machines) have become
current research hotspots due to their clear theory, flexible
modelling and good performance on real medical big data.
This study is based on the "Personal Key Indicators of
Heart Disease" dataset released on the Kaggle platform. The
dataset covers nearly 30,000 individual health records,
including key lifestyle and health variables such as smoking,
body mass index (BMI), physical activity frequency, diabetes
history, gender and age group. The data structure is clear, and
the feature variables are comprehensive. As a structured
dataset that closely resembles real medical settings, it provides
a suitable basis for comparing the practical value of different
machine learning models.
In this context, this study takes a variety of classic
classification algorithms as the research object and
systematically compares their performance differences in
heart disease prediction, aiming to provide strong theoretical
support and empirical basis for data modelling practice in the
medical field, and promote the actual implementation and
optimization of intelligent auxiliary diagnosis.
A. Research Problem and Objectives
Research question: Under structured medical data
conditions, which classic classification algorithm is most
suitable for the binary classification prediction task of heart
disease?
a) Research objectives:
Systematically compare the accuracy, recall, F1-score
and AUC of KNN, SVM, decision tree and
ensemble methods in heart disease prediction.
Evaluate the robustness and generalization ability of each
model under the same data preprocessing and validation
conditions.
Analyze the feature variables, algorithm mechanisms and
prediction differences behind the strengths and weaknesses
of each model.
Provide more specific interpretability analysis results,
including global feature importance ranking derived from
tree-based models and local case-level explanations, to
enhance transparency and clinical applicability.
Provide an empirical basis for the optimization and
selection of subsequent medical prediction models.
b) Scope and Structure
This study focuses on the typical binary classification task
of heart disease prediction and uses the "Personal Key
Indicators of Heart Disease" structured dataset released on the
Kaggle platform as its basis. The dataset contains
multiple demographic characteristics and health behavior
indicators, which are suitable for machine learning
classification modelling. The research revolves around
four types of classic classification algorithms: K-nearest
neighbor (KNN), support vector machine (SVM), decision
tree and ensemble methods (including random forest and
stacking models). Model construction adopts a unified data
preprocessing pipeline, including missing value handling,
categorical encoding, standardization and a training/testing
split, to ensure that each algorithm is compared fairly under
the same conditions.
The core of the research is not to propose a new model
structure, but to clarify the performance differences, strengths
and weaknesses of different classic algorithms on real
structured health data through experimental verification
and data analysis. Cross-validation, multidimensional
performance evaluation indicators (such as accuracy, recall,
precision, F1-score and AUC) and feature importance analysis
are used to ensure a comprehensive and scientifically rigorous
model evaluation.
II. LITERATURE REVIEW
A. Thematic Review
To systematically organize the existing research, this
review is structured around four themes: single-model
prediction, application of ensemble learning methods, deep
learning and hybrid model construction, and model
interpretability and scalable adaptation.
1) Single-model-based prediction research
Many previous studies have focused on using a single
classification method to predict heart disease. KNN has a
simple structure and good stability, and is suitable for
small-sample medical data with few feature dimensions.
Support vector machines have shown better results in handling
data with fuzzy boundaries due to their strong performance in
nonlinear classification. Logistic regression has shown a clear
advantage in early disease diagnosis because its parameters are
easy to interpret. However, single models struggle to adapt to
increasingly complex data structures and task environments
because they depend heavily on the data distribution and
features [3], [4], [5].
2) Application of Ensemble Learning Methods
As model complexity and data dimensionality increase,
existing research has focused mainly on Random Forest,
Gradient Boosting and stacked ensembles. Kumar and Thakur
[6] proposed a stacked learning algorithm based on a
meta-classifier and multiple base classifiers, which
significantly improves recognition accuracy and robustness.
Saad et al. [7] used LightGBM to demonstrate adaptability to
various types of healthcare big data and reported a near-ideal
prediction accuracy (99.54%) [6], [7].
3) Deep Learning and Hybrid Models
To further improve model performance, some scholars have
proposed methods based on deep learning, or combined them
with traditional methods. Combining traditional machine
learning with convolutional neural networks, Bharti et al. [8]
proposed a convolutional-neural-network-based algorithm on
the UCI dataset, which achieved a recognition rate of 94.2%.
This project intends to build an IoT + AI architecture on this
basis and develop a terminal-oriented intelligent prediction
system. Although these methods have achieved better results,
they suffer from high computational complexity, poor
interpretability and the need for large amounts of training
resources [8].
4) Exploration of Model Interpretability
In recent years, a growing body of research has focused on
model interpretability and clinical applications. Chaudhuri et
al. [10] proposed a Random Forest-based feature importance
visualization method to help physicians better understand the
model's decisions. Chang et al. [9] proposed an artificial
intelligence algorithm and embedded it into the diagnostic
process to test its effectiveness. Ghaffar Nia et al. [11] studied
the fairness and transparency of AI applied to the prediction of
multiple diseases. These findings are an important guide for
model deployment, but overall the field still lacks a
systematic approach [9], [10], [11].
B. Critical Analysis
A comprehensive analysis of the existing literature reveals
the following research highlights and limitations:
a) Research Strengths:
A number of studies have made breakthroughs in accuracy
optimization; e.g., ensemble algorithms are significantly
better than individual models.
Many articles use structured health information from a real
medical context, which helps the results transfer to practice.
Some studies have attempted to apply the methods to
clinical processes, facilitating adoption of the techniques.
b) Research Shortcomings:
Most existing studies focus on "maximizing the
usefulness of the model", neglecting the uniformity of the data
pre-processing, feature selection and evaluation processes.
Few studies systematically compare the effects of various
classical models under the same experimental conditions.
Understandability and user acceptance are often neglected,
and there is a lack of evaluation of model transparency for
physicians or patients.
Fusion and deep learning approaches based on neural
networks are expensive and struggle to meet the needs of
small- and medium-scale healthcare settings.
C. Identification of Gap
Although existing research has made great progress in the
field of heart disease prediction, it still faces the following
problems:
Lack of standardized comparison: Because data
preprocessing, evaluation criteria and datasets differ across
studies, it is difficult to carry out systematic comparative
research on multiple methods such as KNN, SVM, decision
tree and ensemble methods, which makes it difficult to
evaluate the advantages and disadvantages of different models
and to compare them horizontally.
Lack of evaluation on realistic data: Most studies use
public, simplified UCI or synthetic data, and lack large-sample
evaluation on data with complex structure and high
heterogeneity.
Lack of consideration of clinical applications: Most
existing studies focus on engineering optimization, ignoring
practical needs in terms of interpretability, computational
efficiency and risk prediction.
To address the above issues, this research uses structured
medical big data and a standardized data processing pipeline
to make up for the deficiencies in existing research, and to lay
the foundation for the development and implementation of an
intelligent early warning system for clinical heart diseases.
III. RESEARCH METHODOLOGY
A. Research Design
This research experimentally compares the performance
of four models, namely KNN, SVM, decision tree and random
forest, in predicting heart disease in structured medical data,
focusing on evaluating the model's accuracy, robustness,
generalization, interpretability and clinical practicality. The
overall research process is as follows:
1) Data Acquisition and Understanding:
This study uses the "Personal Key Indicators of Heart
Disease" dataset published on Kaggle, which includes a number
of indicators related to lifestyle habits and health status, such
as body mass index, smoking history, diabetes history and
exercise frequency, and adopts the binary label variable
"HeartDisease" (0 is no, 1 is yes).
2) Data preprocessing:
Consistency across algorithms was ensured by detecting
missing values, encoding binary variables, one-hot encoding
multi-class variables and normalizing numerical features,
thereby eliminating the interference of extraneous variables.
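The encoding and normalization steps named above can be illustrated with a minimal standard-library sketch. The toy records, the "Yes"/"No" coding and the choice of z-score scaling are assumptions made for illustration; the paper does not state which implementation or scaling variant was used, and in practice a library such as scikit-learn would normally handle these steps.

```python
from statistics import mean, pstdev

def encode_binary(values, positive="Yes"):
    """Map a two-valued column to 0/1 (1 where the value equals `positive`)."""
    return [1 if v == positive else 0 for v in values]

def one_hot(values):
    """One-hot encode a multi-class column; categories are sorted for a stable column order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def standardize(values):
    """Z-score a numeric column: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]

# Hypothetical toy records; names and values are illustrative only.
smoking = ["Yes", "No", "No", "Yes"]
age_group = ["60-64", "40-44", "60-64", "80 or older"]
bmi = [22.0, 26.0, 30.0, 34.0]

smoking_enc = encode_binary(smoking)
age_cats, age_enc = one_hot(age_group)
bmi_std = standardize(bmi)
```

To keep the comparison fair, the mean and standard deviation would be estimated on the training portion only and then reused for the test portion; this sketch skips that split for brevity.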
3) Modelling and Training:
Select four representative classification algorithms (KNN,
SVM, Decision Tree, Random Forest); evaluate the stability of
each algorithm using 5-fold cross-validation; control the
fairness of algorithm comparisons through Grid Search or
default hyperparameter testing.
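As a rough illustration of 5-fold cross-validation, the sketch below partitions sample indices into five validation folds, each paired with the remaining indices for training. The function name, the seed and the plain (non-stratified) shuffling are assumptions; the paper does not say whether folds were stratified by class.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical sample count, chosen so the folds are uneven.
folds = list(k_fold_indices(n_samples=23, k=5))
```

Each model would be trained on `train_idx` and scored on `val_idx`, and the five scores averaged; the spread of those scores gives a simple view of stability.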
4) Model Evaluation and Comparison:
The models were quantitatively compared using 5
standard performance metrics: Accuracy, Precision, Recall,
F1-score, and Area under the ROC curve (AUC).
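The four threshold-based metrics follow directly from the confusion-matrix counts. The sketch below is a minimal illustration on made-up labels (the label vectors are hypothetical, not data from this study); AUC needs continuous scores rather than hard predictions and is left out here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with 0.0 used when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 4 patients with heart disease, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a screening setting, recall is the metric tied to missed cases, since it falls with every false negative.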
5) Interpretability Analysis and Feature Importance
Assessment:
Use feature importance ranking for tree-based models
(e.g., Decision Trees and Random Forests); perform
interpretability probes for SVM models using the LIME or
SHAP methods; analyse the differences in the treatment of
key features across models.
For interpretability analysis, both global and local
explanation methods were applied.
6) Visualization of results and analysis of conclusions:
Plot confusion matrices, ROC curves and AUC comparison
charts; analyse the applicability of each model (including
computational complexity, dependence on feature quality, cost
of misclassification, etc.); offer suggestions on the strengths of
each model and its future extension.
The research combines quantitative analysis, visualization
and theoretical interpretation to ensure that the results have
statistical significance and practical application value.
B. Framework
This section presents a structured methodology for
predicting heart disease using the "Personal Key Indicators of
Heart Disease" dataset from Kaggle. The research workflow,
illustrated in Figure 1, encompasses sequential stages of data
acquisition, preprocessing, modeling, evaluation, and
recommendations. Data preprocessing addressed missing
values, categorical encoding, and numerical normalization,
with SMOTE applied to balance class distributions. The
dataset was partitioned into training and testing subsets
(70:30) and validated using five-fold cross-validation to
ensure robustness.
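The core idea of SMOTE is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the implementation used in this study; the function name, the toy points and the parameters `k` and `seed` are assumptions.

```python
import math
import random

def smote_samples(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class (already standardized in practice).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote_samples(minority, n_new=6, k=2)
```

A common precaution, assumed here rather than stated in the paper, is to oversample only the training portion of each split so that synthetic points never leak into the data used for evaluation.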
Four widely used classification algorithms—K-Nearest
Neighbors (KNN), Support Vector Machine (SVM), Decision
Tree (DT), and Random Forest (RF)—were trained and
evaluated. Performance assessment employed multiple
metrics, including accuracy, recall, F1-score, and AUC,
supported by visualization techniques such as ROC curves and
confusion matrices. The findings produced model
performance rankings and identified key predictive variables,
complemented by interpretability analyses to enhance clinical
relevance.
Fig. 1. Workflow of the machine learning methodology, encompassing data
acquisition, preprocessing, modeling, evaluation, and
recommendations
Overall, the methodology in Fig 1 not only ensured
reliable and generalizable results but also offered practical
insights into model performance, interpretability, and
deployment feasibility, thereby providing a foundation for
informed clinical algorithm selection.
IV. RESULTS
This study systematically compares four algorithms,
KNN, SVM, decision tree and random forest, on the
typical medical binary classification task of heart disease
prediction. It uses the structured "Personal Key Indicators of
Heart Disease" dataset and applies data preprocessing, SMOTE
resampling, 5-fold cross-validation, unified evaluation
indicators and graphical visualization to ensure the fairness,
rigor and interpretability of the model comparison.
A. Comparison of model prediction results
Table I provides a summary of the performance of the four
algorithms on five measures: accuracy, precision,
recall, F1-score and AUC.
TABLE I. PERFORMANCE METRICS OF CLASSIFICATION MODELS.
| Model | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|
| KNN | 0.8667 | 0.8143 | 0.9500 | 0.8770 | 0.9350 |
| SVM (SGD) | 0.7713 | 0.7507 | 0.8128 | 0.7803 | 0.8478 |
| Decision Tree | 0.8943 | 0.8921 | 0.8970 | 0.8945 | 0.8951 |
| Random Forest | 0.9248 | 0.9296 | 0.9192 | 0.9244 | 0.9731 |
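Because F1 is the harmonic mean of precision and recall, the F1 column of Table I can be cross-checked from the other two columns. The short sketch below recomputes it; the values are copied from the table, and small gaps are expected because the inputs are rounded to four decimals.

```python
# (precision, recall, reported F1) copied from Table I.
table = {
    "KNN": (0.8143, 0.9500, 0.8770),
    "SVM (SGD)": (0.7507, 0.8128, 0.7803),
    "Decision Tree": (0.8921, 0.8970, 0.8945),
    "Random Forest": (0.9296, 0.9192, 0.9244),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {name: f1_from(p, r) for name, (p, r, _) in table.items()}
# Largest absolute difference between recomputed and reported F1.
max_gap = max(abs(recomputed[name] - f1) for name, (_, _, f1) in table.items())

for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.4f}, reported = {table[name][2]:.4f}")
```

All four recomputed values agree with the reported F1-scores to within rounding, which supports the internal consistency of the table.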
These metrics indicate the overall judgement capability,
positive class recognition capability, balance and
classification capability of the model. The experimental
evidence indicates:
The Random Forest algorithm showed the best results on
every metric except recall, reaching an accuracy of 92.48%, an
F1-score of 0.9244 and an AUC of 0.9731, with strong overall
judgement and sample discrimination ability.
Decision Tree provided good predictive stability, with
89.43% accuracy, an F1-score of 0.8945 and an AUC of 0.8951,
combining high interpretability with good predictive power.
KNN had the highest recall (0.95), identifying most
potential positive samples, but its accuracy (0.8667) and
precision (0.8143) were lower; it is therefore better suited to
first-screening tasks in which missed diagnoses are costly.
SVM had lower accuracy and AUC (0.7713 and 0.8478,
respectively); however, it still retained a recall of 0.8128,
showing that it keeps some ability to recognize positive
samples. Feature scale, hyperparameters and sample
distribution are important factors affecting its performance.
Overall, Random Forest delivers the best performance,
Decision Tree offers a sensible balance of performance and
interpretability, KNN serves high-sensitivity situations, and
SVM retains only a partial ability to identify positive samples.
B. ROC curve and visualization analysis
The ROC curves of the four models are shown in Fig 2.
The figure shows the relationship between true positive rate
and false positive rate, and the area under the curve (AUC)
represents the discriminatory capacity of the model among the
classes. Random Forest performed the best on the test set
(AUC = 0.97), then KNN (AUC = 0.93) and Decision Tree
(AUC = 0.90), and SVM was the weakest (AUC = 0.85).
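AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison on made-up scores; the function name and the toy data are illustrative assumptions, and the quadratic loop is meant for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for four patients.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

Under this reading, an AUC of 0.97 means that about 97 out of 100 such positive/negative pairs are ranked in the right order.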
Fig. 2. ROC curve comparison of classification models
To provide a more detailed evaluation of classification
performance, confusion matrices of the four models are
presented in Fig. 3. The confusion matrix allows direct
observation of true positives, false positives, true negatives
and false negatives. Random Forest and Decision Tree both
produced few false negatives, which matters particularly in
medical screening. KNN was far more sensitive but produced
more false positives, whereas SVM misclassified more positive
cases, which indicates that SVM is less stable.
Fig. 3. Confusion matrices of KNN, Decision Tree, Random Forest, and
SVM classifiers
C. Interpretability Analysis Results
To further improve transparency and offer clinical insights,
the interpretability analysis used feature importance ranking
from the tree-based model and local explanations generated
by LIME.
Fig. 4. Top 15 feature importances identified by the Random Forest model
Fig. 4 shows the 15 most important features identified by
Random Forest. Age, BMI, history of diabetes and smoking
consistently ranked among the most important predictors of
heart disease. These results agree with existing medical
knowledge, which supports the soundness of the decisions
made by the model.
The LIME explanation in Fig. 5 shows that, for a typical
high-risk patient, high BMI values and smoking history push
the prediction toward the heart disease class, whereas regular
physical activity lowers the risk score.
Such case-level interpretability is valuable for clinicians to
understand why an individual patient is classified in a certain
way.
Fig. 5. LIME-based feature contribution explanation for a single sample
using the Random Forest model
By incorporating both global and local interpretability, the
analysis shows that the models are not only predictively
accurate but also offer meaningful explanations that
physicians can inspect and rely on.
D. Summary
This section demonstrates the effectiveness of four
machine learning algorithms in structured medical
information classification tasks. The results show that:
SVM is suitable for tasks with fuzzy boundaries and
moderate feature dimensions, but it relies on stronger
parameter optimization strategies;
Random forest has high prediction accuracy and excellent
overall performance, and is suitable for comprehensive risk
prediction systems;
KNN is suitable for small samples and low noise, but is
not suitable for high-dimensional big data and should be used
with caution;
Decision trees are highly interpretable, can support
clinicians' judgments, and can also be used for pattern
interpretation and feature importance analysis.
V. DISCUSSION AND CONCLUSION
The results of this study show that each model has its own
advantages for predicting heart disease: random forests offer
high prediction accuracy and good overall stability, but are
less interpretable; decision trees are simple and easy to
understand, and are suitable for clinical decision-making and
feature importance analysis, but their generalizability is weak;
support vector machines are suitable for problems with fuzzy
boundaries, but require appropriate parameter tuning; and
KNN is suitable for small sample sizes and low-noise data, but
caution is required with high-dimensional data.
In practical application, several issues must be considered.
First, cost and efficiency are crucial: complex models
require significant computing resources and time for training
and real-time prediction, which can only be addressed at
higher cost. Second, integration with healthcare
information systems requires compatibility with existing
HIS/EHR systems and with physicians' workflows, which
further increases costs. In addition, data privacy and
compliance concerns hinder the practical application of these
models.
Based on the results, the decision tree and random forest
algorithms together offer high accuracy, ease of interpretation
and ease of implementation, making their combined
application in clinical practice feasible. Random forests are
suitable for building integrated risk warning systems, while
decision trees can rapidly support pattern interpretation and
decision-making. Future research will focus on multimodal
data fusion, feature enhancement and closed-loop verification,
aiming to enhance the reliability and effectiveness of models
in real-world medical scenarios.
REFERENCES
[1] M. M. Ahsan and Z. Siddique, ‘Machine learning-based heart disease
diagnosis: A systematic literature review’, Artif. Intell. Med., vol. 128,
p. 102289, Jun. 2022, doi: 10.1016/j.artmed.2022.102289.
[2] A. Sharma et al., ‘A Systematic Review on Machine Learning
Intelligent Systems for Heart Disease Diagnosis’, Arch. Comput.
Methods Eng., Mar. 2025, doi: 10.1007/s11831-025-10271-2.
[3] R. Bhuvana, S. Maheshwari, and S. Sasikala, ‘Predict the Heart Disease
Using a Logistic Regression Classifier Algorithm’, in 2023 12th
International Conference on System Modeling & Advancement in
Research Trends (SMART), Moradabad, India: IEEE, Dec. 2023, pp.
649–652. doi: 10.1109/SMART59791.2023.10428486.
[4] Y. Singh, H. K. Agrawal, and N. Kumar, ‘SVM Based Risk Estimation
in Heart Disease Prediction’, in 2024 14th International Conference on
Cloud Computing, Data Science & Engineering (Confluence),
Noida, India: IEEE, Jan. 2024, pp. 593–598. doi:
10.1109/Confluence60223.2024.10463271.
[5] X. Wenxin, ‘Heart Disease Prediction Model Based on Model
Ensemble’, in 2020 3rd International Conference on Artificial
Intelligence and Big Data (ICAIBD), Chengdu, China: IEEE, May
2020, pp. 195–199. doi: 10.1109/ICAIBD49809.2020.9137483.
[6] S. Kumar and B. Thakur, ‘Heart Disease Prediction Using a Stacked
Ensemble Learning Approach’, SN Comput. Sci., vol. 6, no. 1, p. 3,
Dec. 2024, doi: 10.1007/s42979-024-03499-5.
[7] M. N. Saad, E. Jahan, S. Ahmed, and K. M. Mohi Uddin, ‘An Ensemble
Machine Learning-Based Method to Determine the Presence of Heart
Ailment Using Pooled Dataset’, in 2024 IEEE International
Conference on Computing, Applications and Systems (COMPAS),
Cox’s Bazar, Bangladesh: IEEE, Sep. 2024, pp. 1–6. doi:
10.1109/COMPAS60761.2024.10796352.
[8] R. Bharti, A. Khamparia, M. Shabaz, G. Dhiman, S. Pande, and P.
Singh, ‘Prediction of Heart Disease Using a Combination of Machine
Learning and Deep Learning’, Comput. Intell. Neurosci., vol. 2021, no.
1, p. 8387680, Jan. 2021, doi: 10.1155/2021/8387680.
[9] V. Chang, V. R. Bhavani, A. Q. Xu, and M. Hossain, ‘An artificial
intelligence model for heart disease detection using machine learning
algorithms’, Healthc. Anal., vol. 2, p. 100016, Nov. 2022, doi:
10.1016/j.health.2022.100016.
[10] A. K. Chaudhuri, S. Das, and A. Ray, ‘An Improved Random Forest
Model for Detecting Heart Disease’, in Data-Centric AI Solutions and
Emerging Technologies in the Healthcare Ecosystem, 1st ed., Boca
Raton: CRC Press, 2023, pp. 143–164. doi: 10.1201/9781003356189-10.
[11] N. Ghaffar Nia, E. Kaplanoglu, and A. Nasab, ‘Evaluation of artificial
intelligence techniques in disease diagnosis and prediction’, Discov.
Artif. Intell., vol. 3, no. 1, p. 5, Jan. 2023, doi: 10.1007/s44163-023-00049-5.