A Comparative Study of Machine Learning Techniques for Predicting Student Academic Performance
Shyama Heshini Niranjala, Mustafa Muwafak Alobaedy, and S. B. Goyal
Faculty of Information Technology, City University, Petaling Jaya 46100, Malaysia
shyama@gwu.ac.lk, drsbgoyal@gmail.com
Abstract. This study examines the effectiveness of three machine learning
algorithms, Decision Trees, Random Forests, and Support Vector Machines (SVM),
in forecasting the academic performance of students. Using a dataset that
includes demographic information, past academic records, and socioeconomic
factors, the study seeks to determine the most effective algorithm for
educational data analysis. Our approach applies feature selection techniques to
refine the dataset and cross-validation to evaluate algorithmic effectiveness.
The findings reveal that while SVM demonstrated the highest accuracy in
predicting performance, Decision Trees and Random Forests offered valuable
insights into the importance of features. These findings hold significant
implications for educators and policymakers aiming to implement data-driven
strategies to enhance student achievement. Future research directions involve
the assessment of additional machine learning algorithms and feature selection
methods to optimize prediction accuracy.
Keywords: Educational data mining · Student academic performance ·
Predictive modeling · Decision trees · Random forests ·
Support vector machines · Feature selection · Supervised learning ·
Classification algorithms · Algorithmic comparison · Machine learning
1 Introduction
1.1 Background
The emergence of Machine Learning Algorithms and Artificial Neural Networks
(ANNs) presents novel possibilities for predicting student academic performance.
Recent technological progress and research have enhanced the capabilities of
these algorithms to extract meaningful patterns from intricate and diverse data
[1,2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
P. Vasant et al. (Eds.): ICO 2023, LNNS 1167, pp. 307–315, 2024.
https://doi.org/10.1007/978-3-031-73318-5_31
1.2 The Importance of the Research
Understanding the variables that affect academic achievement can be of great
advantage to educational systems. By applying machine learning algorithms, this
study aims to provide a fuller understanding of the factors that influence
student performance, including demographics, previous academic records,
socioeconomic background, learning preferences, and extracurricular
activities [3]. The findings of this investigation will serve as a basis for
evidence-based educational policies and decision-making processes [4].
1.3 Problem Statement
While Machine Learning Algorithms exhibit potential in predicting student academic performance, there are still uncertainties about their efficacy and the
consequences of their implementation in educational environments [5]. There is
a noticeable lack of understanding regarding the accuracy of these algorithms in
forecasting performance and how educational institutions can effectively utilize
this data for well-informed decision-making [6,7].
1.4 Objectives of the Study
1.4.1 Main Objective
The principal objective is to construct a machine learning-based predictive
model that forecasts the academic performance of students. The model is
intended to help educational institutions make data-driven decisions that
ultimately improve student achievement.
1.4.2 Specific Objectives
1. To examine the effectiveness of machine learning algorithms in the classification and prediction of academic performance among students.
2. To establish a predictive framework based on machine learning algorithms for
the same purpose.
3. To assess the reliability and efficiency of the constructed framework in
predicting academic results.
1.5 Scope of the Research
This research study has a primary focus on the assessment of the performance
and consequences of machine learning algorithms within the academic environment. The ultimate objective is to further enhance the utilization of predictive
analytics in the field of education.
2 Related Works
2.1 Factors Affecting Academic Performance
A variety of factors influence student performance, and recognizing these
elements is critical for educational institutions to encourage student success.
Table 1 summarizes the factors considered in prior research studies,
emphasizing the importance of particular categories in predicting student
achievement.
Table 1. Factors considered in literature for predicting academic performance
| Study | Factors considered | Algorithm used | Key findings |
|---|---|---|---|
| Alamri (2020) [15] | e-Learning, demographics, etc. | SVM | Demographics crucial |
| Smith (2022) [2] | Attendance, past grades | ANN | Attendance correlates |
| Chen (2019) [4] | Socioeconomic status, test scores | Random Forest | Test scores significant |
| Brown (2022) [14] | Instructor attributes, course reviews | Decision Trees | Instructor impact noted |
2.2 Educational Artificial Intelligence
Education is not an exception to the many industries that artificial intelligence is
revolutionizing. Innovative methods to change conventional educational practices
have evolved as a result of developments in AI technology. The introduction of
AI in education in this section lays the groundwork for the topics that follow [8].
The Use of Machine Learning in Education - Machine learning algorithms allow
computers to learn from data and make predictions or decisions. Applications
include adaptive learning systems, intelligent tutoring systems, and student
performance prediction [9].
The Use of Natural Language Processing (NLP) in Education - NLP methods make
it easier for computers to process human language [10]. They support
summarization, automated essay grading, and language learning [11,12].
The Use of Data Analytics for Education - Data analytics plays a significant
role in educational institutions by extracting important insights from massive
datasets [13].
2.3 Algorithms Used for Prediction
There is a wide range of methods for evaluating student achievement, including
classification, regression, association mining, and neural network modelling.
Table 2 lists the algorithms most commonly used for predicting student academic
performance, according to Amje A. S (2019).
Table 2. Frequency and percentage of algorithms used for predicting academic performance
| Algorithm | Frequency | Percentage (%) |
|---|---|---|
| Decision tree | 35 | 24.8 |
| Naïve Bayes classifier | 14 | 9.9 |
| Artificial neural network | 13 | 9.2 |
| Regression | 12 | 8.5 |
| Support vector machine | 9 | 6.4 |
| K-nearest neighbor | 8 | 5.7 |
| K-means | 3 | 2.1 |
| Other algorithms | 47 | 33.3 |
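The percentages in Table 2 follow directly from the frequency counts. As a minimal sketch (the names `frequencies` and `to_percentages` are illustrative and do not come from the cited survey), each share is the count divided by the total number of occurrences:

```python
# Frequency counts as listed in Table 2.
frequencies = {
    "Decision tree": 35,
    "Naive Bayes classifier": 14,
    "Artificial neural network": 13,
    "Regression": 12,
    "Support vector machine": 9,
    "K-nearest neighbor": 8,
    "K-means": 3,
    "Other algorithms": 47,
}


def to_percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = to_percentages(frequencies)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The counts sum to 141, and the recomputed shares agree with the percentage column of the table.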
3 Methodology
3.1 Overview
The main objective of this study is to construct, execute, and assess machine
learning-based predictive models for the prediction of academic performance
among master’s degree students at a Malaysian university. The research methodology consists of various interconnected stages.
3.2 Data Acquisition
The initial dataset was compiled by gathering information from university
records and survey questionnaires, which encompassed student demographic
data, academic history spanning three semesters, and e-learning activity logs.
Initially, a total of 39 participants were included, but this number was subsequently reduced to 32 after eliminating outliers and cleaning the data to meet
the statistical validity requirements based on the Central Limit Theorem [14].
3.3 Data Preprocessing
The raw data underwent a cleaning process to eliminate incomplete or inaccurate
records. Multiple imputation techniques were employed to handle missing values [15]. Furthermore, data normalization and transformation techniques were
applied to ensure that the dataset adheres to the assumptions of the employed
machine learning algorithms.
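The preprocessing steps can be illustrated with a small sketch. The study reports multiple imputation and unspecified normalization techniques. The code below substitutes single mean imputation and min-max scaling as simpler stand-ins, so it should be read as an assumption-laden illustration rather than the authors' pipeline. The function names and the `gpa` column are hypothetical.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical GPA column with one missing value (not data from the study).
gpa = [2.0, None, 3.0, 4.0]
gpa_filled = impute_mean(gpa)           # the missing entry becomes 3.0
gpa_scaled = min_max_scale(gpa_filled)
print(gpa_scaled)                       # [0.0, 0.5, 0.5, 1.0]
```

Multiple imputation would instead generate several plausible completed datasets and pool the results, which matters more as the share of missing values grows.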
3.4 Feature Selection
To select the most relevant variables from the initial feature set, feature selection
algorithms such as Recursive Feature Elimination (RFE) were utilized. The final
feature set consisted of student demographics, previous academic performance,
and e-learning engagement metrics.
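Recursive Feature Elimination repeatedly fits a model, discards the feature the model relies on least, and refits on the remainder. The paper does not state which base estimator was used, so the sketch below assumes a plain logistic regression trained by gradient descent and assumes the features are already on comparable scales. The toy data and all function names are hypothetical.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_keep):
    """Recursively drop the feature with the smallest absolute weight."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda j: abs(w[j]))
        kept.pop(weakest)
    return kept


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is
# constant, and feature 2 is unrelated to the label.
X = [[-2, 0, 1], [-1, 0, -1], [-1, 0, 1], [-2, 0, -1],
     [1, 0, 1], [2, 0, -1], [2, 0, 1], [1, 0, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(rfe(X, y, n_keep=1))  # [0]
```

With only 32 records, the ranking produced by any such procedure can change noticeably between resamples, so the selected set is best treated as indicative.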
3.5 Model Development and Training
Three machine learning algorithms—Support Vector Machines (SMO), Decision
Trees (J48), and Random Forests (RandomForest)—were chosen for this study.
The dataset was divided into a training set comprising 70% of the data and a
testing set consisting of the remaining 30%. Model training was conducted using
the training dataset.
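A 70/30 split of the 32 cleaned records cannot be exact, since 70% of 32 is 22.4. The sketch below shows one way such a split might be drawn. The rounding rule, the random seed, and the function name `split_indices` are assumptions, and the paper does not say whether the split was stratified.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=42):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(32)
print(len(train_idx), len(test_idx))  # 22 10
```

Under this rounding, the test set holds 10 records, so each test prediction moves the measured accuracy by 10 percentage points.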
3.6 Evaluation of Performance
Model performance was evaluated using accuracy, precision, recall, F1-score,
and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The
models also underwent 10-fold cross-validation to assess their capacity for
generalization.
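In 10-fold cross-validation, the data are partitioned into ten folds, each fold serves once as the test set, and the scores are averaged. The following sketch only generates the fold indices. The function name is illustrative, and the records are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    The first n_samples % k folds receive one extra sample so that
    every sample is tested exactly once.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(32, k=10))
print([len(test) for _, test in folds])  # [4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
```

With 32 records, each fold tests only three or four students, so per-fold scores are coarse and the averaged estimate carries considerable variance.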
3.7 Validation
Fig. 1. Flowchart of research methodology
The models were validated on independent datasets, and the outcomes were
analyzed in the specific context of practical academic decision-making within
educational institutions. Figure 1 offers a concise summary of the research
methodology employed in this study to predict students' academic performance
using machine learning algorithms.
The flowchart illustrates the step-by-step process from the initiation of the
research to its conclusion, outlining critical stages such as Data Acquisition,
Data Preprocessing, Feature Selection, Model Development, Model Training,
Performance Evaluation, and Validation. This visual representation serves as a
guide for both the researchers and readers, facilitating a clearer comprehension
of the study’s systematic approach to accomplishing its goals.
This study takes a thorough approach, combining statistical verification,
machine learning methods, and performance measurements to predict the
scholastic achievement of students pursuing a master's degree. Through
multi-stage data preprocessing, feature selection algorithms, and a varied set
of machine learning algorithms, the investigation is intended to offer robust,
dependable, and practical observations that can inform the decision-making
procedures of educational institutions. Figure 2 summarizes the applied
algorithms in graphical form.
4 Result Analysis
4.1 Comparative Analysis of Algorithms
The machine learning algorithms, namely Decision Tree, Random Forest, and
Support Vector Machine (SVM), are assessed on the basis of performance metrics.
These metrics provide a comprehensive understanding of the suitability of each
algorithm for the classification task at hand. The evaluation metrics are
presented in Table 3 for clarity.
4.2 Interpretation of Metrics
Table 3 presents a textual interpretation of the performance metrics for the
Decision Tree, SVM, and Random Forest algorithms. The Decision Tree is
classified as the superior model across all performance indicators, notably
achieving the highest classification accuracy and lowest mean absolute error.
SVM demonstrates moderate performance, particularly in terms of classification
accuracy and Kappa statistic. Random Forest generally falls behind, especially
in classification accuracy and Kappa statistic, although its mean absolute
error (0.1381) is lower than that of SVM (0.2538). This tabular summary serves
as an intuitive guide for stakeholders in the field of educational data mining,
highlighting the relative capabilities of these machine learning algorithms.
Table 3. Metrics performance comparison of different techniques
| Metric | Decision tree | SVM | Random forest |
|---|---|---|---|
| Classification accuracy | Superior at 84.37% | Moderate at 75.00% | Lags at 68.75% |
| Kappa statistic | High agreement (0.775) | Moderate agreement (0.6196) | Lower agreement (0.5245) |
| Mean absolute error | Lowest error (0.1031) | Higher error (0.2538) | Moderate error (0.1381) |
| Precision | Robust (0.848) | Not assessed | Not assessed |
| Recall | Effective (0.844) | Not assessed | Not assessed |
| F-measure | Balanced performance (0.844) | Moderate balance (0.737) | Less balanced (0.670) |
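To make two of the metrics in Table 3 concrete, the sketch below computes classification accuracy and Cohen's kappa from a confusion matrix. The matrix is hypothetical and is not the study's data, and the function names are illustrative. The final loop rests on a further assumption, namely that all 32 records were scored. Under that assumption, the reported accuracies are consistent with 27, 24, and 22 correct predictions.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    # Chance agreement: sum over classes of (row total * column total) / N^2.
    p_e = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2-class confusion matrix (rows: actual, columns: predicted).
cm = [[12, 4],
      [4, 12]]
print(accuracy(cm), cohen_kappa(cm))  # 0.75 0.5

# Assuming all 32 records were scored, these counts of correct predictions
# reproduce the accuracies listed in Table 3.
for correct in (27, 24, 22):
    print(correct, f"{100 * correct / 32:.3f}%")
```

Kappa discounts the agreement expected by chance, which is why it is lower than raw accuracy for every model in the table.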
4.3 Key Takeaways
The Decision Tree algorithm emerges as the most competent model, leading on
every reported performance metric. SVM performs moderately and holds promise
for specific use-cases, while the Random Forest algorithm lags behind,
particularly in terms of classification accuracy and Kappa statistic. The
results support the effectiveness of Decision Trees in predicting student
academic performance. However, each algorithm possesses unique characteristics
that may make it preferable under different circumstances or configurations.
By presenting these nuanced evaluations, we aim to provide guidance to
educators, policymakers, and data scientists in making informed decisions when
selecting machine learning algorithms for educational data mining applications.
5 Conclusion and Future Work with Recommendations
This research study offers a comprehensive and thorough assessment of Decision
Tree, Random Forest, and Support Vector Machine algorithms in the prediction
of student academic performance. By doing so, it fills a significant gap in the
existing literature on educational data mining. The study employs a multifaceted
metric approach to unravel the trade-offs between accuracy, computational load,
and model interpretability. This enables stakeholders to make algorithmic selections based on their specific requirements.
The practical implications of this study are relevant to educational organizations and policymakers. They can utilize the insights gained from this research to
make informed decisions. For instance, Decision Trees are recommended for situations that require quick and transparent decision-making. On the other hand,
Random Forests are known for their robustness, while Support Vector Machines
excel in precision but come at the expense of computational efficiency.
For future research, it is suggested that more advanced models such as neural networks and ensemble methods be explored. This exploration may lead to
improved predictive accuracy and computational performance. Furthermore, the
study proposes the examination of feature engineering techniques augmented by
approaches like Natural Language Processing. Additionally, ethical considerations, particularly those related to data privacy and bias, should be thoroughly
addressed.
This study not only enhances our understanding of the application of machine
learning in education but also provides a roadmap for future interdisciplinary
research. It highlights the importance of refining algorithms, ensuring ethical
governance, and finding scalable solutions to create more effective and equitable
educational environments.
References
1. Romero C, Ventura S, García E, de Castro C (2009) Predicting students' final
performance from participation in on-line discussion forums. IEEE 52(2):204–211
2. Smith AB, Johnson CD (2022) Forecasting academic performance using artificial
neural networks. J Educ Res 45(3):112–130
3. Wang L, Zhang Y, Li H, Chen Y (2021) Deep learning models for predicting
students’ academic performance: a comparative study. IEEE Emerg Topics Comput
Intell 1–1
4. Chen W, Xu J, Zhang Y, Wu L (2019) Exploring socio-economic factors for academic performance prediction using neural networks
5. Zhang S, Liu J, Wu X (2020) Interpretable artificial neural networks for academic
performance prediction. Neurocomputing 465:185–197
6. Wang X, Ashley KD, Aleven V (2015) Automatic extraction of learner's interaction
traces in exploratory learning environments. IEEE Trans Learn Technol 8(4):331–343
7. Chen X, Wu X, Wei Y, Zou Q (2022) Ensemble decision trees for large-scale data
mining: a survey. Knowledge-Based Syst 241
8. Guérin E, Aydin O, Mahdavi-Amiri A (2019) Artificial intelligence.
https://doi.org/10.1007/978-981-32-9915-3_10
9. Wang L, Zhang Y, Li H, Chen Y (2021) Deep learning models for predicting
students’ academic performance: a comparative study. IEEE Trans Emerg Topics
Comput Intell 1–1
10. Brown C, Johnson M (2019) Personalized education: the impact of artificial neural
networks in forecasting academic performance. J Educ Res 25(3):145–162
11. Smith J (2018) Identifying at-risk students: the role of artificial neural networks in
predicting academic performance. Educ Psychol Rev 35(2):189–205
12. Vasant P, Weber GW, Saucedo JAM, Munapo J, Thomas J (2022) Intelligent
computing & optimization. In: Proceedings of the 5th international conference on
intelligent computing and optimization 2022 (ICO2022)
13. Taylor L, Anderson B (2017) Predictive analytics in education: the impact of artificial neural networks on decision-making processes. J Educ Data Mining 25(1):56–72
14. Thomas D et al (2019) Transforming educational policy through the integration
of artificial neural networks: a case study on forecasting student performance. J
Educ Policy Reform 38(3):321–339
15. Alamri L, Almuslim R (2020) Predicting student academic performance using
support vector machine and random forest. 100–107.
https://doi.org/10.1145/3446590.3446607