ISSN: 2226-6348
Open access
Educational Data Mining (EDM) uses large educational datasets to discover meaningful patterns in student participation and academic achievement. Developing accurate multiclass classification models remains challenging because of class imbalance and because datasets often contain irrelevant and redundant attributes. Filter-based feature selection methods are efficient but do not resolve these problems, so the resulting models tend to be biased toward the majority classes. This study introduces Equitable Gain Ratio Feature Selection (EquiGR), which uses k-nearest neighbors to estimate class density and weights the gain ratio accordingly, giving minority classes better representation. EquiGR also uses the Spearman correlation coefficient to detect and remove strongly correlated, redundant features, and it removes low-ranking features as well. EquiGR was evaluated with four machine learning algorithms representing different learning paradigms: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR). Experiments on an imbalanced dataset with an AE:PE:NE class distribution of 3624:4264:1183 showed that EquiGR outperformed baseline feature selection techniques in accuracy, precision, recall, and F1-score. RF combined with EquiGR achieved 92.23% accuracy and an F1-score of 92.48% for the minority NE class. These results show that the proposed method improves overall classification performance, with marked gains in minority-class prediction for educational predictive modeling.
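The abstract describes EquiGR only at a high level: a gain ratio weighted by k-nearest-neighbor class density, removal of low-ranking features, and Spearman-based removal of redundant features. The Python sketch below illustrates one possible reading of that pipeline for discrete features. The density estimate (smoothed share of same-class neighbors), the per-sample weight `1 / (n_c * rho_c)`, the thresholds `min_gr` and `max_corr`, and every function name are assumptions made for illustration; they are not the authors' published formulation.

```python
# Illustrative sketch only: the density weighting and thresholds are assumptions,
# not the published EquiGR formula.
import math
from collections import Counter, defaultdict


def knn_class_density(X, y, k=3):
    """Per-class mean share of same-class points among each sample's k nearest neighbours.

    Laplace-style smoothing keeps every density strictly positive.
    """
    n = len(X)
    k_eff = min(k, n - 1)
    totals, counts = defaultdict(float), Counter(y)
    for i in range(n):
        # (squared Euclidean distance, index) to every other sample, nearest first
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        same = sum(1 for _, j in dists[:k_eff] if y[j] == y[i])
        totals[y[i]] += (same + 1) / (k_eff + 1)
    return {c: totals[c] / counts[c] for c in counts}


def sample_weights(X, y, k=3):
    """Hypothetical weighting: rarer and less dense classes get larger per-sample weights."""
    density, n_c = knn_class_density(X, y, k), Counter(y)
    return [1.0 / (n_c[c] * density[c]) for c in y]


def weighted_entropy(labels, w):
    """Shannon entropy (bits) of labels, where each occurrence carries weight w[i]."""
    mass = defaultdict(float)
    for lab, wi in zip(labels, w):
        mass[lab] += wi
    total = sum(mass.values())
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def weighted_gain_ratio(x, y, w):
    """Gain ratio of discrete feature x with respect to labels y, using sample weights w."""
    groups = defaultdict(list)
    for xi, yi, wi in zip(x, y, w):
        groups[xi].append((yi, wi))
    total = sum(w)
    cond = 0.0  # weighted conditional entropy H(Y | X)
    for members in groups.values():
        ys, ws = zip(*members)
        cond += (sum(ws) / total) * weighted_entropy(ys, ws)
    split_info = weighted_entropy(x, w)  # weighted H(X)
    return 0.0 if split_info == 0 else (weighted_entropy(y, w) - cond) / split_info


def ranks(values):
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def select_features(X, y, k=3, min_gr=0.05, max_corr=0.9):
    """Rank by density-weighted gain ratio, drop low scorers, then drop redundant features."""
    w = sample_weights(X, y, k)
    cols = list(zip(*X))
    scores = [weighted_gain_ratio(col, y, w) for col in cols]
    ranked = sorted((f for f, s in enumerate(scores) if s >= min_gr), key=lambda f: -scores[f])
    selected = []
    for f in ranked:
        # keep f only if it is not strongly rank-correlated with a feature already kept
        if all(abs(spearman(cols[f], cols[g])) < max_corr for g in selected):
            selected.append(f)
    return selected, scores
```

On a toy matrix where column 1 duplicates column 0 (which separates the three classes) and column 2 is split evenly within every class, `select_features` keeps only column 0: column 2 scores a gain ratio of zero, and column 1 is discarded as redundant because its Spearman correlation with column 0 is 1. Weighting whole classes rather than individual samples is a deliberate simplification here; it keeps the weighted entropies easy to check by hand.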
To cite this article: Ting, C. K., Ibrahim, N., Huspi, S. H., & Kadir, W. M. N. W. (2025). A Class Density-Weighted Gain Ratio Feature Selection for Multiclass Student Engagement Classification. International Journal of Academic Research in Progressive Education and Development, 14(4), 186–202.
Copyright: © 2025 The Author(s)
Published by HRMARS (www.hrmars.com)
This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at: http://creativecommons.org/licences/by/4.0/legalcode