Classifying the Cumulative Grade Point Average of the Students Based on the Perceived Illnesses they Experienced: A Categorical Regression Analysis Using Bayesian Additive Regression Trees

Authors

  • Hannan S. Ampuan, Mindanao State University, Marawi City

DOI:

https://doi.org/10.5281/zenodo.18683439

Keywords:

Bayesian Additive Regression Trees (BART), Cumulative Grade Point Average (CGPA), Perceived Illnesses, Classification Modeling, Academic Performance, Multicollinearity, Variance Inflation Factor (VIF), Predictive Analytics, Partial Dependence Plots, Student Health

Abstract

The study introduced and applied Bayesian Additive Regression Trees (BART) to classify the Cumulative Grade Point Average (CGPA) of students based on the perceived illnesses they experienced, addressing the scarcity of BART applications in classification predictive modeling. Data were collected from 292 respondents, covering 10 illness predictors and the CGPA. The first BART model showed uniformly high variable-importance values across all predictors, suggesting multicollinearity, which was later confirmed through a Variance Inflation Factor (VIF) assessment. After the model was refined, only four predictors (stress, anxiety, headache, and stomachache) were retained. Despite these adjustments, the classification BART model yielded a modest accuracy of approximately 66%. Among the CGPA categories, CGPA-C was the most accurately predicted, while CGPA-A had no correct classifications. Partial dependence plots and posterior predictive checks confirmed that individual perceived illnesses had minimal predictive influence. These findings suggest that although health complaints were commonly reported among students, they were not strong predictors of academic performance as measured by CGPA. The application nonetheless demonstrated the flexibility of BART for classification modeling: its consistently high variable-importance values across all predictors flagged the multicollinearity that obscured the genuine relationship between CGPA and perceived illnesses.
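The VIF screening step mentioned in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the study's dataset or code; the variables `x1`, `x2`, and `x3` are hypothetical stand-ins for illness predictors, and the conventional rule of thumb flags VIF values above roughly 5 to 10 as problematic collinearity.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (with an intercept).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent predictor
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 1))  # x1 and x2 inflate sharply; x3 stays near 1
```

Predictors whose VIF exceeds the chosen threshold would be dropped before refitting the BART model, mirroring the refinement step that retained only four predictors.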



Published

2026-02-17

How to Cite

Ampuan, H. (2026). Classifying the Cumulative Grade Point Average of the Students Based on the Perceived Illnesses they Experienced: A Categorical Regression Analysis Using Bayesian Additive Regression Trees. International Journal of Education, Research, and Innovation Perspectives, 2(2), 447–481. https://doi.org/10.5281/zenodo.18683439
