Multiple Regression in Determining Factors Affecting Student Success in a Statistics Subject

This study aims to identify the factors that affect students' success in an Introductory Statistics subject using Multiple Linear Regression (MLR). Gender and assessment achievements, namely test 1, test 2, quiz, assignment, group project, and final test marks, were investigated as predictors. The dependent variable is the overall mark for the subject. The results show that test 1, test 2, assignment, project, and final test are significant predictors of the overall marks for the statistics subject. This study was carried out using SPSS software. To better determine the significant variables, further research can be done using a larger sample and more variables.


Introduction
Improved student behaviour in terms of attendance, assignment completion, performance in assessments, and class participation is among the factors that contribute to university excellence. The quality of lecturers also influences the quality of a university in ways such as strengthening teaching discipline, conducting continuous research, and providing high-quality educational services (Shedriko, 2021).

The multiple linear regression (MLR) model consists of one criterion, Y, also known as the response, predicted, outcome, or dependent variable, and p predictors, also known as independent variables. There are numerous advantages to using a multiple regression model to analyse data. One of them is determining the relative influence of one or more predictor variables on the criterion value. It can also identify outliers or anomalies. On the other hand, regression analysis can be very unforgiving, since one omitted variable can bias all regression coefficients to an unknown degree and direction (Klees, 2016). This suggests that the disadvantages of using a multiple regression model usually stem from the data used.

MLR has numerous applications in nearly every field, including engineering, the physical and chemical sciences, finance, administration, the life and biological sciences, the social sciences, and academia. The following are examples of MLR applications in academia. Yang et al. (2018) used MLR, a popular method, to predict students' academic performance. Mahmud et al. (2022) employed MLR to investigate the factors that influenced students' academic performance during the COVID-19 pandemic and found that students' hometowns and the hours they spent preparing before class contributed significantly to the model. Tinuke Omolewa et al. (2019) examined students' performance using k-means clustering and MLR and found that test scores, quizzes, and assignments were the major factors in academic performance.

Furthermore, Weidlich and Bastiaens (2018) studied the impact of transactional distance (TD) on satisfaction in online distance learning. The dependent variable was satisfaction in online education, and the independent variables were TD Student-Teacher (TDST), TD Student-Student (TDSS), TD Student-Content (TDSC), and TD Student-Technology (TDSTECH). The study revealed that TDSTECH was the single most important predictor of satisfaction in online distance learning for the chosen population. In addition, mediator analysis revealed that TDSTECH mediates the relationship between TDST and satisfaction, but not the relationship between TDSC and satisfaction. However, there was no significant relationship between TDSS and satisfaction.

Meanwhile, Rachmawati et al. (2021) investigated the effect of online learning and parental guidance on the Geography learning results of class XI social students at SMAN 5 Jember. The independent variables were online learning and parental guidance, and the dependent variable was the study result. The study concluded that online education and parental guidance could affect students' learning outcomes in the Geography subject. Hsu Wang (2019) studied the prediction of online behaviour and achievement using self-directed learning awareness in flipped classrooms. The dependent variables were online behaviour and achievement, whereas the independent variables were self-directed learning factors. The results indicated three things: first, task value, intrinsic motivation, control of learning beliefs, and metacognition predict achievement; second, SRL awareness predicts online behaviours to a limited extent; and third, a combination of SRL awareness and online behaviours predicts achievement better than either of the single-domain models. Forson and Vuopala (2019) examined the online learning readiness of students enrolled in distance education in Ghana. The dependent variable was online learning readiness, whereas the independent variables were students' attitude, self-regulated learning, ICT skills, and collaborative skills. The study found that distance education students have a positive attitude towards online learning. They also possess good self-regulated learning, collaborative, and information and communication technology skills relevant to online learning.

From the previous studies, it is clear that regression analysis can determine the relationship between a dependent variable and a set of independent variables in education. Hence, this study aims to determine the factors that affect student success in a statistics subject using multiple regression analysis.

Methodology
Secondary data were obtained for all students who took the Introduction to Statistics subject at UiTM Cawangan Melaka Kampus Jasin in the two semesters of 2021. There were 236 undergraduate students in the Faculty of Plantation and Agrotechnology who took that subject. Test 1, test 2, quiz, assignment, group project, final test, gender, and overall marks are the input variables used.
The data analysis method employs statistical and logical techniques to describe, illustrate, and evaluate data. It began with the validation of assumptions, followed by the evaluation of the fitted model. The assumptions must be met during the validation stage; otherwise, the process must be restarted from the beginning. During the evaluation of the fitted model, the estimated model was tested three times before it could be claimed as the best model and used to forecast values. The data were analysed using SPSS.
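
The study fitted its model in SPSS. Purely as an illustration of what fitting an MLR model involves, the following minimal Python sketch estimates the coefficients by ordinary least squares through the normal equations. The function name fit_mlr and the marks shown are hypothetical; they are not the study's data or its procedure.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares fit of y on the columns of X plus an intercept.

    Returns [b0, b1, ..., bp], where b0 is the intercept.
    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination.
    """
    # Design matrix with a leading column of ones for the intercept.
    A = [[1.0, *row] for row in X]
    m = len(A[0])

    # Augmented system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(m)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(m)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]

    return [M[i][m] / M[i][i] for i in range(m)]


# Hypothetical marks (not the study's data): columns are test 1 and final test.
X = [[10, 30], [12, 25], [15, 35], [8, 20], [14, 28]]
# Made-up exact relationship, so the fit should recover 5, 1.0 and 0.5.
y = [5 + 1.0 * t1 + 0.5 * ft for t1, ft in X]
coefs = fit_mlr(X, y)
print([round(b, 3) for b in coefs])
```

Because the made-up response is an exact linear function of the two predictors, the recovered coefficients should match the generating values up to rounding error. With real marks the fit would not be exact, and a statistics package would also report standard errors and p-values.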

Findings and Discussions
a) Set of Variables
The study analysed the factors affecting students' performance in the subject using seven predictors, namely test 1, test 2, quiz, assignment, group project, final test, and gender, with the overall mark as the dependent variable, as listed in Table 1.
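
One way to write the regression model implied by this set of variables is sketched below. The symbols and the order in which the predictors are numbered are assumptions made for illustration; the source does not state them.

```latex
% Assumed notation: Y = overall mark; X_1 = test 1, X_2 = test 2, X_3 = quiz,
% X_4 = assignment, X_5 = group project, X_6 = final test, X_7 = gender.
\begin{equation}
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4
  + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \varepsilon
\end{equation}
```

Here $\beta_0$ is the intercept, each $\beta_j$ is the change in the overall mark associated with a one-unit change in $X_j$ with the other predictors held fixed, and $\varepsilon$ is the random error term.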

Validating Assumptions
Regression analysis makes many assumptions about the model, and Multiple Linear Regression (MLR) is one of the fussier statistical techniques, as it makes several assumptions about the data. If one or more assumptions are violated, the model in hand is no longer reliable or acceptable for estimating the population parameters (Daoud, 2018). In this study, four assumptions are discussed.

a) The relationship between independent variables and dependent variables is linear
Based on Table 4, gender has a weak negative correlation with overall marks (r = -0.109). Test 1, test 2, quiz, assignment, and project have weak positive correlations with overall marks. Final test has a strong positive correlation with overall marks (r = 0.903). These are pairwise correlations between each predictor and overall marks, so no other factor is controlled or held constant.
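
The correlation coefficients quoted here are Pearson correlations between one predictor and the overall marks. A minimal Python sketch of that calculation follows; the function name pearson_r and the marks are hypothetical and do not come from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical marks for five students (not the study's data).
final_test = [20, 25, 28, 30, 35]
overall = [55, 62, 70, 71, 84]
r = pearson_r(final_test, overall)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one. The sign gives the direction of the association.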

b) Checking Multicollinearity
Multicollinearity occurs when the independent variables in a model are correlated with one another. Based on Table 4, the VIF values are below 10 and the tolerance values are above 0.1. Therefore, there is no multicollinearity among the independent variables.
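
For reference, the two diagnostics used in this check are related as follows. The notation $R_j^2$ is an assumption introduced here: it denotes the coefficient of determination obtained when predictor $j$ is regressed on all the other predictors.

```latex
% Requires the amsmath package for the align environment.
\begin{align}
\mathrm{Tolerance}_j &= 1 - R_j^2, \\
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{Tolerance}_j}.
\end{align}
```

Since each quantity is the reciprocal of the other, the criteria VIF below 10 and tolerance above 0.1 are the same condition, namely $R_j^2 < 0.9$.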

c) The Values of the Residuals are Independent
The Durbin-Watson statistic in Table 2 shows that this assumption has been met, as the obtained value, 1.562, is close to 2.

d) The Values of the Residuals are Normally Distributed
The P-P plot (Figure 2) for the model suggested that the assumption of normality of the residuals may have been violated. However, as only extreme deviations from normality are likely to have a significant impact on the findings, the results are probably still valid.
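
The Durbin-Watson statistic used for the independence check can be computed directly from the residuals taken in observation order. The sketch below is a minimal illustration with hypothetical residuals; the function name durbin_watson is an assumption and the values are not the study's.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic: the sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    sse = sum(e * e for e in residuals)
    if sse == 0:
        raise ValueError("statistic is undefined when all residuals are zero")
    diffs = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return diffs / sse


# Hypothetical residuals in observation order (not the study's data).
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive autocorrelation
print(durbin_watson(alternating), durbin_watson(constant))
```

The statistic ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.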

Evaluate the Model
Evaluating the estimated model is a necessary but often overlooked procedure. It is a prerequisite before the estimated model can be claimed as the best model and used to forecast values. The subsections that follow describe some of the most common statistical testing procedures.

a) Fitness of the Model
This is a test for the overall fitness of the model. It reveals whether all or part of the independent variables should remain in the model. The test criterion used is the F-test statistic. The null hypothesis to be tested states that all coefficients in the model are equal to zero, that is,
$H_0$: The regression model is not significant
$H_1$: The regression model is significant
The overall F-test can be found in the ANOVA table in the statistical output. To interpret the F-test, its p-value is compared with the 5% significance level. If the p-value is less than the chosen significance level, the data provide sufficient evidence to conclude that the independent variables in the model improve the fit. Otherwise, the null hypothesis is not rejected, which means that none of the independent variables provides any information for fitting the model, and hence the model is rejected. From Table 3, the p-value (reported as 0.000, that is, p < 0.001) is less than the significance level of 0.05. The data therefore provide sufficient evidence to conclude that the regression model is significant.
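
In symbols, the overall F-test described here can be sketched as follows. The notation is an assumption introduced for illustration: $k$ is the number of predictors, $n$ the number of observations, $SSR$ the regression sum of squares, and $SSE$ the error sum of squares from the ANOVA table.

```latex
% Requires the amsmath package for \text.
\begin{equation}
H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0
\qquad \text{versus} \qquad
H_1 : \text{at least one } \beta_j \neq 0
\end{equation}
\begin{equation}
F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE}
\end{equation}
```

Under $H_0$ and the model assumptions, $F$ follows an F distribution with $k$ and $n - k - 1$ degrees of freedom, and the p-value is the upper-tail probability beyond the observed $F$.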

b) Goodness of Fit
The standard measure of the goodness of fit is the coefficient of determination, $R^2$. From Table 2, $R^2$ shows that 87.8% of the total variation in overall marks can be explained by gender, quiz, final test, project, test 1, assignment, and test 2, while the remaining 12.2% is attributed to error, that is, to factors not included in the model. Therefore, the fit of the model is quite good (Dhakal, 2018).
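
As an illustration of how the coefficient of determination is obtained, the following minimal Python sketch computes it from observed and fitted values as one minus the ratio of the error sum of squares to the total sum of squares. The function name r_squared and the marks are hypothetical, not the study's data.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SSE / SST."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and equal in length")
    mean_y = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))   # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in observed)              # total variation
    if sst == 0:
        raise ValueError("R-squared is undefined when observed values are constant")
    return 1.0 - sse / sst


# Hypothetical overall marks and model predictions (not the study's data).
observed = [50, 60, 70, 80]
fitted = [52, 58, 71, 79]
print(round(r_squared(observed, fitted), 3))
```

In this made-up example the total sum of squares is 500 and the error sum of squares is 10, so 98% of the variation is explained and 2% is left unexplained, which is the same split the text describes as 87.8% and 12.2%.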

Estimated Model Coefficient
From Table 4, the estimated model coefficient is:

Figure 1: P-P plot

Table 2: Model Summary