A Comparison between ARIMA and Fuzzy Time Series Methods in Predicting Daily COVID-19 Outbreak in Malaysia

COVID-19 is a viral infection caused by a recently identified coronavirus that has impacted the lives of millions of people worldwide. In Malaysia, the number of COVID-19 cases has been increasing since 2021. This study aims to find the best model for forecasting the number of new confirmed cases of COVID-19 in Malaysia by comparing Autoregressive Integrated Moving Average (ARIMA) and Fuzzy Time Series models. ARIMA is commonly used for time-series analysis, forecasting, and control, while Fuzzy Time Series provides an alternative approach for predicting COVID-19 outbreaks. The error measures used to compare the models include Mean Square Error, Root Mean Square Error, and Mean Absolute Percentage Error. The study's results demonstrate that the Fuzzy Time Series model has the smallest error measure values compared to ARIMA, indicating that it is more accurate.


Introduction
In early 2020, the world was shocked by the emergence of COVID-19, a deadly infectious disease that affected millions of people globally. No one could have imagined that a virus could have such a profound impact on the world. The World Health Organization (WHO) defines COVID-19 as an infectious disease caused by a newly discovered coronavirus (WHO, 2020). Common symptoms of COVID-19 include fever, dry cough, body aches, nasal congestion, runny nose, sore throat, diarrhea and vomiting (D. Wang et al., 2020).

One study found that the Fuzzy Time Series model can be an alternative way to forecast COVID-19 sepsis. Fuzzy time series is a concept proposed to deal with forecasting problems in which the historical data consist of linguistic values; it was first developed by Song and Chissom, as noted by Verma et al. (2020). The main advantage of the Fuzzy Time Series technique is that no assumptions are made about the data set. According to Cai et al. (2013), fuzzy time series is an application of fuzzy mathematics to time series, and if the data are complete but contain noise, using fuzzy theory to support forecasting can generally give better results. Their study used the TAIEX stock index data set, and the results showed that FTSGA, a hybrid model based on fuzzy time series and a genetic algorithm, improved forecasting performance and accuracy by applying the genetic algorithm to its operations.

Furthermore, Alam et al. (2022) proposed a new procedure for forecasting time series data based on the intuitionistic fuzzy set (IFS). Their model uses a defuzzification formula that exploits the main properties of IFS, namely the membership and non-membership functions, and this defuzzification procedure significantly decreased the forecasting error. Despite the volatility of the Crude Palm Oil (CPO) price, the model produced noteworthy forecasting results. Another study applied a weighted fuzzy time series model to forecast epidemic injuries and found that it gives significantly better results than classical statistical methods (Moneim, 2020). That study used COVID-19 epidemic injury data for Saudi Arabia from 2 March 2020 to 20 July 2020. The mean square error of the Weighted Fuzzy Time Series (WFTS) method was 0.0049, smaller than that of the previous methods, which demonstrated that WFTS is a good tool for predicting COVID-19 epidemic injuries.

As one of the countries that handled the COVID-19 pandemic successfully, Malaysia needs to ensure that the number of new confirmed cases keeps decreasing and the number of recovered cases keeps increasing to avoid a worsening situation. However, in 2021 the number of new confirmed cases recorded in Malaysia increased day by day. This may lead to a nationwide lockdown, and if daily cases continue to rise, the government will have to implement a total lockdown, which would involve closing entire sectors and not allowing people to go outside. These problems affect the government, front liners, the public and the economy of Malaysia. Therefore, a precise forecasting model is needed to help the government predict future numbers of new confirmed and recovered COVID-19 cases. Such a model would also help the government plan ahead to prevent losses in handling the COVID-19 pandemic.
Hence, this study compares the Autoregressive Integrated Moving Average (ARIMA) and Fuzzy Time Series methods for predicting new cases of COVID-19 in Malaysia. Both time series methods aim to forecast the data accurately by minimizing error values.

Data Collection Method
The data used for this analysis comprise the daily number of newly confirmed COVID-19 cases in Malaysia from July 2021 to December 2021. Both Autoregressive Integrated Moving Average (ARIMA) and Fuzzy Time Series techniques were applied to the data and compared to identify the better technique. RStudio was used for ARIMA, while Microsoft Excel was used for Fuzzy Time Series. The techniques were compared on their Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) to determine which technique is superior.
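The Findings and Discussions section refers to observations by day index (for example, day 57 and day 173). A minimal sketch, assuming day 1 corresponds to 1 July 2021, shows how a day index maps to a calendar date and that the study period contains 184 daily observations. The names START, END and day_to_date are illustrative, not part of the study's workflow.

```python
from datetime import date, timedelta

START = date(2021, 7, 1)   # assumed: day 1 of the study period
END = date(2021, 12, 31)   # last day of the study period


def day_to_date(day):
    """Map a 1-based day index in the study period to a calendar date."""
    return START + timedelta(days=day - 1)


# Number of daily observations between START and END, inclusive.
n_days = (END - START).days + 1

print(n_days)            # 184
print(day_to_date(57))   # 2021-08-26
print(day_to_date(173))  # 2021-12-20
```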

ARIMA Model
ARIMA models are widely used for time series forecasting (Verma et al., 2020) and have been identified as one of the most effective methods for predicting time series data. The ARIMA method is divided into three phases: identification, estimation and forecasting. ARIMA requires the series to be stationary, so in the identification phase stationarity is assessed with three approaches: the autocorrelation function (ACF), the partial autocorrelation function (PACF) and the Augmented Dickey-Fuller (ADF) test. If the series is still non-stationary, it is differenced until it becomes stationary. In the estimation phase, the parameters of the candidate ARIMA model are then estimated and checked. An ARIMA model is described by three terms, p, d and q, where p is the order of the autoregressive (AR) term, q is the order of the moving average (MA) term, and d is the number of differences needed to obtain a stationary series.

Autoregressive (AR), p term:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \tag{1}$$

Moving Average (MA), q term:

$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \tag{2}$$

where $\varepsilon_t$ represents white noise.

Number of differencing, d term:

$$y'_t = (1 - B)^d \, y_t \tag{3}$$

where B is the lag operator, $B y_t = y_{t-1}$.

Therefore, the general formula for the ARIMA model is as follows:

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{4}$$

where $y'_t$ is the differenced series.
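The two quantities inspected in the identification phase, the differenced series of equation (3) and the sample ACF, can be computed directly. The sketch below uses only the Python standard library and a made-up series; it is not the study's RStudio workflow, the function names are illustrative, and the ADF test itself is left to a statistics package.

```python
def difference(series, d=1):
    """Apply the (1 - B) operator d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelation r_k of the series for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    y = [100, 120, 150, 170, 160, 180, 210, 230, 220, 250]  # hypothetical counts
    dy = difference(y, d=1)
    print(dy)  # [20, 30, 20, -10, 20, 30, 20, -10, 30]
    print([round(r, 3) for r in acf(dy, max_lag=3)])
```

A slowly decaying ACF on the raw series that drops off quickly after differencing is the usual informal sign that one difference (d = 1) is enough.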
In this study, ARIMA (2,1,8) was selected as the ARIMA model based on the results generated using RStudio. The model can be expressed using equation (4) with p = 2, d = 1 and q = 8.
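Substituting p = 2, d = 1 and q = 8 into equation (4), with $y'_t = y_t - y_{t-1}$, gives the explicit form below. The fitted coefficients are not reported here, so $\phi_i$ and $\theta_i$ stay symbolic; the constant c is omitted on the assumption that no drift term was fitted, which is the default behaviour of R's arima() for differenced series.

$$y_t - y_{t-1} = \phi_1 (y_{t-1} - y_{t-2}) + \phi_2 (y_{t-2} - y_{t-3}) + \varepsilon_t + \sum_{i=1}^{8} \theta_i \varepsilon_{t-i}$$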

Fuzzy Time Series
Song and Chissom first developed the concept of fuzzy time series (Verma et al., 2020). The Fuzzy Time Series model used in this study follows seven steps.
Step 1: All the data were converted into percentage form, that is, the percentage change between consecutive daily observations, calculated as

$$P_n = \frac{y_n - y_{n-1}}{y_{n-1}} \times 100\%$$

where $y_n$ is the number of new cases of COVID-19 and $y_{n-1}$ is the number of new cases on the previous day.
Step 2: Define the universe of discourse, U. The minimum and maximum values of the percentage form from Step 1 were identified: the maximum value is 29.22% and the minimum value is -24.56%. The universe of discourse U is obtained from the minimum value $D_{\min}$ and the maximum value $D_{\max}$ as

$$U = [D_{\min} - D_1,\; D_{\max} + D_2]$$

where $D_1$ and $D_2$ are suitable positive numbers.
Step 3: The fuzzy sets $A_1, A_2, \dots, A_7$ were constructed using seven intervals of equal length. In this step, the fuzzification of each interval and the frequency of observations in each interval were identified. The length of each interval for fuzzification is calculated as

$$l = \frac{(D_{\max} + D_2) - (D_{\min} - D_1)}{n}$$

where n = 7 is the number of intervals.
Step 4: The intervals $u_1, u_2, \dots, u_7$ were generated from the universe of discourse defined in Step 2, and the fuzzy sets were formed as trapezoidal fuzzy numbers over these intervals.
Step 5: Each data point in percentage form was classified into one of the intervals from Step 4, and the fuzzy logical relationships were identified from the classified data. A fuzzy logical relationship is written as $A_i \rightarrow A_j$, where $A_i$ is the current state and $A_j$ is the next state.
Step 6: The fuzzy logical relationships from Step 5 were grouped, so that relationships sharing the same current state $A_i$ form one fuzzy logical relationship group.
Step 7: Each fuzzy logical relationship group was classified under one of three rules, and the forecast for each group was calculated differently according to its rule:
Rule 1: The group of $A_i$ is empty, $A_i \rightarrow \varnothing$, or contains only itself, $A_i \rightarrow A_i$.
Rule 2: The group of $A_i$ is a one-to-one relationship, for example $A_i \rightarrow A_j$.
Rule 3: The group of $A_i$ is a one-to-many relationship, for example $A_i \rightarrow A_j, A_k$.
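The seven steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Excel procedure used in the study: the margins D1 and D2 default to zero (hypothetical values), each percentage change is assigned to exactly one interval, and each group is defuzzified with interval midpoints in the style of Chen's method instead of the trapezoidal fuzzy numbers described in Step 4. A forecast percentage change is converted back to a case count by applying it to the previous day's actual value, and the series is assumed to contain no zero counts.

```python
from collections import defaultdict


def percent_change(series):
    """Step 1: percentage change between consecutive observations."""
    return [(b - a) / a * 100 for a, b in zip(series, series[1:])]


def make_intervals(values, n_intervals=7, d1=0.0, d2=0.0):
    """Steps 2-4: split U = [Dmin - D1, Dmax + D2] into equal-length intervals."""
    lower = min(values) - d1
    upper = max(values) + d2
    length = (upper - lower) / n_intervals
    return [(lower + i * length, lower + (i + 1) * length) for i in range(n_intervals)]


def fuzzify(value, intervals):
    """Step 5: index i of the fuzzy set A_i whose interval contains value."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value < hi:
            return i
    return len(intervals) - 1  # the upper bound of U goes to the last interval


def build_groups(states):
    """Steps 5-6: collect relations A_i -> A_j into groups keyed by A_i."""
    groups = defaultdict(set)
    for current, nxt in zip(states, states[1:]):
        groups[current].add(nxt)
    return groups


def defuzzify(state, groups, intervals):
    """Step 7 (midpoint variant): forecast percentage change for state A_i."""
    mid = [(lo + hi) / 2 for lo, hi in intervals]
    targets = groups.get(state)
    if not targets:
        return mid[state]  # Rule 1, empty group: midpoint of A_i itself
    # Rule 1 (A_i -> A_i), Rule 2 (one-to-one), Rule 3 (one-to-many):
    # average the midpoints of every next state in the group.
    return sum(mid[j] for j in targets) / len(targets)


def forecast(series, n_intervals=7, d1=0.0, d2=0.0):
    """One-step-ahead in-sample forecasts for series[2:]."""
    pct = percent_change(series)
    intervals = make_intervals(pct, n_intervals, d1, d2)
    states = [fuzzify(p, intervals) for p in pct]
    groups = build_groups(states)
    forecasts = []
    for t in range(1, len(series) - 1):
        # states[t - 1] is the state of the change from series[t-1] to series[t];
        # its group gives the expected change from series[t] to series[t+1].
        change = defuzzify(states[t - 1], groups, intervals)
        forecasts.append(series[t] * (1 + change / 100))
    return forecasts
```

For example, with the reported extremes of 29.22% and -24.56% and D1 = D2 = 0, each of the seven intervals would be (29.22 + 24.56) / 7 ≈ 7.68 percentage points wide. Because the relationship groups are built from the whole series, these forecasts are in-sample fitted values.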

Evaluate Model Performance
For each generated model, forecast accuracy was measured by comparing the forecasted values with the actual data. The ARIMA and Fuzzy Time Series techniques were then compared, and the technique with the lowest error measurements was taken as the better model.

Findings and Discussions
Figure 1 shows the actual daily new COVID-19 cases in Malaysia from July 2021 to December 2021, which display a clear trend. The highest number of new cases was 24,599 on day 57, which was 26 August 2021. In contrast, the lowest number of reported cases in the data collected was 2,589 on day 173, which was 20 December 2021. COVID-19 cases increased during August 2021 but started to drop from day 65, and after day 128 they fell below 5,000 and continued to decline. The data set contains 184 observations (N = 184) from 1 July 2021 to 31 December 2021.

The accuracy of the forecasts was measured by comparing the generated models with the actual data, and the ARIMA and fuzzy time series models were compared on the basis of the lowest error measurements. Three error measures were used in this study: Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE):

$$\text{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^2, \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}, \qquad \text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right|$$

In Table 1, $y_n$ is the number of new cases of COVID-19 and $y_{n-1}$ is the number of new cases on the previous day. The forecast error $e_t = y_t - F_t$ is calculated by subtracting the forecast value $F_t$ from the actual value $y_t$, and n is the number of effective observations used to fit the model. The model with the minimum values of these accuracy measures is the best-fitting model.
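The three error measures can be written as small functions. This is a plain-Python sketch, not the RStudio or Excel workflow used in the study; it assumes the actual and forecast sequences are aligned and of equal length, and that no actual value is zero, since MAPE divides by $y_t$.

```python
import math


def mse(actual, forecast):
    """Mean Squared Error: (1/n) * sum((y_t - F_t)^2)."""
    errors = [y - f for y, f in zip(actual, forecast)]
    return sum(e * e for e in errors) / len(errors)


def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    terms = [abs((y - f) / y) for y, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    actual = [100, 200, 400]     # hypothetical observations
    predicted = [110, 190, 400]  # hypothetical forecasts
    print(mse(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

As a quick consistency check on the reported results, RMSE should equal the square root of MSE: math.sqrt(611279.3557) ≈ 781.8436 and math.sqrt(5167665.563) ≈ 2273.250, matching the RMSE values reported for the fuzzy time series model and ARIMA (2,1,8).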

Figure 1: Daily New Confirmed Cases of COVID-19 in Malaysia from July 2021 to December 2021

The fuzzy time series model has the lowest MSE, RMSE and MAPE values, indicating higher forecasting accuracy. The MSE of the fuzzy time series model is 611279.3557, compared with 5167665.563 for ARIMA (2,1,8). The RMSE of the fuzzy time series model is 781.8436, lower than the 2273.250 of ARIMA (2,1,8). For the last error measure, MAPE, the fuzzy time series model again has the lowest value, 4.3169, compared with 53.629021 for ARIMA (2,1,8). Therefore, the fuzzy time series model is chosen as the best model for predicting the number of new COVID-19 cases in Malaysia, since it has the lowest measurement errors.

This study aimed to determine the optimal model for predicting COVID-19 case numbers in Malaysia, with the models evaluated on their measurement errors. Two methods, ARIMA and fuzzy time series, were used to analyze the actual COVID-19 case data for Malaysia. The fuzzy time series model generated smaller error values than ARIMA (2,1,8), and the graph of actual and estimated new COVID-19 cases from July 2021 to December 2021 showed that the fuzzy time series method fits the actual data well. These results are consistent with C. C. Wang (2011), who stated that fuzzy time series techniques behave more predictably than ARIMA time series techniques. In addition, Hadjira et al. (2021) found, using statistical criteria, that a fuzzy time series model predicted COVID-19 outbreaks better than ARIMA models. The results of this study can help the government predict future COVID-19 case numbers and make informed decisions to combat the spread of the pandemic. Future studies could compare additional time series forecasting models and report their predicted values.

References
Alam, N. M. F. H. N. B., Ramli, N., & Nassir, A. A. (2022). Predicting Malaysian crude palm oil prices using Intuitionistic Fuzzy Time Series Forecasting Model. ESTEEM Academic Journal, 18(March), 61-70.
Babulal, V., & Othman, N. Z. (2020, April 10). Sri Petaling Tabligh gathering remains Msia's largest COVID-19 cluster. New Straits Times. Available: https://www.nst.com.my/news/nation/2020/04/583127/sri-petaling-tabligh-gatheringremains-msias-largest-covid-19-cluster
Bernama. (2020). First case of Malaysian positive for coronavirus. Bernama. Available: https://www.bernama.com/en/general/news_covid-19.php?id=1811373
Cai, Q. Sen, Zhang, D., Wu, B., & Leung, S. C. H. (2013). A novel stock forecasting model based on fuzzy time series and genetic algorithm. Procedia Computer Science, 18, 1155-1162.