A Hybrid Model Based on EMD-Feature Selection and Random Forest Method for Medical Data Forecasting

Hospital managers need to allocate emergency department (ED) resources efficiently because, with the gradual aging of the population, overcrowding of emergency services is becoming a problem. Forecasting is a vital activity that guides decision-makers in fields such as industrial planning, economics, and healthcare. Researchers have applied time series methods to forecasting daily patient numbers in the ED. Traditional time series models usually use a single variable for forecasting, but raw data contain noise caused by changes in weather conditions and other environmental factors, and feeding such complicated raw data into time series models lowers forecasting performance. Further, traditional time series models cannot be applied to all datasets because statistical models must satisfy statistical assumptions. Multi-attribute data usually produce high-dimensional data and increase computational complexity in the data mining procedure. To overcome these drawbacks, this study proposes a hybrid random forest model based on the autoregressive (AR) model and empirical mode decomposition (EMD). The proposed model uses EMD to decompose complicated raw data into simpler, highly correlated frequency components and uses a feature selection method to reduce the high-dimensional input data generated by EMD. The study then applies the random forest method, which is not subject to the limitation of statistical methods (that data must obey some mathematical distribution), to forecast daily patient volumes. For verification, daily patient volumes in an emergency department are collected as experimental datasets to evaluate the proposed model. Experimental results show that the proposed model outperforms the listed comparison models.


Introduction
The emergency department (ED) is an essential part of the hospital. Relying on professional practitioners and high-technology equipment, the ED can offer immediate medical care. Overcrowding of emergency services is becoming a problem because of the gradual aging of the population, and it arises when a hospital cannot allocate ED resources efficiently. ED overcrowding affects not only patient satisfaction but also the quality of treatment and prognosis. The pivotal step in allocating ED resources is predicting the demand for ED services. The results generated by medical forecasting models can guide decision-making in many tasks, such as staff supplementation and the expansion of bed capacity. In recent years, several researchers have addressed demand forecasting for staff allocation and resources in hospitals (Asplin et al., 2006, Hwang and Lee, 2008, Schweigler et al., 2009, Georgio et al., 2017). Most scholars first consider traditional time series models for forecasting medical resource demand, and several time series models have been proposed and applied in different forecasting areas (Bollerslev, 1986, Engle, 1982, Huarng, 2001, Song and Chissom, 1993, Wei et al., 2017, Hamida and Scalera, 2019). Engle (1982) proposed the ARCH(p) (Autoregressive Conditional Heteroscedasticity) model, which has been used by many financial and economic analysts, and the GARCH (Generalized ARCH) model (Bollerslev, 1986) is its generalized form. Box and Jenkins (1976) proposed the autoregressive moving average (ARMA) model, which combines a moving average process with a linear difference equation and forecasts under the condition that the series is linear and stationary. Models that describe homogeneous nonstationary behavior can be obtained by supposing that some suitable difference of the process is stationary.
Therefore, the autoregressive integrated moving average (ARIMA) model (Box and Jenkins, 1976), which assumes linearity among variables, was proposed to handle nonstationary datasets. Besides, linguistic expressions are often used to describe daily observations. Hence, Song and Chissom (1993) first proposed the original fuzzy time series model, and Chen (1996) subsequently proposed a refined fuzzy time series model for enrollment forecasting. Focusing on how fuzzy relationships are established in the fuzzy time series model, Yu (2005) recommended that different weights be assigned to different fuzzy relationships and proposed a weighted fuzzy time series method for forecasting a stock index. As the literature above shows, the autoregressive (AR) model is a fundamental and important method among time series models. However, conventional time series models must satisfy statistical assumptions, so not every model can be applied to every dataset (Wei, 2013). Besides, most traditional time series models use a single variable for forecasting, while raw patient volume data contain considerable noise caused by changes in surrounding conditions. Conventional time series models that use such complicated raw data therefore suffer reduced forecasting performance (Wei, 2016).
Today, information technology plays an important role in managing knowledge in the healthcare environment (Turan and Palvia, 2014, Cheng, 2012). However, there are still challenges in how to use advanced technology to create and disseminate healthcare knowledge. Information technology remains a key tool in healthcare management applications. It is also used to anticipate healthcare needs, and as the demand for emergency medical services continues to increase, its use for forecasting is becoming increasingly important. Data mining is a rapidly developing kind of information technology for information processing, and it has attracted much attention in the field of knowledge discovery. The knowledge discovery process consists of data collection, data selection, pattern recognition, and knowledge representation. Data mining comprises these techniques and has been applied to various disciplines, such as business and medical data prediction (Aneeshkumar and Venkateswaran, 2015, Arslan et al., 2016, Wei, 2014, Izadi and Taghva, 2017). Kim and Han (2000) proposed a genetic algorithm approach to feature discretization and the determination of connection weights for artificial neural networks (ANNs) to predict the stock price index. Roh (2007) integrated a neural network and a time series model to forecast the volatility of the stock price index. Jones et al. (2008) used ANNs to forecast daily patient volumes in the emergency department. ANNs are promising forecasting methods and have been applied extensively in many domains. However, they suffer from network construction problems and need large training datasets.
Besides, there are other artificial intelligence prediction methods, such as the random forest (RF) algorithm. Random forest is a supervised machine learning algorithm and a combined classification method based on statistical learning theory (Breiman, 2001). In a random forest, multiple random samples are drawn as training datasets using a bagging procedure. Bagging refers to random sampling with replacement, which reduces prediction variance and helps to prevent over-fitting. A tree classifier is then constructed for each selected sample during training. Classification trees and regression trees are the tree-based learners commonly used in RF. Finally, RF combines all the classification trees by letting each tree vote and selects the final classification result according to the number of votes. The random forest method can process nonlinear data and has obtained outstanding forecasting results in past research.

To enhance prediction performance, hybrid models are often used in addition to single forecasting algorithms, and models that use empirical mode decomposition (EMD) have gained great attention (Chen et al., 2012). EMD (Huang et al., 1998) is a useful method for analyzing nonlinear signals (such as stock data) and signals in other related fields (Vincent et al., 1999), and it offers a new way to deal with nonlinear and non-stationary signals. EMD-based prediction methods have been used in wind speed prediction (Ren et al., 2014), industry (Feng et al., 2010), tourism management (Lai and Yeh, 2013), and financial time series forecasting (Fu, 2010). With EMD, any complicated signal can be decomposed into a finite and often small number of intrinsic mode functions (IMFs), which have simpler frequency components and stronger correlations and are thus easier to forecast accurately. In recent years, feature selection methods have been applied to forecasting models (Wei and Cheng, 2012). The feature selection process evaluates features, selects relevant features, and removes redundant or unrelated features. Its three main advantages are (1) model simplification, (2) easier interpretation, and (3) faster model induction and structural knowledge. Combining a forecasting model with feature selection can therefore improve prediction accuracy.
The models reviewed above have some major drawbacks: (1) some traditional time series models cannot be applied to datasets that do not follow the statistical assumptions; (2) most conventional time series models use the latest day's data as the input variable for forecasting, although the raw data contain noise generated by environmental and weather conditions. To overcome these drawbacks, this paper uses EMD to decompose the complicated raw data into simpler, highly correlated frequency components. This study then uses RF as the forecasting model because it is not subject to the limitation of statistical methods (that data must obey some mathematical distribution). Based on these concepts, the proposed model first tests the lag of the AR model. Second, the input variables of AR are decomposed by EMD into several IMFs and a residue. However, the IMF attribute set produced by the EMD decomposition is high-dimensional and increases computational complexity. Therefore, this paper uses feature selection to reduce the IMF feature set generated by EMD and thereby address the problems that arise from multi-feature data. Finally, the random forest is combined with the AR method and the reduced IMF attribute set for modeling and prediction. The proposed model can thus be expected to produce more accurate predictions to help solve the emergency department overcrowding problem.

Methodology of Research
This section reviews the methodologies related to this study: the autoregressive model, correlation-based feature selection, empirical mode decomposition, and the random forest algorithm.

Autoregressive Model
In time series forecasting, predictions are obtained by forecasting the value at the next period with a specific prediction algorithm. Besides, forecasting non-periodic short-term time series is much more difficult than forecasting long-term time series. The autoregressive moving average (ARMA) model is a traditional method that is well suited to forecasting regular periodic data such as seasonal or cyclical time series (Chang, 2008).
Box and Jenkins (1976) developed a general linear stochastic model by assuming that time series data can be generated by a linear aggregation of random shocks. This study focuses on the AR model, which includes one or more past values of the dependent variable among its explanatory variables. The simplest AR(1) model is defined as

$$y_t = \Phi_1 y_{t-1}$$

When the random error and the constant term are taken into account, the modified AR(1) model becomes

$$y_t = c + \Phi_1 y_{t-1} + u_t$$

where $c$ is the constant term, $\Phi_1$ is the first-order autoregression coefficient, and $u_t$ is white noise viewed as a random error. An autoregressive model is simply a linear regression of the current value of the series against one or more prior values of the series. In the AR(1) model, the value $y$ in period $t$ is related to its value in period $t-1$. An autoregressive model of order $p$, AR(p), can be expressed as

$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + u_t$$
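To make the AR(1) relationship concrete, the following minimal sketch estimates the constant term and the first-order coefficient by ordinary least squares and produces a one-step forecast. The function names `fit_ar1` and `forecast_next` and the synthetic series are illustrative assumptions; the study itself fits the AR model with the E-Views package, not with this code.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi1 in y_t = c + phi1 * y_{t-1} + u_t."""
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi1 = sxy / sxx
    c = mean_y - phi1 * mean_x
    return c, phi1


def forecast_next(series, c, phi1):
    """One-step-ahead forecast from the last observed value."""
    return c + phi1 * series[-1]


# Synthetic, noise-free AR(1) series with c = 2.0 and phi1 = 0.5 (hypothetical data).
pv = [10.0]
for _ in range(20):
    pv.append(2.0 + 0.5 * pv[-1])

c_hat, phi_hat = fit_ar1(pv)
next_value = forecast_next(pv, c_hat, phi_hat)
```

Because the synthetic series contains no noise, the fit recovers the generating coefficients up to floating-point error; on real patient volumes the estimates would carry sampling error.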

Correlation-based Feature Selection (CFS)
Correlation-based Feature Selection (CFS) was proposed by Hall (1998). A central problem in machine learning is identifying a set of representative features from data with which to build a classification model for a specific task. CFS addresses the feature selection problem with a correlation-based approach. The central idea is that a good feature set contains features that are highly correlated with the decision feature but uncorrelated with each other. A feature evaluation formula derived from test theory provides an operational definition of this hypothesis. CFS is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. It can quickly identify and screen out irrelevant and redundant features. In practical applications, CFS usually eliminates more than half of the input attributes as unrelated to the output attribute. In most cases, a classifier that uses the attributes filtered by CFS achieves higher accuracy than a prediction model that takes all attributes as input variables. In general, CFS outperforms the wrapper approach to attribute filtering on small datasets. CFS also executes many times faster than the wrapper, which allows it to scale to larger datasets.
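The evaluation formula mentioned above scores a subset of k features as k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-target correlation and r_ff is the mean feature-to-feature correlation. The sketch below pairs that score with a greedy forward search. Using absolute Pearson correlation as the correlation measure, forward selection as the search strategy, and the toy feature names are all assumptions made for illustration; the source does not state which measure or search the study used.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(features, target, subset):
    """CFS merit: rewards relevance to the target, penalizes redundancy in the subset."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target):
    """Greedy forward search: add the best feature while the merit strictly improves."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max((cfs_merit(features, target, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data: imf_b duplicates imf_a (redundant), imf_c is weakly relevant.
target = [1, 2, 3, 4, 5, 6]
features = {
    "imf_a": [1, 2, 3, 4, 5, 6],
    "imf_b": [2, 4, 6, 8, 10, 12],
    "imf_c": [1, -1, 1, -1, 1, -1],
}
selected, merit = cfs_forward(features, target)
```

In this toy case the search keeps a single feature: adding the redundant copy does not raise the merit, and adding the weakly relevant feature lowers it.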

Random Forest Algorithm
The random forest (RF) algorithm (Breiman, 2001) builds a forest at random. The forest is composed of many decision trees (DTs), each grown independently of the others. Once the forest has been built, a new instance is classified by passing its attribute values to every decision tree in the forest; each tree returns a class as its vote, and the class predicted by the RF is the one that receives the most votes among all the trees. Each DT is built as follows:
1. If N is the number of instances in the dataset, RF draws a random sample of N instances with replacement from the original data. This sample is used as the training set for the tree.
2. If M is the number of features in the dataset, a value m << M is specified. This value of m remains unchanged while the forest is constructed.
3. At each node of the tree:
3.1 Randomly select m attributes from the original M features.
3.2 Calculate the splitting criterion on these m features; the feature with the best classification ability is used to split the node.
There is no pruning after RF builds each decision tree. In the original random forest paper, Breiman (2001) used two RF methods.

Random Forests Using Random Input Selection:
The simplest random forest with random features is formed by selecting at random, at each node, a small group of input variables to split on. Grow the tree using classification and regression tree (CART) methodology to maximum size and do not prune. Denote this procedure by Forest-RI. The size F of the group is fixed. Two values of F were tried. The first used only one randomly selected variable, i.e., F=1. The second took F to be the first integer less than $\log_2 M + 1$, where M is the number of inputs.
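The three randomized ingredients described above (bootstrap sampling, a fixed-size random group of candidate features per node, and majority voting) can be sketched as follows. This is an illustration only: the function names are hypothetical, tree growing itself is omitted, and the group size reads "first integer less than log2 M + 1" literally as the largest integer strictly below that value. Truncating instead would give a different result only when M is a power of two.

```python
import math
import random
from collections import Counter


def forest_ri_group_size(n_inputs):
    """Largest integer strictly less than log2(M) + 1 (literal reading of the rule)."""
    return math.ceil(math.log2(n_inputs) + 1) - 1


def bootstrap_sample(rows, rng):
    """Draw len(rows) instances with replacement; this trains one tree."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def random_feature_group(feature_names, size, rng):
    """Candidate features examined at one node (drawn without replacement)."""
    return rng.sample(feature_names, size)


def majority_vote(tree_predictions):
    """Forest prediction: the class returned by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
rows = list(range(10))                      # stand-in for 10 training instances
names = ["f%d" % i for i in range(8)]       # stand-in for M = 8 input features

sample = bootstrap_sample(rows, rng)
group = random_feature_group(names, forest_ri_group_size(len(names)), rng)
winner = majority_vote(["high", "low", "high"])
```

With M = 8 inputs the literal rule gives a group of 3 candidate features per node.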

Random Forests Using Linear Combinations of Inputs:
If there are only a few inputs, say M, taking F an appreciable fraction of M might lead to an increase in strength but higher correlation. Another approach consists of defining more features by taking random linear combinations of a number of the input variables. That is, a feature is generated by specifying L, the number of variables to be combined. At a given node, L variables are randomly selected and added together with coefficients that are uniform random numbers on [-1, 1]. F linear combinations are generated, and then a search is made over these for the best split. This procedure is called Forest-RC.
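As a small illustration of Forest-RC, the sketch below generates F candidate features at a node, each a combination of L randomly chosen inputs with coefficients drawn uniformly from [-1, 1]. The function names and the example values F = 2, L = 3, and five inputs are assumptions for illustration; the search for the best split over the candidates is not shown.

```python
import random


def random_linear_combination(n_inputs, n_combined, rng):
    """Pick L distinct input indices and one uniform coefficient in [-1, 1] for each."""
    indices = rng.sample(range(n_inputs), n_combined)
    coefficients = [rng.uniform(-1.0, 1.0) for _ in indices]
    return indices, coefficients


def apply_combination(row, indices, coefficients):
    """Value of the generated feature for one instance."""
    return sum(c * row[i] for i, c in zip(indices, coefficients))


rng = random.Random(0)
# F = 2 candidate features, each combining L = 3 of the 5 inputs (hypothetical sizes).
candidates = [random_linear_combination(5, 3, rng) for _ in range(2)]
row = [1.0, 2.0, 3.0, 4.0, 5.0]
values = [apply_combination(row, idx, coef) for idx, coef in candidates]
```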
The method chosen in this paper is random forests using random input selection, which is the most common method. The number of features selected at each split is the first integer less than $\log_2 M + 1$, where M is the number of input variables (attributes). Information gain is used as the splitting criterion. The information gain algorithm relies on the so-called entropy, defined as

$$\text{Entropy} = -p \log_2 p - q \log_2 q \qquad (4)$$

where p is the probability of success (true) and q is the probability of failure (false).
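The entropy in Equation (4) and the information gain built on it can be computed as in the following sketch. It assumes binary labels coded as 0 and 1 and a two-way split; the function names are illustrative and not taken from the source.

```python
import math


def binary_entropy(p):
    """Entropy of a binary outcome with success probability p, in bits."""
    q = 1.0 - p
    return -sum(v * math.log2(v) for v in (p, q) if v > 0)


def label_entropy(labels):
    """Entropy of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return binary_entropy(sum(labels) / len(labels))


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * label_entropy(left) + (len(right) / n) * label_entropy(right)
    return label_entropy(parent) - weighted


# A perfectly separating split removes all uncertainty (gain of 1 bit);
# a split that leaves both children mixed 50/50 gains nothing.
perfect = information_gain([1, 1, 0, 0], [1, 1], [0, 0])
useless = information_gain([1, 1, 0, 0], [1, 0], [1, 0])
```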

Empirical Mode Decomposition
The empirical mode decomposition (EMD) technique, proposed by Huang et al. (1998), is an adaptive time series decomposition technique used in the Hilbert-Huang transform (HHT) for nonlinear and non-stationary time series data. The basic principle of EMD is to decompose a time series into a sum of oscillatory functions, namely intrinsic mode functions (IMFs). An IMF must satisfy two conditions: (1) the number of extrema (the sum of maxima and minima) and the number of zero crossings are equal or differ by at most one, and (2) the local average is zero. The condition that the local average is zero implies that the mean of the upper envelope and the lower envelope is equal to zero. The first condition is similar to the traditional narrowband requirements for a stationary Gaussian process (Huang et al., 1998). The second condition modifies the classical global requirement to a local one; it is necessary so that the instantaneous frequency will not have the unwanted fluctuations induced by asymmetric waveforms (Huang et al., 1998). The detailed algorithm for EMD is as follows (Huang et al., 1998):
Step 1: Identify the local extrema in the experimental data $\{x(t)\}$. All the local maxima are connected by a cubic spline line $U(t)$, which forms the upper envelope of the data. Repeat the same procedure for the local minima to produce the lower envelope $L(t)$. Both envelopes will cover all the data between them. The mean of the upper and lower envelopes, $m_1(t)$, is given by

$$m_1(t) = \frac{U(t) + L(t)}{2}$$

Subtracting the running mean $m_1(t)$ from the original time series $x(t)$, we get the first component $h_1(t)$:

$$h_1(t) = x(t) - m_1(t)$$

The resulting component $h_1(t)$ is an IMF if it is symmetric and has all maxima positive and all minima negative. An additional condition of intermittence can be imposed here to sift out waveforms with a certain range of intermittence for physical consideration.
If $h_1(t)$ is not an IMF, the sifting process has to be repeated as many times as required to reduce the extracted signal to an IMF. In the subsequent sifting step, $h_1(t)$ is treated as the data and the steps above are repeated:

$$h_{11}(t) = h_1(t) - m_{11}(t)$$

Again, if the function $h_{11}(t)$ does not yet satisfy the criteria for an IMF, the sifting process continues up to k times until some acceptable tolerance is reached:

$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t)$$

Step 2: If the resulting time series is an IMF, it is designated as

$$c_1(t) = h_{1k}(t)$$

The first IMF is then subtracted from the original data, and the difference $r_1(t)$, given by

$$r_1(t) = x(t) - c_1(t)$$

is the residue. The residue $r_1(t)$ is taken as if it were the original data, and the sifting process of Step 1 is applied to it again.
Following the above procedures, we continue the process to find more intrinsic modes $c_i$ until the last one. The final residue will be a constant or a monotonic function, which represents the general trend of the time series. Finally, we obtain

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
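The sifting procedure can be sketched in a few lines of code. The version below is deliberately simplified and rests on stated assumptions: envelopes are piecewise-linear rather than cubic splines, each IMF is extracted with a fixed number of sifting passes instead of a tolerance test, and the signal is synthetic. It is meant to show the structure of the decomposition (envelope mean, sifting, residue update), not to reproduce the decomposition used in the study.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in inner if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through the knots, held flat out to both ends."""
    xs = [0] + knots + [len(x) - 1]
    ys = [x[knots[0]]] + [x[k] for k in knots] + [x[knots[-1]]]
    env, seg = [], 0
    for t in range(len(x)):
        while xs[seg + 1] < t:
            seg += 1
        w = (t - xs[seg]) / (xs[seg + 1] - xs[seg])
        env.append(ys[seg] + w * (ys[seg + 1] - ys[seg]))
    return env


def sift(x, n_passes=10):
    """Repeatedly subtract the envelope mean: h_k = h_(k-1) - m_k."""
    h = list(x)
    for _ in range(n_passes):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=10):
    """Extract IMFs until the residue has too few extrema to sift further."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if not maxima or not minima:
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]  # r_i = r_(i-1) - c_i
    return imfs, residue


# Synthetic signal: fast oscillation + slow oscillation + linear trend.
signal = [math.sin(0.9 * t) + 2 * math.sin(0.1 * t) + 0.05 * t for t in range(200)]
imfs, residue = emd(signal)
rebuilt = [sum(imf[t] for imf in imfs) + residue[t] for t in range(len(signal))]
```

Whatever the envelope quality, the IMFs and the final residue add back to the original series, which is the reconstruction identity stated at the end of the algorithm.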

Proposed Model
This section first describes the datasets collected in this study and then presents the proposed algorithm.

Data Source
This study collects datasets from the hospital information system of a regional hospital. The emergency department has three divisions (internal medicine, surgery, and pediatrics), and the daily patient volumes of the whole emergency department include the patients of all three divisions. Patients visit the internal medicine and surgery divisions every day, but they visit the pediatrics division on only a few days. Thus, this study extracts only the daily patient volumes of the internal medicine and surgery divisions as two sub-datasets. All patient volumes of the ED are denoted as Dataset I, and the patient volumes of the internal medicine division and the surgery division are denoted as Dataset II and Dataset III, respectively. Each experimental dataset contains 731 observations from July 2010 to June 2012; data from July 2010 to December 2011 are used for training, and data from January 2012 to June 2012 are used for testing. Detailed information on these datasets is shown in Table 1.
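The observation counts follow directly from the calendar, as the short check below shows. It assumes one observation per day, with the series starting on 1 July 2010 and ending on 30 June 2012; the source gives only the months, so the exact start and end days are assumptions.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


all_days = daily_range(date(2010, 7, 1), date(2012, 6, 30))
split = date(2011, 12, 31)                      # last training day
train_days = [d for d in all_days if d <= split]
test_days = [d for d in all_days if d > split]

print(len(all_days), len(train_days), len(test_days))  # 731 549 182
```

The total of 731 rather than 730 comes from the leap day, 29 February 2012, which falls in the testing period.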

Proposed Algorithm
Based on the research concepts in Section 1, this paper proposes a hybrid time series model that combines the EMD method and the AR model and uses CFS to reduce the IMFs generated by the EMD decomposition. Further, the proposed model uses the random forest method to forecast daily patient volumes in the emergency department. The model first tests the lag of AR by statistical analysis and then uses EMD to decompose the input variables of AR; the resulting IMF set is reduced by the CFS method. Finally, the proposed model applies the random forest method to forecast the patient number. The overall process of the proposed model is shown in Figure 1. The collected data (Dataset I) are used below as an example to show, step by step, the core concept of the proposed algorithm.
Step 1: Collect the dataset. All patient volumes of the ED (Dataset I) from July 2010 to June 2012 are collected to demonstrate the proposed model. There are 549 training data (from July 2010 to December 2011), and the remaining data (from January 2012 to June 2012) are used for testing.
Step 2: Test the lag period. The E-Views software package is used to fit AR models of different orders and lags of patient volumes (PV). Five lagged regression variables in Dataset I, from PV(t-1) to PV(t-5), are estimated and tested. If the p-value is less than the significance level of 0.05, the null hypothesis is rejected. Taking Dataset I as an example, Figure 2 shows that, among the five variables from PV(t-1) to PV(t-5), the p-value of PV(t-1) (0.0000) is less than the significance level of 0.05. The coefficient of PV(t-1) is therefore significantly different from zero, and the order of AR is one.
Step 3: Decompose the input variables by EMD. The lag test in Step 2 shows that the order of AR is one. Therefore, the input variable PV(t) is decomposed by EMD into a finite set of IMFs (the residue $r_n(t)$ is also treated as an IMF). Ten IMFs and one residue are generated from PV(t) in Dataset I.
Step 4: Use CFS to reduce the IMF set. In this step, the CFS feature selection method is used to reduce the IMF attribute set generated in Step 3. Seven IMFs are selected by the CFS method from the 11 IMFs generated in the previous step.
Step 5: Build and train the RF forecasting model. PV(t) and the seven IMFs selected by CFS in Step 4 are used as the input attributes of the prediction model, and PV(t+1) (the next day's PV value) is used as the output variable. The random forest method is then applied to build the prediction model.
Step 6: Forecast the testing datasets with the trained model. The random forest parameters of the forecasting model are determined when the stopping criterion is reached in Step 5; the trained forecasting model is then used to forecast PV(t+1) for the target testing datasets.
Step 7: Calculate RMSE. The RMSE values for the testing datasets are calculated by Equation (11):

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left(\text{actual}(t) - \text{forecast}(t)\right)^2}{n}} \qquad (11)$$

where actual(t) denotes the real PV value, forecast(t) denotes the predicted PV value, and n is the number of data.
Step 8: Performance comparison. The RMSE is taken as the evaluation criterion to compare the proposed model with the listed models.
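The data layout of Step 5 and the evaluation of Step 7 can be sketched as follows. The function names, the IMF names, and the toy values are hypothetical. A naive persistence forecast, PV(t+1) = PV(t), stands in for the trained model so that the example stays self-contained; it is not the random forest used in the study, and the source does not state which random forest implementation was used.

```python
import math


def build_samples(pv, imfs, selected):
    """Inputs: PV(t) plus the selected IMF values at t. Output: PV(t+1)."""
    rows, targets = [], []
    for t in range(len(pv) - 1):
        rows.append([pv[t]] + [imfs[name][t] for name in selected])
        targets.append(pv[t + 1])
    return rows, targets


def rmse(actual, forecast):
    """Root mean squared error between actual and forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


# Hypothetical toy series with two IMFs, of which one was kept by feature selection.
pv = [100, 110, 120, 115]
imfs = {"imf1": [1.0, -1.0, 2.0, 0.5], "imf2": [0.2, 0.1, -0.3, 0.0]}
rows, targets = build_samples(pv, imfs, ["imf1"])

# Stand-in forecaster (assumption): tomorrow's volume equals today's.
forecasts = [row[0] for row in rows]
score = rmse(targets, forecasts)
```

In practice `rows` and `targets` built from the training period would be passed to a random forest regressor, and `rmse` would be applied to its forecasts on the testing period.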

Experiments and Comparisons
In this section, the RMSE is taken as the evaluation criterion for forecasting accuracy. Datasets I, II, and III are used as the experimental datasets to verify the proposed model. In each dataset, data from July 2010 to December 2011 are selected for training, and data from January 2012 to June 2012 are selected for testing. Further, this paper compares the forecasting accuracy of the proposed model with a traditional time series model (the AR(1) model (Engle, 1982)), fuzzy time series models (Chen's model (Chen, 1996) and Yu's model (Yu, 2005)), and the random forest model (Breiman, 2001). The AR-EMD-RF model (in which the IMF set is not reduced by feature selection) is also used as a comparison model. The lag period of PV is tested in Datasets II and III, and the order of AR is one for both datasets. The number of decomposed IMFs is 9 in both Datasets II and III, and the number of IMFs selected by the CFS method is 8 in both.
The forecasting performance of the listed models is compared with that of the proposed model. The results of the AR(1) model, Chen's model, Yu's model, the random forest model, the AR-EMD-RF model, and the proposed model are listed in Table 2. Table 2 shows that the proposed model outperforms the other five models on every dataset, which illustrates its excellent PV forecasting performance.

Findings
A novel model that combines AR-EMD and a feature selection method with the random forest algorithm has been proposed to forecast patient volumes in Taiwan. The proposed model is compared with five forecasting models (Chen's model, Yu's model, the AR(1) model, the AR-EMD-RF model, and the random forest model) to evaluate its performance. After verification and comparison, the proposed method surpasses the listed methods. The experimental results in this paper yield three findings:
(1) The advantage of the hybrid model: According to Table 2, the hybrid models (the AR-EMD-RF model and the proposed model) are superior to the single methods (Chen's model, Yu's model, the AR model, and the random forest model) in terms of RMSE. The major reason is that the hybrid models combine the AR-EMD method with random forest learning for PV forecasting and thus integrate the advantage of random forest, which offers efficient estimates of the test error without incurring the cost of repeated model training associated with cross-validation.
(2) EMD superiority: Table 2 shows that the proposed model and the AR-EMD-RF model perform better than the random forest model. EMD decomposes the noisy raw data into input variables that are highly correlated and have simpler frequency components, and thereby reduces prediction error more effectively.
(3) Reducing the IMF set improves forecasting performance: Table 2 shows that the proposed model, which reduces the IMF set, performs better than the AR-EMD-RF model. This indicates that the feature selection process of the proposed model improves forecasting capability.

Conclusions
In recent years, the number of emergency department patients has continued to increase, and rapid changes in the supply of and demand for emergency room resources have made the effective allocation of emergency department resources more and more important. In the field of clinical research, scholars have proposed many models to predict the daily number of patients in the emergency department. However, previous prediction models still have some shortcomings: (1) some datasets do not meet statistical assumptions, and some traditional time series models are not suitable for analyzing them; (2) most traditional time series models use the latest period of data as the input variable to predict the value of the next day, but the original input data often contain noise.
To remedy the deficiencies of traditional prediction models, this paper proposes a new hybrid model that combines the AR model and EMD with the random forest algorithm. EMD can decompose the original data, which include noise, into IMFs that have a strong correlation with the output value; the random forest algorithm overcomes the problem that traditional statistical methods cannot be applied to datasets that do not obey the statistical assumptions; and reducing the IMF attributes improves the performance of the prediction model.
Experimental results show that the proposed model can reliably and accurately predict the number of patients admitted to the emergency department each day. The proposed model is better than the other prediction models compared in this research because EMD is used as a preprocessor to remove noise from the original signal. Each IMF has a simple frequency component and a high correlation with the output value. Besides, the proposed model uses feature selection to further reduce the IMF attribute set and enhance prediction performance.
Through this pre-processing mechanism, the proposed model not only simplifies the random forest modeling process but also predicts more accurately than the other prediction methods (based on the RMSE evaluation criterion). Therefore, this method is well suited to analyzing nonlinear and noisy data and is an effective method for predicting the number of patient visits. Besides, the results of this paper are useful and feasible for policymakers and scholars in related research fields. Hospital managers can use this predictive model to discover useful knowledge in the field of medical research.
In subsequent related work, more patient visit data can be collected as an experimental data set to verify the stability of the proposed model. In the future, other traditional time series methods can be integrated into the new prediction model to improve prediction performance. Other feature selection methods can be combined with the forecasting model to enhance prediction performance. Further, researchers can also consider data mining methods that can generate rules to be used as a forecasting model, and it is expected that the new model can generate useful decision rules for hospital managers or related decision-makers.