Traffic Flow Optimization: Modeling the Inter-Arrival Times for a Simulation Model

With the growing number of vehicles on the road, traffic flow problems are no longer a local issue, and traffic flow optimization has drawn significant interest from researchers all over the world. Discrete-event simulation has been widely used to address problems related to traffic flow. Researchers in discrete-event simulation modeling typically use the statistical distributions for inter-arrival and process times recommended by the simulation software's built-in tools, such as Input Analyzer in Arena, Stat::Fit in ProModel, and ExpertFit in FlexSim. However, there are other numerical metrics and concerns that researchers should examine when deciding on the best distribution. This research examines the exponential distribution and compares it with the distributions suggested by the software and with real data, focusing on the square error value. A total of 5404 data points were collected for vehicles arriving at six lanes of selected traffic junctions. The findings show that the commonly used exponential distribution can represent the distribution of inter-arrival times, as its results do not differ significantly from those of the more complex distributions. Future studies can therefore use the exponential distribution in place of more complex distributions.


Introduction
With rising consumer awareness of environmental issues, efforts to reduce environmental pollution have received attention from many quarters. It is therefore not surprising that many studies have sought to optimize operations in various fields. This study focuses on traffic flow. Good management of the traffic network may reduce bottlenecks and ease road traffic congestion, helping to avoid the environmental pollution that congestion causes.
There are many modern optimization approaches, but simulation is widely used and appears reliable for addressing traffic management problems. In traffic engineering and queuing analysis, the time between vehicle arrivals is generally assumed to be exponentially distributed (Adam, 1950; El-Hadidy et al., 2021; Jose, 2021; Kumar, 2022; Meng et al., 2009; Saritha et al., 2022; Sumaryo et al., 2015; Wang et al., 2021). Sumaryo et al. (2015) used the M/M/1 and M/G/1 queuing models to represent traffic flow conditions. Both models assume that arrivals follow a Poisson process, so that the inter-arrival times are exponentially distributed. This assumption rests on specific criteria: the number of vehicles on the road is very high, a single vehicle has relatively little influence on system performance, and every vehicle independently decides whether or not to use the road.
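To make the M/M/1 assumptions concrete, the sketch below computes the standard steady-state measures of a single-server queue with Poisson arrivals (exponential inter-arrival times) and exponential service. It is an illustration only: treating a signalized approach as an M/M/1 server is a simplification, the function name `mm1_measures` is hypothetical, and the rates in the example are assumed values, not data from this study.

```python
def mm1_measures(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state measures of an M/M/1 queue.

    Assumes Poisson arrivals (exponential inter-arrival times with mean
    1/arrival_rate), exponential service times, and a single server.
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # server utilization
    if rho >= 1:
        raise ValueError("unstable: arrival rate must be below service rate")
    return {
        "rho": rho,
        "L": rho / (1 - rho),                       # mean number in system
        "Lq": rho ** 2 / (1 - rho),                 # mean number waiting in queue
        "W": 1 / (service_rate - arrival_rate),     # mean time in system
        "Wq": rho / (service_rate - arrival_rate),  # mean waiting time in queue
    }


if __name__ == "__main__":
    # Assumed rates (vehicles per second): one arrival every 4 s on average,
    # one vehicle served every 2 s on average.
    m = mm1_measures(arrival_rate=1 / 4, service_rate=1 / 2)
    for name, value in m.items():
        print(f"{name} = {value:.3f}")
```

With these assumed rates the sketch prints rho = 0.500, L = 1.000, Lq = 0.500, W = 4.000 and Wq = 2.000, consistent with Little's law (L = λW = 0.25 × 4 = 1).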
Researchers in discrete-event simulation modelling, on the other hand, tend to use the distributions for inter-arrival and process times recommended by the software's built-in tools (Frough et al., 2019; El-Hadidy et al., 2021; Jose, 2021; Kumar, 2022; Meng et al., 2009; Saritha et al., 2022; Sumaryo et al., 2015; Wang et al., 2021). Arena Input Analyzer, for example, one of the most widely used of these tools, recommends the distribution with the smallest square error. However, Kelton et al. (2015) note that researchers may take additional numerical measures and issues into account when deciding on the best distribution. Two other ways to evaluate how well a distribution fits the data are the Kolmogorov-Smirnov (K-S) and chi-square goodness-of-fit tests. For both tests, a larger p-value indicates a better fit.
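As a rough illustration of the K-S and chi-square measures mentioned above, the following standard-library sketch computes both test statistics for an exponential distribution fitted to a sample by maximum likelihood (the sample mean). The helper names are hypothetical, the equal-width binning is an assumption, and the sample is synthetic. P-values are not computed here; libraries such as SciPy provide `scipy.stats.kstest` and `scipy.stats.chisquare` for that. When the distribution's parameters are estimated from the same data, the standard K-S p-value tends to be conservative.

```python
import math
import random


def expo_cdf(x: float, mean: float) -> float:
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def ks_statistic(data, cdf) -> float:
    """Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare F(x) with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def chi_square_statistic(data, cdf, num_bins: int = 10) -> float:
    """Pearson chi-square statistic over equal-width bins on [0, max(data)].

    The last bin is open-ended so that the expected counts sum to len(data).
    """
    n = len(data)
    width = max(data) / num_bins
    observed = [0] * num_bins
    for x in data:
        observed[min(int(x / width), num_bins - 1)] += 1
    stat = 0.0
    for k in range(num_bins):
        lower = k * width
        upper_cdf = 1.0 if k == num_bins - 1 else cdf((k + 1) * width)
        expected = n * (upper_cdf - cdf(lower))
        stat += (observed[k] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic inter-arrival times (seconds), NOT the study's data.
    sample = [rng.expovariate(1 / 4.29) for _ in range(500)]
    mean = sum(sample) / len(sample)  # maximum-likelihood estimate of the mean

    def fitted(x):
        return expo_cdf(x, mean)

    print(f"K-S D      = {ks_statistic(sample, fitted):.4f}")
    print(f"chi-square = {chi_square_statistic(sample, fitted):.2f}")
```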

Data Collection
The data for this study came from a large public traffic light junction on the AH2 highway, which connects Malaysia and Thailand. In this paper, we use the input data from Jalal et al. (2017, 2018). The data were gathered through observation and interviews. Traffic light junction operation is a continuous process with variable traffic flow: traffic may be congested for some time because of the large number of vehicles on the road, but it may flow smoothly during other parts of the day. Several observations were made to obtain information on four types of data: the traffic light control (TLC) cycle pattern, the TLC cycle time, vehicle arrivals at each junction, and model conceptualization. In addition, interviews were conducted with the various teams involved in TLC operations. Several Public Works Department (PWD) employees shared important information about TLC operations, which improved our understanding of them. Road users who are frequently caught in road traffic congestion (RTC) on the Changloon main road were also interviewed. The information gathered formed a solid foundation for this study. Figure 1 shows a snapshot of the Malaysia-Thailand border area in Kedah, while Figure 2 depicts the main road layout in Changloon.

Input Modeling
Specific input information, such as the arrival rate, is needed when creating a simulation model in Arena. For this study, Input Analyzer, a built-in Arena tool, was used to collect and analyse the inter-arrival times between vehicles. Input Analyzer returns a probabilistic expression that is then used to build the model.
Figure 3 displays the exponential and suggested distributions for the inter-arrival time at the first intersection, Lane A. The software recommended the lognormal distribution because its square error, 0.002434, is lower than that of the other distributions; the square error for the exponential distribution is 0.010476.
For Lane D, the software itself suggested the exponential distribution, with a mean of 4.29, which reflects how commonly the exponential distribution is used for inter-arrival times. The results for the Lane D inter-arrival time are shown in Figure 4.
Figure 5 shows the exponential and suggested distributions for the inter-arrival time at Lane E. The software recommended the Weibull distribution because its square error, 0.002611, is lower than that of the other distributions; the square error for the exponential distribution is 0.003752.
Figure 5: Exponential and suggested distributions for Lane E.
Figure 6 depicts the exponential and suggested distributions for the inter-arrival time at Lane J. The software recommended the lognormal distribution because its square error, 0.000944, is lower than that of the other distributions; the square error for the exponential distribution is 0.010112.
For Lane K, the suggested distribution is again exponential, with a mean of 10.6, further indicating that the exponential is a common and widely used distribution for inter-arrival times. The Lane K inter-arrival time results are shown in Figure 7.
Finally, Figure 8 displays the exponential and suggested distributions for the inter-arrival time at Lane L. The software recommended the lognormal distribution because its square error, 0.001481, is smaller than that of the other distributions; the square error for the exponential distribution is 0.003207.
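The selection rule described in this section (recommend the candidate with the smallest square error) can be sketched as follows. This is not Input Analyzer's implementation: it assumes square error is the sum, over histogram cells, of the squared difference between the observed relative frequency and the fitted probability, uses 20 equal-width cells chosen arbitrarily, fits only the exponential and lognormal candidates by maximum likelihood, and runs on synthetic data. All function names are hypothetical.

```python
import math
import random


def expo_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal; mu and sigma are the mean and std of ln(X)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def square_error(data, cdf, num_bins=20):
    """Sum over histogram cells of (observed rel. frequency - fitted probability)^2."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must not be constant")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        counts[min(int((x - lo) / width), num_bins - 1)] += 1
    total = 0.0
    for k, count in enumerate(counts):
        a = lo + k * width
        b = hi if k == num_bins - 1 else lo + (k + 1) * width
        total += (count / len(data) - (cdf(b) - cdf(a))) ** 2
    return total


def recommend(data):
    """Fit both candidates by maximum likelihood; pick the smaller square error."""
    mean = sum(data) / len(data)                 # exponential MLE of the mean
    logs = [math.log(x) for x in data]           # requires strictly positive data
    mu = sum(logs) / len(logs)                   # lognormal MLE of mu
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))  # MLE (1/n)
    errors = {
        "exponential": square_error(data, lambda x: expo_cdf(x, mean)),
        "lognormal": square_error(data, lambda x: lognormal_cdf(x, mu, sigma)),
    }
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic, right-skewed inter-arrival times (seconds); NOT the study's data.
    sample = [rng.lognormvariate(1.0, 0.8) for _ in range(1000)]
    best, errors = recommend(sample)
    for name, err in errors.items():
        print(f"{name:12s} square error = {err:.6f}")
    print("recommended:", best)
```

Because the synthetic sample is drawn from a lognormal distribution, the lognormal candidate should normally obtain the smaller square error, mirroring the recommendations reported for Lanes A, J and L. Further candidates such as the Weibull would each need their own fitted CDF and an extra entry in `errors`.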

Model Development
The model was built in Arena simulation software. This study focuses on the distributions of the inter-arrival times rather than the service or process times. The process times of all traffic lights at the junctions are deterministic and therefore require no input distribution analysis. The process times, i.e., the green-light durations, are 120 seconds (Lane A), 30 seconds (Lane D), 35 seconds (Lane E), 60 seconds (Lane J), 50 seconds (Lane K), and 45 seconds (Lane L). Figure 9 depicts the Arena model layout at the Lane L junction. The model begins with a Create module that represents vehicle arrivals at the intersection. An Assign module then assigns the vehicle type. Next, a Decide module routes eighty percent of the entities to the Hold module "Lane L", while the remaining twenty percent continue to junction two. The Hold module waits for the departure signal, and only one entity is permitted to depart each time the signal is received. The entity then passes to another Decide module, which routes it to the appropriate exit point. Jalal et al. (2017) provide a detailed explanation of the model development.
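The Lane L logic described above can be approximated outside Arena with a small standard-library simulation. This sketch is a simplification under stated assumptions, not the authors' Arena model: the 45 s green time, the 80/20 Decide split and the two-hour window come from the text, while the 90 s red time, the release of one vehicle every 2 s of green, the 6 s mean inter-arrival time and the name `simulate_lane_l` are hypothetical. The Assign module and the final Decide module that routes vehicles to exit points are omitted.

```python
import random
from collections import deque

GREEN_S = 45.0    # Lane L green time reported in the text
RED_S = 90.0      # HYPOTHETICAL red time (not reported)
HEADWAY_S = 2.0   # HYPOTHETICAL: one release signal every 2 s of green
P_LANE_L = 0.8    # Decide module: 80% of vehicles go to Hold "Lane L"


def simulate_lane_l(mean_interarrival_s, horizon_s=7200.0, seed=0):
    """Simplified sketch of Create -> Decide -> Hold for the Lane L junction."""
    rng = random.Random(seed)

    # Create module: exponential inter-arrival times.
    arrivals, t = [], rng.expovariate(1.0 / mean_interarrival_s)
    while t < horizon_s:
        arrivals.append(t)
        t += rng.expovariate(1.0 / mean_interarrival_s)

    # Decide module: 80% to the Hold queue, the rest to junction two.
    to_hold = [a for a in arrivals if rng.random() < P_LANE_L]

    # Release signals: one every HEADWAY_S seconds while the light is green.
    signals, cycle_start = [], 0.0
    while cycle_start < horizon_s:
        s = cycle_start
        while s < min(cycle_start + GREEN_S, horizon_s):
            signals.append(s)
            s += HEADWAY_S
        cycle_start += GREEN_S + RED_S

    # Hold module: each signal releases at most one queued vehicle (FIFO).
    queue, waits, i = deque(), [], 0
    for s in signals:
        while i < len(to_hold) and to_hold[i] <= s:
            queue.append(to_hold[i])
            i += 1
        if queue:
            waits.append(s - queue.popleft())

    return {
        "created": len(arrivals),
        "to_junction_two": len(arrivals) - len(to_hold),
        "departed_lane_l": len(waits),
        "still_holding": len(queue) + len(to_hold) - i,
        "mean_wait_s": sum(waits) / len(waits) if waits else 0.0,
    }


if __name__ == "__main__":
    # HYPOTHETICAL mean inter-arrival time of 6 s over the 2-hour window.
    for key, value in simulate_lane_l(6.0).items():
        print(f"{key}: {value}")
```

The returned counts always satisfy created = to_junction_two + departed_lane_l + still_holding, which is a quick sanity check when changing the assumed parameters.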

Results
The distributions were generated from thousands of input data points collected at two traffic light junctions in Changloon. A detailed description of the junctions is given in Saritha et al. (2022).

Input Data
A total of 5404 data points were collected for vehicles arriving between 5:00 p.m. and 7:00 p.m. across all six lanes. As shown in Table 1, these data can be further subdivided by lane. In traffic engineering and queuing analysis, vehicle inter-arrival times are commonly assumed to be exponentially distributed. This study made its comparisons using six data sets, one for each lane.
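The relationship between a mean inter-arrival time and an hourly arrival rate is simple arithmetic, sketched below. The pooled figures use only the 5404 vehicles and the two-hour window stated above; the per-lane rates assume that the reported exponential means for Lanes D (4.29) and K (10.6) are in seconds, which the text does not state explicitly. The name `hourly_rate` is hypothetical.

```python
TOTAL_VEHICLES = 5404   # all six lanes combined (from the text)
WINDOW_S = 2 * 60 * 60  # 5:00 p.m. to 7:00 p.m., in seconds


def hourly_rate(mean_interarrival_s: float) -> float:
    """Convert a mean inter-arrival time (s) into an arrival rate (vehicles/hour)."""
    return 3600.0 / mean_interarrival_s


pooled_rate = TOTAL_VEHICLES / (WINDOW_S / 3600.0)  # vehicles per hour, all lanes
pooled_gap = WINDOW_S / TOTAL_VEHICLES              # mean seconds between arrivals, all lanes

if __name__ == "__main__":
    print(f"pooled: {pooled_rate:.0f} veh/h, one arrival every {pooled_gap:.2f} s")
    # Exponential means reported for Lanes D and K, ASSUMED to be in seconds.
    for lane, mean in {"D": 4.29, "K": 10.6}.items():
        print(f"Lane {lane}: {hourly_rate(mean):.0f} veh/h")
```

Under these assumptions, the six lanes together receive about 2702 vehicles per hour (one every 1.33 s on average), while Lane D alone receives about 839 and Lane K about 340 vehicles per hour.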

Model Validity
Building the right model is a key component of model validation. It is crucial to stress that no model is ever fully validated or verified. Any simulation model is a simplified representation of a real system, and the model's behaviour is only an approximation of the system's behaviour (Carson, 2002; Desa et al., 2015). A model is considered verified or validated when the modeller has explicitly completed a set of tasks to verify and validate it to the extent required for his or her purposes. Such acceptance is almost always largely a matter of judgement. Model validation and verification increase the decision maker's confidence in the model, which is referred to as model credibility.
The simulated output for selected parameters was compared with the actual values from the collected data. Following the validity level used in earlier work by Hashim et al. (2003), we required the simulated output to be within 10% of the actual data. The validity level is computed as the percentage difference between the actual and simulated values:
$$\text{Validity level} = \frac{\lvert \text{Actual value} - \text{Simulated value} \rvert}{\text{Actual value}} \times 100\%$$
As shown in Table 2, the validity levels for all lanes were below 10%, which indicates that both the exponential and the software-suggested distributions are valid for use in the model. With the exponential distribution, the simulated results even show smaller differences, i.e., better validity levels, for Lanes A and J. For Lanes D and K, the results are identical because the software suggested the exponential distribution as the best fit.
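The validity check can be expressed as a small function. This sketch assumes the validity level is the absolute difference between simulated and actual output expressed as a percentage of the actual value, with the 10% threshold taken from Hashim et al. (2003); the function names and the example values are hypothetical, since the study's actual and simulated values appear in Table 2.

```python
def validity_level(simulated: float, actual: float) -> float:
    """Percentage difference between a simulated output and the observed value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(actual - simulated) / abs(actual) * 100.0


def is_valid(simulated: float, actual: float, threshold_pct: float = 10.0) -> bool:
    """True when the simulated output lies within threshold_pct of the actual value."""
    return validity_level(simulated, actual) <= threshold_pct


if __name__ == "__main__":
    # HYPOTHETICAL values; the study's actual and simulated values are in Table 2.
    actual, simulated = 500, 460
    level = validity_level(simulated, actual)
    print(f"validity level = {level:.1f}%, valid: {is_valid(simulated, actual)}")
```

With the hypothetical values above, the sketch prints "validity level = 8.0%, valid: True".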

Contribution of Study
Developing a simulation model is a process of imitating real-world situations. A good simulation model shows only a small difference between the simulated and actual values, which indicates high validity. Arena's built-in Input Analyzer helps in choosing a good distribution for arrival times because it identifies the best-fitting distribution and is not limited to the exponential distribution. The findings, however, show no significant difference between the results obtained with the exponential distribution and those obtained with the suggested distributions. Instead of using a complex distribution, the developer can therefore build the model with the simpler exponential distribution.
The outcomes are expected to provide guidelines and may raise awareness among the authorities responsible for conducting simulation studies. It is hoped that many parties can benefit from the proposed solutions.

Conclusion
This paper compares several statistical distributions fitted to real-world input data on the time between arrivals, or inter-arrival time. According to the results, there are no significant differences between the models built with the exponential distribution and those built with the software-suggested distributions. We therefore conclude that a simple, well-known distribution such as the exponential distribution can be used to build a valid simulation model instead of more complicated distributions.