Network Traffic Time Series Performance Analysis using Statistical Methods

a Faculty of Computer Science, Universitas Muslim Indonesia, Jl. Urip Sumoharjo KM5, Makassar 90231, Indonesia
b Faculty of Computer Sci. and Information Tech, Mulawarman University, Jl. Kuaro no.1, Samarinda 75123, Indonesia
c Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Malaysia
d Dept. of Information Tech., Samarinda State Polytechnic, Jl. DR. Ciptomangunkusumo, Samarinda 75131, Indonesia
1 purnawansyah@gmail.com; 2 haviluddin@gmail.com*; 3 ralfred121@gmail.com; 4 onnygaffar212@gmail.com
* corresponding author


I. Introduction
Highly accurate forecasts are required to support decision making [1,2]. In this paper, three statistical models, namely decomposition, Winter's exponential smoothing, and autoregressive integrated moving average (ARIMA), were used to forecast daily internet traffic usage. The traffic data constitute a time series, that is, a sequence of observations ordered in time. In principle, the time series used for forecasting is a data series $(y_{t+1}, y_{t+2}, \ldots, y_{t-n})$ paired with $(x_{t+1}, x_{t+2}, \ldots, x_{t-n})$ over a particular time range [2][3][4].
The choice of forecasting technique depends primarily on identifying the pattern in the data. The basic notation is: $Y_t$, the time series value in period $t$; $\hat{Y}_t$, the forecast value of $Y_t$; and $e_t = Y_t - \hat{Y}_t$, the residual or forecast error. A time series comprises (1) trend (T), the tendency of the data to rise or fall; (2) seasonal variation (S), fluctuations that recur periodically within a year, such as monthly, weekly, or daily patterns; (3) cycles (C), fluctuations over periods longer than a year; and (4) a random component (R). The combination of trend, seasonal variation, cycles, and random factors must be taken into account in a forecasting method [5][6][7].
This study compares the forecasting results obtained from time series data with three statistical methods: decomposition, Winter's exponential smoothing, and ARIMA. The paper consists of four parts. The first part explains the motivation for the study. The second part reviews related theory and techniques for time series forecasting. The third part presents the results, and the fourth part discusses them and draws a conclusion.

II. Methods
Numerous statistical forecasting methods are described in the literature and have been employed by researchers, most widely in finance and demography. The choice of statistical method is strongly influenced by the pattern of the time series, so the first step in forecasting is to inspect and analyse the type of data, since every statistical method follows a different procedure [8].
Internet traffic data exhibit seasonal variation, fluctuating periodically. Accordingly, the three statistical methods considered most applicable for forecasting are decomposition, Winter's exponential smoothing, and ARIMA [1][2][3]. The three methods employed in this study are briefly explained below.

A. Decomposition
The decomposition method comprises two models, additive and multiplicative. The additive model is given by (1),
where $Y_t$ is the observation at time $t$.
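In the usual textbook form, and assuming the four components named in the Introduction (trend $T_t$, seasonal variation $S_t$, cycle $C_t$, and random component $R_t$), the two decomposition models can be written as follows. The component symbols are an assumption carried over from that list rather than a form stated in this section.

```latex
% Additive decomposition: the components are summed
Y_t = T_t + S_t + C_t + R_t
% Multiplicative decomposition: the components are multiplied
Y_t = T_t \times S_t \times C_t \times R_t
```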
The basic principle of time series decomposition is to separate the series into several component patterns, identify each component, and estimate it separately. Once the components have been estimated, they are recombined to produce a forecast. Separating the components is intended to improve forecasting accuracy and to give a better understanding of the behaviour of the series [4,7].
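The separate-then-recombine idea can be illustrated with a minimal Python sketch of an additive decomposition. It is a simplification under stated assumptions, not the procedure of any particular statistics package: the trend is a least-squares straight line, the seasonal component is the average detrended value at each position in the season, the cycle component is ignored, and the function names and the `period` argument are hypothetical.

```python
from statistics import mean


def fit_additive_decomposition(y, period):
    """Estimate a linear trend and fixed seasonal offsets for Y_t = T_t + S_t + R_t."""
    n = len(y)
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(y)
    # Least-squares slope and intercept of the trend line T_t = intercept + slope * t.
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    # Remove the trend, then average what is left at each position in the season.
    detrended = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    seasonal = [mean(detrended[k::period]) for k in range(period)]
    # Centre the seasonal offsets so that they average to zero.
    offset = mean(seasonal)
    seasonal = [s - offset for s in seasonal]
    return intercept, slope, seasonal


def forecast_additive(model, start, steps, period):
    """Recombine trend and seasonal offsets for time indices start .. start+steps-1."""
    intercept, slope, seasonal = model
    return [intercept + slope * t + seasonal[t % period]
            for t in range(start, start + steps)]
```

For a series of length `n`, calling `forecast_additive(model, n, steps, period)` extends the fitted pattern beyond the observed data.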

B. Winter's exponential smoothing
Exponential smoothing continuously revises a forecast in light of the most recent observations. The method forms an exponentially weighted moving average of all past observed values. Winter's exponential smoothing uses three constants that determine the forecast: a smoothing constant α, a trend constant, and a seasonal constant, each with a value between 0 and 1. To obtain accurate forecasts, several combinations of values for these constants are tried.
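How the three constants interact can be seen in a minimal Python sketch of the additive form of the method. The parameter names `alpha`, `trend_w`, and `season_w`, the function name, and the simple initialization from the first two seasons are assumptions made for illustration; statistical packages use their own initialization schemes.

```python
def winters_additive(y, period, alpha, trend_w, season_w, horizon):
    """Additive Winters-type smoothing; returns `horizon` forecasts past the end of y."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialization: first-season mean as the level, the change between
    # the first two season means as the trend, first-season deviations as offsets.
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    seasonal = [y[k] - first for k in range(period)]
    for t in range(period, len(y)):
        k = t % period
        prev_level = level
        # Level: blend the deseasonalized observation with the previous projection.
        level = alpha * (y[t] - seasonal[k]) + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = trend_w * (level - prev_level) + (1 - trend_w) * trend
        # Seasonal offset: blend the latest deviation from the level with the old offset.
        seasonal[k] = season_w * (y[t] - level) + (1 - season_w) * seasonal[k]
    n = len(y)
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]
```

Trying several combinations of the three constants, as described above, amounts to calling this function in a loop and keeping the combination with the smallest forecast error.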
Winter's exponential smoothing for time series forecasting comprises two models, multiplicative and additive. The multiplicative model multiplies the trend component by the seasonal component and is used when the data in a particular season are proportional to those of the previous season. The formula is given in (3),
where $b_1$ is the permanent component, $b_2$ is the linear trend component, $S_t$ is the multiplicative seasonal factor, and $\varepsilon_t$ is the error component.
The additive model sums the trend component and the seasonal component and is used when the seasonal differences in the data remain relatively constant from season to season, (4),
where $b_1$ is the permanent component, $b_2$ is the linear trend component, $S_t$ is the additive seasonal factor, and $\varepsilon_t$ is the error component [1,2,4].
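A common textbook form that is consistent with the symbol definitions above is sketched below; it is offered as an assumption about the forms referred to as (3) and (4), with $t$ denoting the time index.

```latex
% Multiplicative model: the linear trend is scaled by the seasonal factor
Y_t = (b_1 + b_2 t)\, S_t + \varepsilon_t
% Additive model: the seasonal factor is added to the linear trend
Y_t = b_1 + b_2 t + S_t + \varepsilon_t
```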

C. Autoregressive integrated moving average (ARIMA)
The ARIMA method analyses a time series by combining autoregressive (AR) and moving average (MA) terms. The ARIMA (p, d, q)(P, D, Q)s model requires a stationary time series, where p is the order of the AR process, d is the order of differencing needed to make the data stationary, and q is the order of the MA process; (P, D, Q) are the corresponding seasonal orders for seasonal period s [1,5].
In general, a time series is not stationary in its mean and variance. If the series is not stationary, a transformation is applied to stabilize the variance and differencing is applied to stabilize the mean. The rules for the variance transformation are: (1) it applies only to series $Z_t$ whose values are positive, (2) the transformation is performed before differencing, and (3) the value of λ is judged by the sum of squared errors (SSE) of the transformation, with the smallest SSE normally indicating that the transformation has succeeded. For the mean, differencing is applied at a specified lag between observations (Table 1).
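The two stationarity steps can be sketched in a few lines of Python. The sketch assumes that the transformation indexed by λ is the Box-Cox power transformation, which the text does not name explicitly; the function names are hypothetical.

```python
import math


def box_cox(series, lam):
    """Power transformation for variance stabilization (assumed Box-Cox form)."""
    if any(z <= 0 for z in series):
        # Rule (1): the transformation applies only to positive series.
        raise ValueError("all values must be positive")
    if lam == 0:
        return [math.log(z) for z in series]
    return [(z ** lam - 1) / lam for z in series]


def difference(series, lag=1, order=1):
    """Apply lag-`lag` differencing `order` times to stabilize the mean."""
    out = list(series)
    for _ in range(order):
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out
```

Following rule (2), the transformation would be applied first and the differencing second, for example `difference(box_cox(z, 0.5), lag=1, order=1)`.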
According to the Box-Jenkins methodology, forecasting with an ARIMA model involves four stages: (1) identification of the model and pattern, in which the data pattern is inspected visually and the validity of the actual data is checked; (2) parameter determination, which can be done with the statistical t-test and p-value; (3) model checking (hypothesis testing and diagnostics), for which the widely used Ljung-Box Q statistic checks that the residuals are white noise (p-value > α = 0.05) and the Kolmogorov-Smirnov test checks that they are normally distributed (p-value > α = 0.05); and (4) forecasting, in which the ARIMA output is analysed in three parts, namely the forecast values and the upper and lower limits at the 95% level. The finest ARIMA model for forecasting is the one with the smallest error value [1,2].
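The white-noise check in stage (3) rests on the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ autocorrelation of the residuals. The following Python sketch computes Q only; the function names and the choice of maximum lag `max_lag` are assumptions, and the p-value that is compared with α = 0.05 would be obtained from a chi-square distribution in a statistics package.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# For residuals of a fitted ARMA(p, q) model, Q is usually referred to a
# chi-square distribution with max_lag - p - q degrees of freedom; a large
# p-value gives no evidence against white noise.
```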

D. Dataset Testing
Daily internet traffic usage is the main indicator of telecommunication usage in a particular network, and network technicians use it to control and manage the network. The data used in this study are the daily internet traffic usage on the Mulawarman University network, taken from the main server using the CACTI software over the period 21 to 24 June 2013. Before forecasting, the original data are normalized to speed up computation without losing the information in the actual values [6]. The normalization formula is given in (5).

$$\bar{X} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (5)$$
where $\bar{X}$ is the normalized value, $X$ is the original data value, $X_{\max}$ is the maximum data value, and $X_{\min}$ is the minimum data value. Table 2 presents the original daily internet traffic usage data, and Fig. 1 plots the daily internet traffic usage.
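As an illustration of the normalization step, a minimal Python sketch of min-max scaling and its inverse is given below. It assumes that normalization (5) is the plain min-max form that maps the data onto [0, 1]; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the minimum and maximum of the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

Because the mapping is invertible, forecasts produced on the normalized scale can be returned to the original traffic units with `denormalize`.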

E. Determining the finest forecasting model
The finest time series method is selected by means of an indicator that measures forecasting accuracy. The value of the indicator is the error obtained when a method is tested, so the finest model is the one with the smallest error value, since its forecasts lie closest to the actual data [7][8][9][10][11].

Table 1. (Source: [1])
Model            | ACF                  | PACF
MA(q)            | cuts off after lag q | dies down
ARMA(p,q)        | dies down            | dies down
AR(p) or MA(q)   | cuts off after lag q | cuts off after lag p
In this study, forecasting accuracy is measured with MAPE, MAD, and MSD, each of which has its own formula. The MAPE formula is given in (6),
where $y_t$ is the observed value, $y'_t$ is the forecast value, and $n$ is the number of observations.
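The three accuracy measures can be sketched in Python using their usual definitions: MAPE as the mean of $|(y_t - y'_t)/y_t|$ expressed in percent, MAD as the mean of $|y_t - y'_t|$, and MSD as the mean of $(y_t - y'_t)^2$. These definitions and the function names are assumptions of the sketch.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        # The ratio is undefined for a zero observation, which can occur
        # when the minimum of min-max normalized data maps to 0.
        raise ValueError("MAPE is undefined when an observation is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mad(actual, forecast):
    """Mean absolute deviation."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def msd(actual, forecast):
    """Mean squared deviation."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
```

MAPE is scale-free, whereas MAD and MSD are expressed in the units (or squared units) of the data, so they are comparable across methods only when computed on the same scale.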
This study compares the test results of the predetermined statistical models: decomposition, Winter's exponential smoothing, and ARIMA. Fig. 2 illustrates the flow of the study.

III. Results and Discussion
In this study, observations were made of daily Internet traffic usage (inbound and outbound) at a state university. The data were collected over 4 days (21-24 June 2013) and amount to 192 samples. The data were then analysed with the predetermined statistical methods: decomposition, Winter's exponential smoothing, and ARIMA. These methods were chosen because daily internet traffic usage shows seasonal variation. SPSS 19 and Minitab 16 were used to assist the data analysis.

A. Decomposition Analysis
The first stage tests the data with the decomposition model. Two decomposition models were used, additive and multiplicative. The network traffic dataset was divided into two parts, inbound and outbound, which were analysed separately, and the results were then recombined. A fairly good forecasting error rate was obtained with additive decomposition, with MAPE = 4.69E+01, MAD = 1.65E-01, and MSD = 4.02E-02.

B. Winter's Exponential Smoothing Additive Analysis
The second stage tests the data with Winter's exponential smoothing. Both the additive and multiplicative models were used, and the analysis followed the same procedure as for the decomposition model. The trend and smoothing constants were set to 0.2 and 0-1, respectively, to obtain satisfactory forecasting accuracy. A fairly good forecasting error rate was obtained with the additive model, with MAPE = 2.35E+01, MAD = 1.89E+06, and MSD = 2.58E+06.

C. ARIMA Analysis
The last stage tests the data with the ARIMA model. The data were first made stationary, through transformation for the variance and differencing for the mean, to obtain the candidate ARIMA models (1,0,0), (1,1,0), (1,1,1), (1,0,1), and (1,2,1). The models were then checked (hypothesis testing and diagnostics) using the Ljung-Box Q statistic to test for white noise (p-value > α = 0.05), followed by the Kolmogorov-Smirnov test for normality (p-value > α = 0.05). The ARIMA (1,0,2) model qualified, with upper and lower limits at the 95% level. A fair forecasting error rate was obtained with ARIMA, with MAPE = 2.78E+01, MAD = 2.54E+06, and MSD = 1.89E+06. The forecasting results are illustrated in Fig. 2, 3, and 4, and the MAPE, MAD, and MSD values are compared in Table 3.

IV. Conclusion
This study used the statistical methods decomposition, Winter's exponential smoothing, and ARIMA to forecast Internet traffic usage at Mulawarman University. Three accuracy measures, MAPE, MAD, and MSD, were used to evaluate the forecasts. The test results of the three methods show that the ARIMA (1,0,2) model has a fair forecasting error rate, with the smallest MSD value of 1.89E+06. This indicates that the ARIMA forecasts approach the actual data. However, the ARIMA model cannot accommodate increases or decreases in the frequency of internet users. In addition, if the data sample is large, the forecast becomes constant. With the widespread development of computational intelligence, future work will apply a machine learning method suited to time series with seasonal variation.

Table 2. Original data traffic on 21-24 June 2013

The accuracy indicator can take several forms, among them mean absolute error (MAE) or mean absolute deviation (MAD), mean absolute percentage error (MAPE), mean square error (MSE) or mean square deviation (MSD), root mean square error (RMSE), and mean percentage error (MPE). For each of these indicators, the preferred test result is the one with the smallest error value.

Table 3. Comparison of three predetermined statistical model analysis
Fig. 3. Winter's exponential smoothing additive plot