Forecasting Stock Exchange Data using Group Method of Data Handling Neural Network Approach

Article history: Received 4 March 2021; Revised 29 March 2021; Accepted 4 April 2021; Published online 17 August 2021.

The increasing uncertainty of the natural world has motivated computer scientists to seek the best approaches to technological problems. Nature-inspired problem-solving approaches include meta-heuristic methods focused on evolutionary computation and swarm intelligence. One such problem that significantly impacts information is forecasting the exchange index, which is a serious concern as stocks grow and decline, with many reports of lost financial resources or profitability. When the exchange includes an extensive set of diverse stocks, particular concepts and mechanisms for physical security, network security, encryption, and permissions should safeguard the exchange and predict its future needs. This study aimed to show that group method of data handling (GMDH)-type neural networks can be used efficiently, and to demonstrate their application to the classification of numerical results; such modeling serves to display the precision of GMDH-type neural networks. Following the US withdrawal from the Joint Comprehensive Plan of Action in April 2018, common algorithms have not been able to predict the behavior of the stock exchange data stream correctly or fit it to the network satisfactorily. This paper demonstrates that the group method of data handling is well suited to improving inductive self-organizing approaches for addressing realistic severe problems such as the Iranian financial market crisis. A new trajectory would be used to verify the consistency of the obtained equations and hence the models' validity. This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).

natural and legal persons are busy exchanging financial securities, goods, and other properties, paying only a small commission. Market prices depend on supply and demand. Financial securities include shares, bonds, and some other goods (such as precious metals or agricultural produce) [6].
Communications and information technology have led to the growth of easy online sales, which reduce time and cost and avoid a physical presence in crowded places; today, one can easily do the desired shopping with a few simple clicks. This technology has brought new challenges, including data management systems, data recommendation, data classification, and data security risk. Some studies [7], [8] have proposed that forecasting and security assurance should be treated as a management issue. The main concern in online business organizations is the management of information security, which deals with data breaches, identity theft, and other online fraud. In terms of global security, data breaches are a major concern: they affect 93 percent of big businesses and 87 percent of small businesses in the United Kingdom [9][10]. In the United Kingdom, the total cost of a data loss is about 4.1 million dollars, and recovery lasts around nine months and three days [11]. Although various technical solutions for forecasting exchange have been proposed in recent years, and some are being upgraded, forecasting exchange is still considered a necessary approach [12]. Many approaches to forecasting stock exchange indices have used neural networks and other algorithms; one of the newest methods is proposed in [13]. They all have advantages and disadvantages, including computational complexity, high runtime, ungeneralizable algorithms, and insufficient accuracy.
Several previous studies used statistical techniques to forecast exchange rates, and some of the proposed approaches used soft computing tools as the forecasting scheme. In 1996, Hann et al. proposed a new approach for predicting exchange rates based on neural networks versus linear models, using monthly and weekly data [14]. In 1997, Mahnaz et al. introduced a Bayesian statistics algorithm for predicting exchange rates; the suggested solution can be used regardless of the economic model adopted by the forecaster. The International Fisher Effect was used to show how the proposed model could be applied in practice and how its results differed from the mean squared method [15].
The neural networks proposed by Zhang et al. were used to predict the British pound/US dollar exchange rate. Their study examined how the number of input and hidden nodes and the size of the training sample affect in-sample and out-of-sample performance. According to their findings, neural networks outperformed linear models, mainly when the forecast horizon was short. Furthermore, the number of input nodes has a more significant effect on performance than the number of hidden nodes, while a larger number of observations allows prediction errors to be reduced [16]. Rodríguez et al. proposed simultaneous nearest-neighbour methods [17], Leung et al. presented the general regression neural network (GRNN) algorithm [18], and Michael et al. introduced SETAR models for exchange rate forecasting [19]. In 2003, Chen et al. developed the Bayesian vector error correction model (BVECM), which they used to predict shifts in currency exchange rates one month ahead for three big Asia-Pacific economies [20]. Chen et al. [21] developed a regression neural network and used an adaptive forecasting method that combined the strengths of neural networks and multivariate econometric models to provide error correction in foreign exchange forecasting and trading. A time series model was used to estimate the exchange rates, and a general regression neural network was used to correct the estimation errors. Several experiments and statistical methods were used to compare the consistency of the two-stage models (with neural network error correction) and the single-stage models (without neural network error correction) [26]. In 2014, Korol et al. proposed a fuzzy logic model [27]; in 2015, Shen et al. presented deep belief networks and the conjugate gradient method [28]; and in 2016, Abounoori et al.
introduced a Markov-switching GARCH approach [29] for forecasting exchange rates. In 2017, Kolasa et al. [30] proposed DSGE models, and in 2018, Sun et al. introduced a new multiscale decomposition ensemble approach. In the latter approach, foreign exchange rates were divided into a limited number of subcomponents by variational mode decomposition (VMD); a support vector neural network (SVNN) was used to model and forecast each subcomponent, and another SVNN was used to integrate the subcomponent forecasts. The performance of the proposed approach was tested by comparing and evaluating four key exchange rates. The experimental results showed that the forecasting accuracy and statistical tests of the proposed VMD-SVNN-SVNN multiscale decomposition ensemble approach were better than several benchmarks, so the method proved superior for forecasting foreign exchange rates [31]. Variational mode decomposition and entropy theory were also proposed to forecast exchange rates [32]. Dzalbs et al. proposed Cartesian genetic programming with an artificial neural network (ANN) [33], and Amat et al. presented simple machine learning methods for forecasting exchange rates: sequential ridge regression and the exponentially weighted average strategy. Neither of these estimated an underlying model with discount factors; instead, they combined the fundamentals to output forecasts directly [34]. Among the newly developed methods, one self-organizing approach is the group method of data handling (GMDH) algorithm. This approach evaluates the accuracy of models over a group of multi-input single-output data pairs, eventually yielding more complex models.
As a result, GMDH aids in creating an analytical function in a feed-forward network, avoiding the need for prior information about the system's mathematical model and relying on a quadratic node transfer function. The coefficients of the equations are calculated using regression techniques [39][40][41]. Thus, this research uses the GMDH neural network algorithm to forecast stock exchange data. The remaining sections of the article present the method with a description of the proposed architecture, the results and discussion, and finally the conclusion.

II. Method
The artificial neural network is an information-processing system that shares features with natural neural networks. Neural networks are generalized, biologically inspired mathematical models of human cognition that rest on several assumptions, including the following [6][42]:
• Neurons perform the information-processing operations.
• Signals are transferred between neurons in the network via their bonds or connections.
• Each bond has its own weight, which is multiplied by the transferred signal in common neural networks.
• Each neuron applies an activation function to the weighted sum of its input signals to produce its output signal.
Figure 1 shows the flowchart of the suggested method for forecasting currency. As shown in Figure 1, the exchange dataset is first introduced into the proposed system, and all data are preprocessed. After the preprocessing of the exchange dataset, the missing values are removed. The data prepared and received from the stock exchange are then converted into a format acceptable for simulation.
In the next step, the data are normalized, and then the sampling process is performed. The training sample (80%) is used to generate the model with the GMDH neural network algorithm, and the test samples (20%) are used to evaluate the performance of the proposed method. The training data are applied to the GMDH neural network algorithm, and a model is developed after training. After model generation, the test data are applied to the model for prediction. Finally, it is checked whether all samples have been predicted; once all new samples are completed, the results are evaluated and calculated.
After the data are entered into the proposed system, they are preprocessed, and the unused and useless samples are deleted. The preprocessed data are then converted into a cohesive format acceptable for the simulation tools; at this point, the data are usually converted into an integrated Excel format.
Various methods have been proposed for preprocessing the data:
• Data cleaning
• Data collection
• Data transfer
• Data reduction
Given the problem addressed in this research, the present study used only the data-cleaning method. The proposed strategy analyzes the data and identifies rows or columns with empty or unused values. The values before and after such a sample are then examined and their average computed; finally, the null value is replaced with the obtained average. In this way, no samples are lost, and more consistent data are generated.
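The neighbor-averaging imputation described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the NumPy-based implementation are our own, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def impute_neighbor_mean(series):
    """Replace each missing value (NaN) with the mean of the nearest
    non-missing values before and after it, as described in the text."""
    s = np.asarray(series, dtype=float)
    for i in np.where(np.isnan(s))[0]:
        before = s[:i][~np.isnan(s[:i])]      # non-missing values before the gap
        after = s[i + 1:][~np.isnan(s[i + 1:])]  # non-missing values after the gap
        neighbors = []
        if before.size:
            neighbors.append(before[-1])
        if after.size:
            neighbors.append(after[0])
        s[i] = np.mean(neighbors) if neighbors else 0.0
    return s
```

For example, a gap between the values 1.0 and 3.0 would be filled with their average, 2.0, so the sample is retained rather than dropped.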
After the missing values have been handled, the data must be prepared; for this purpose, the preprocessed data are converted into a format acceptable for the simulation tools. The default format for the data is Excel, and therefore the analysis initially requires case-tested data.
In the preprocessing stage, the values of each attribute of the data are normalized to the range 0 to 1. The rows of the general data matrix are then rotated randomly so that the data depart from the order in which they were collected. In addition, all data are mapped into matrix form, the matrix rows are shuffled, and the normalization operation is performed; normalization is applied to obtain higher precision. Equation (1) is used to normalize the values of each set of data:

x_norm = (x − x_min) / (x_max − x_min)    (1)

It is assumed that the GMDH algorithm learns from the training samples and that the resulting model is then used for other data. Training samples are usually 80% of the total; the test samples, representing 20% of the total, are used to evaluate and validate the method and to measure its efficiency. After the data were divided into training samples (80%) and test samples (20%), we applied the training samples to the GMDH neural network algorithm as inputs.
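The normalization of equation (1), the random row shuffling, and the 80/20 split described above can be sketched as follows. This is an illustrative sketch, not the original implementation; function names and the fixed random seed are our own choices.

```python
import numpy as np

def minmax_normalize(x):
    # Map each column of x to the [0, 1] range, as in equation (1).
    x = np.asarray(x, dtype=float)
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def shuffle_and_split(X, y, train_ratio=0.8, seed=0):
    # Randomly permute the rows so the data leave their collection order,
    # then take 80% for training and the remaining 20% for testing.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_ratio * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]
```

The training portion is then fed to the GMDH algorithm, and the held-out portion is reserved for evaluation.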

A. Group Method of Data Handling (GMDH)
Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models. GMDH is used in data mining, knowledge discovery, prediction, complex systems modeling, optimization, and pattern recognition. GMDH algorithms (Figure 2) are characterized by an inductive procedure that gradually sorts out increasingly complicated polynomial models and selects the best solution through the so-called external criterion [9][42]. The standard GMDH algorithm can be represented as a set of neurons in which different pairs of neurons in each layer are joined through a quadratic polynomial, producing new neurons in the next layer.
This representation can be used to model the mapping from inputs to outputs. The identification problem is formally defined as finding a function f̂ that can be used approximately in place of the actual function f to predict the output ŷ = f̂(x_1, x_2, ..., x_n) for a given input vector X = (x_1, x_2, ..., x_n). The problem is then to determine a GMDH-type neural network that minimizes the square of the difference between the predicted output and the actual one:

Σ_{i=1..M} [ f̂(x_{i1}, x_{i2}, ..., x_{in}) − y_i ]² → min.

The general connection between the input and output variables can be expressed by a complicated polynomial known as the Ivakhnenko polynomial [43]:

y = a_0 + Σ_i a_i x_i + Σ_i Σ_j a_ij x_i x_j + Σ_i Σ_j Σ_k a_ijk x_i x_j x_k + ...
Most applications, however, use the quadratic form of two variables to predict the output y:

ŷ = G(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i² + a_5 x_j²    (5)

Using regression techniques, the coefficients a_i in equation (5) are calculated [43] so as to minimize the difference between the actual output, y, and the calculated one, ŷ, for each pair (x_i, x_j) of input variables. A tree of polynomials is thus constructed using the quadratic form of equation (5), whose coefficients are obtained in a least-squares sense. The coefficients of each quadratic function are therefore derived to fit the output optimally over the whole set of input-output pairs, i.e., to minimize

E = (1/M) Σ_{i=1..M} (y_i − ŷ_i)² → min.    (6)

Writing the quadratic sub-expression of equation (5) for each row of the M data triples readily yields the matrix equation

A a = Y,

where a = (a_0, a_1, a_2, a_3, a_4, a_5)ᵀ is the vector of unknown coefficients of the quadratic polynomial, Y = (y_1, y_2, ..., y_M)ᵀ is the vector of output values from the samples, and each row of A is (1, x_i, x_j, x_i x_j, x_i², x_j²) evaluated at the corresponding data triple. The least-squares technique from multiple-regression analysis leads to the solution of the normal equations:

a = (Aᵀ A)⁻¹ Aᵀ Y.

This gives the vector of the best coefficients of the quadratic form (5) for the whole set of M data triples, but the solution derived directly from the normal equations is rather susceptible to round-off errors and, more importantly, to the singularity of these equations.
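The least-squares fit of the quadratic partial description in equation (5) can be sketched as follows. This is an illustrative sketch (function names are our own); note that `np.linalg.lstsq` is used instead of forming the normal equations explicitly, precisely to avoid the round-off and singularity issues mentioned above.

```python
import numpy as np

def fit_quadratic_neuron(xi, xj, y):
    """Least-squares fit of the GMDH partial description
    y_hat = a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi**2 + a5*xj**2."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    # lstsq solves A a = y in a least-squares sense and is numerically
    # more stable than computing (A^T A)^-1 A^T y directly.
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a

def predict_quadratic_neuron(a, xi, xj):
    # Evaluate equation (5) with the fitted coefficients.
    return a[0] + a[1]*xi + a[2]*xj + a[3]*xi*xj + a[4]*xi**2 + a[5]*xj**2
```

Given data generated exactly by a quadratic of this form, the fit recovers the underlying coefficients.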

B. Structural Identification of GMDH Type Neural Networks
For the structural identification of GMDH-type networks, there are different approaches [39][41]: 1. Increasing Selection Pressure (ISP) approach. In this approach, a single selection-pressure parameter is sequentially increased in successive layers, which determines the number of neurons in each layer and the number of layers in the network.
2. Pre-specified Structural Design Approach (PSD). This approach prescribes the number of layers in the network and the number of neurons in each layer.
3. Error-Driven Structural (EDS) approach. In this approach, a threshold error on equation (6) determines the number of layers and the number of neurons in each layer. Moreover, this third approach differs in that some of the input variables, or neurons generated in earlier layers, may be included in subsequent layers. The structure of such a network may therefore be more complex than those generated by the other approaches.
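The construction of a single GMDH layer, with candidate neurons ranked by the external criterion, can be sketched as follows. This is a simplified illustration of the general scheme, not the authors' implementation: every pair of inputs gets a quadratic polynomial fitted on the training set, each candidate is scored on a separate validation set, and only the best candidates survive to form the next layer.

```python
import numpy as np
from itertools import combinations

def gmdh_layer(X_train, y_train, X_val, y_val, keep=4):
    """Build one GMDH layer: fit the quadratic polynomial of equation (5)
    for every pair of inputs on the training set, score it on the
    validation set (the external criterion), and keep the best neurons."""
    def design(xi, xj):
        return np.column_stack([np.ones_like(xi), xi, xj,
                                xi * xj, xi**2, xj**2])
    candidates = []
    for i, j in combinations(range(X_train.shape[1]), 2):
        a, *_ = np.linalg.lstsq(design(X_train[:, i], X_train[:, j]),
                                y_train, rcond=None)
        val_pred = design(X_val[:, i], X_val[:, j]) @ a
        mse = np.mean((y_val - val_pred) ** 2)  # external criterion
        candidates.append((mse, i, j, a))
    candidates.sort(key=lambda c: c[0])         # best (lowest error) first
    return candidates[:keep]
```

Stacking such layers, with the surviving neurons' outputs as the next layer's inputs, yields the self-organizing network structure; the stopping rule (selection pressure, fixed design, or error threshold) then follows one of the three approaches above.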

III. Results and Discussions
This Section describes the dataset, evaluation metrics and then discusses the results of the proposed method. Firstly, the dataset used to forecast the stock exchange data is described in Section III.A. Secondly, the evaluation metrics of the proposed method are presented in Section III.B. Finally, in Section III.C, the analysis of the results is discussed.
The software and hardware configuration of the experimental environment is shown in Table 1. As can be seen from Table 1, the operating system is Windows 10 (64-bit), with 4 GB of RAM (3.06 GB usable) and an Intel Core i7-Q720 CPU @ 1.60 GHz.

A. Dataset
The Iranian stock exchange database was the primary data source for the daily exchange rates in this research project. The data were collected from 4 January 2015 to 28 February 2018, as shown in Figure 3, and included the general attributes of the dataset: the average prices of gold and the dollar, and the volume, value, and number of transactions.
As Table 2 illustrates, the dataset was divided into two parts, an in-sample subset and an out-of-sample subset; the table does not cover the detailed data, which can be obtained directly from the authors or accessed in the Iranian stock exchange database. Table 3 presents the descriptive statistics for the stock exchange data. They describe the basic statistical characteristics of the exchange-rate data and include the minimum, maximum, mean, standard deviation, skewness, and kurtosis. Skewness is employed as a measure of the symmetry of the dataset. Zero skewness represents a perfectly symmetric distribution, while negative and positive skewness show that the distribution is skewed to the left and to the right, respectively; the greater the absolute skewness, the more pronounced the asymmetry. Kurtosis, in turn, measures the extremities (i.e., tails) of the distribution of the data, indicating the existence of outliers; the standard measure of kurtosis is based on a scaled version of the fourth moment of the data. A higher kurtosis results from infrequent extreme deviations (or outliers) rather than frequent modestly sized deviations. The kurtosis of any univariate normal distribution (standard Gaussian distribution) is 3. If the kurtosis is greater than 3, the distribution has heavier tails and more outliers than the normal distribution; if it is less than 3, the distribution has lighter, shorter tails than the normal one, as in the rectangular uniform distribution. The results in Table 3 are normalized with equation (1).
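The descriptive statistics of Table 3 can be computed as follows. This is an illustrative sketch with our own function name; kurtosis is reported in the non-excess convention used in the text, in which a Gaussian has kurtosis 3.

```python
import numpy as np

def describe(x):
    """Descriptive statistics as in Table 3: min, max, mean, standard
    deviation, skewness, and (non-excess) kurtosis."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    z = (x - m) / s                    # standardized values
    return {
        "min": x.min(), "max": x.max(), "mean": m, "std": s,
        "skewness": np.mean(z**3),     # 0 for a symmetric distribution
        "kurtosis": np.mean(z**4),     # 3 for a Gaussian distribution
    }
```

A perfectly symmetric sample yields zero skewness, matching the interpretation given above.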

B. Evaluation Metric
In order to assess the level forecasting accuracy of the proposed FED-GMDH, three main evaluation metrics are used to compare the out-of-sample forecasting performance: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE), defined as follows:

MAE = (1/T) Σ_{t=1..T} |y_t − y'_t|

RMSE = sqrt( (1/T) Σ_{t=1..T} (y_t − y'_t)² )

MAPE = (100/T) Σ_{t=1..T} |(y_t − y'_t) / y_t|

where T is the number of observations in the out-of-sample subset, and y_t and y'_t are, respectively, the actual value and the forecast value at time t.
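These three level-accuracy metrics can be computed in a few lines; the sketch below (function name our own) follows the definitions of MAE, RMSE, and MAPE directly.

```python
import numpy as np

def level_metrics(y_true, y_pred):
    """MAE, RMSE, and MAPE over the out-of-sample subset."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # in percent
    }
```

For instance, forecasts of 110 and 190 against actual values of 100 and 200 give MAE = 10, RMSE = 10, and MAPE = 7.5%.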

C. Analyzing Results
This Section analyzes the simulation results of the proposed method for forecasting stock exchange data and discusses the evaluation metrics. The primary purpose is to design an intelligent system that can predict the unstructured pattern of dollar prices based on other financial-market attributes. Figure 4 and Figure 5 show the fitting of the training and test datasets. Several training runs were performed using different numbers of groups. Figure 4 shows the error distributions from the neural network for the sample containing all 400 simulated patterns. The plots are quite encouraging: there are no very badly reconstructed patterns, and the error distribution is reasonably uniform throughout the space, with some evidence of a slight systematic misfitting of wide spectra.
In Figure 4(a), the 750 training samples are shown, together with the dollar prices and the values predicted by the GMDH algorithm during training; as can be seen, the GMDH algorithm performs the training process with high accuracy and low error. Figure 5 shows an example test with the GMDH neural network fit superimposed. In Figure 5(a), the 85 test samples are shown, with the dollar prices and the values predicted during testing; again, the GMDH algorithm performs the testing process with high accuracy and low error. In Figure 6(a), all 825 samples are shown, with the dollar prices and the values predicted by the GMDH algorithm; the predictions are likewise obtained with high accuracy and low error. As shown in Figure 7, the linear regression offers information on two levels: it provides a global appreciation of the accuracy (through the regression value R and through the slope and offset), and it compares the position of each generated data point with its target counterpart.
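The regression diagnostic of Figure 7 can be reproduced as follows. This is an illustrative sketch (function name our own): a least-squares line is fitted between targets and network outputs, and the correlation coefficient R summarizes the global accuracy.

```python
import numpy as np

def regression_fit(targets, outputs):
    """Slope, offset, and correlation R of the least-squares line
    outputs ≈ slope * targets + offset."""
    slope, offset = np.polyfit(targets, outputs, 1)
    r = np.corrcoef(targets, outputs)[0, 1]
    return slope, offset, r
```

A perfect model would give slope 1, offset 0, and R = 1; deviations of the slope and offset from these values reveal systematic bias in the predictions.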

IV. Conclusion
The correct prediction of the price of the currency in the stock market and the banking system of any country is significant. With timely and accurate forecasting of the currency, significant improvements can be made in each country's economy and foreign exchange industry. In this paper, we have used the GMDH neural network algorithm to predict the price of a currency on the Iranian stock exchange. The GMDH algorithm is a deep neural network that can predict high-order time series such as currency prices. The proposed method has several stages: data preprocessing; data preparation and normalization; separation of training and testing samples; entering the training samples into the GMDH neural network algorithm and generating a network model; applying the test samples to the generated model; forecasting the price of the currency; and, finally, evaluating the desired criteria. In the simulation of the proposed method, the GMDH neural network algorithm predicted the price of the currency with minimal error in training, testing, and evaluation. Therefore, this algorithm can be trusted and used to predict the currency's price on the Iranian stock exchange.
As a suggestion for extending this research, machine learning algorithms such as the C5 decision tree, LibSVM, and MLP could be combined in a reinforcement learning framework to improve the results of the GMDH algorithm. The results could also be improved by combining optimization methods such as CATs, Dragonfly, GA, and ACO.

Declarations
Author contribution