Stacked LSTM-GRU Long-Term Forecasting Model for Indonesian Islamic Banks

ABSTRACT

The development of the Islamic banking industry in Indonesia has become a significant concern in recent years, with rapid growth in the number of banks operating on Sharia principles. To face emerging challenges and opportunities, a deep understanding of the long-term financial behavior of Islamic banks is becoming increasingly important. This study aims to predict the share price of PT Bank Syariah Indonesia Tbk over 28 days using an LSTM-GRU stack. The observation stage includes importing the dataset, data separation, model variations, the training process, output, and evaluation. Observations were conducted using 10 model variations of 4-layer LSTM and GRU stacks. Each model is trained with four epoch settings (200, 500, 750, and 1000). The results show that long-term predictions (28 days ahead) using 4-layer LSTM-GRU stacks and a daily training accumulation technique produce better accuracy than the general method (using multiple outputs). For predictions of the next 28 days, the model with the LGLG stack arrangement (LSTM-GRU-LSTM-GRU) produces the best accuracy at epoch 750, with an MSE of 63.43762863. This study will continue in order to achieve even better precision, either by utilizing a new design or by further improving the technique we are now employing.

I. Introduction
As the country with the world's largest Muslim-majority population, Indonesia has enormous potential for the expansion of the Islamic banking financial system, as evidenced by a robust network of Islamic banks [1]. These banks follow Islamic law (Sharia) principles and adhere to ethical and moral criteria [2]. The Indonesian government is actively promoting the growth of Islamic banking in response to the growing demand for Islamic financial products and services. Various regulatory frameworks have been put in place to support the formation and expansion of Islamic banks. The Financial Services Authority (OJK) is responsible for managing and regulating the operations of Islamic banks in order to maintain Sharia compliance [3]. In addition to the full-service Islamic banks, conventional banks have opened Islamic banking branches to accommodate the rising demand for Sharia-compliant services. These institutions provide a wide range of Sharia-compliant goods and services, including savings accounts, financing, investment instruments, takaful (Islamic insurance), and zakat payments [4][5][6].
Islamic banking in Indonesia has experienced rapid growth in the last few decades [7]. This growth reflects not only global trends in Sharia finance but also the economic and social development of Indonesia, which has a sizeable Muslim population. Sharia banking provides financial access to people previously not served by conventional banking [8]. The system has helped drive financial inclusion in Indonesia by providing access to banking products and services to groups previously considered "unbankable". The existence of Sharia banking also makes a positive contribution to the stability of the Indonesian economy as a whole [9]. Diversification through Islamic banking and financing based on Islamic ethics helps reduce systemic risk [10]. Thus, the growth of Sharia banking in Indonesia not only reflects high market demand but also creates a positive impact by encouraging financial inclusion, sustainable economic development, and the development of financial products and services that are in line with Islamic values [11]. It is an essential aspect of Indonesia's diverse and dynamic economic and financial development.
Despite substantial progress in Islamic banking in Indonesia, there are still issues, difficulties, and opportunities to be addressed. Evaluating and analyzing the performance, efficiency, and competitiveness of Islamic banks in comparison to conventional banks, as well as understanding the dynamics and factors influencing the growth and long-term sustainability of Islamic banking in Indonesia, is critical for policymakers, regulators, and market players [12][13][14]. Long-term stock forecasting is required for investors and financial institutions to make sound long-term investment decisions and strategies in the Indonesian market [15][16]. For investors looking to improve their investment portfolios, accurate long-term stock predictions for Islamic banks are invaluable. However, previous research still relies on traditional financial models [17] or basic machine learning algorithms [18], which yield low accuracy [19] and many biases, and the results remain far from what is expected [20].
In recent years, financial markets have seen a considerable surge in the application of Artificial Intelligence (AI) and Machine Learning (ML) techniques for stock market prediction [21][22][23][24]. These techniques have demonstrated promising results in identifying complicated patterns and trends in financial data, supporting investors in making informed decisions. Recurrent Neural Networks (RNNs) have attracted much interest among ML techniques due to their ability to handle sequential and temporal dependencies in data. The Long Short-Term Memory (LSTM) network is one form of RNN that has proven effective in time series analysis [25]. LSTM networks can capture long-term relationships and mitigate the vanishing gradient problem of standard RNNs [26]. In addition, Gated Recurrent Units (GRUs) have emerged as an alternative RNN architecture that offers computational efficiency and performance comparable to LSTM [27].
Individually, the LSTM and GRU networks have been regularly used to estimate stock prices in the context of stock market prediction [28][29][30][31][32][33][34]. However, improved models that integrate the capabilities of the two architectures are still required to increase forecast accuracy. Despite the growing interest in Islamic banking and the importance of Islamic bank shares in Indonesia, there is a significant gap in the existing literature on long-term forecasts utilizing deep learning techniques. Most studies focus on the financial performance of conventional banks and on short-term predictions, with minimal discussion of long-term stock projections in the Indonesian setting.
This study aims to evaluate the performance of a long-term stock prediction model for PT Bank Syariah Indonesia Tbk. Two novel approaches are proposed. The first is optimizing the model with a separate training process using ten variations of 4-layer LSTM-GRU stacks. The second is an input and target data segmentation technique, adjusted to the predictions for the next 1 to 28 days.
Stacking several layers makes deep learning models better and more useful for forecasting time series data [35][36][37], particularly for predicting stock values [38][39][40][41]. Several experiments on merging machine learning approaches to predict time series data have been conducted [42]. Predicting water prices with an LSTM-GRU model is more accurate than using a GRU alone or a stack with an LSTM-LSTM arrangement [43]. When predicting complicated stock market data, the hybrid Akima-EMD-LSTM model outperforms the hybrid EMD-LSTM, EEMD-LSTM, and SEMD-LSTM models [44]. Another study combines LSTM time-series analysis with sentiment analysis from the Valence Aware Dictionary and Sentiment Reasoner (VADER) for stock price prediction and reports higher accuracy than earlier research [45]. The CNN, RNN, LSTM, CNN-RNN, and CNN-LSTM algorithms have been used to predict the Shanghai Composite Index, where the CNN-RNN approach outperforms the other methods (CNN, RNN, and LSTM) [46]. For music data, classic tanh, LSTM, and GRU units have been compared, with LSTM and GRU having benefits over standard tanh units [47]. A stacked LSTM model has also been used to detect anomalies in four separate datasets.

II. Methods
This study was carried out in stages, beginning with data collection, then the separation of training and test data, the separation of target data for long-term predictions of the following 28 days, model creation, and assessment. The research flowchart in Figure 1 describes the steps of this investigation in general. The following is a detailed explanation of the experimental process flow for predicting Sharia stock prices using the LSTM and GRU stack models, from importing the dataset to output:
• Import Dataset: The stock time series dataset for PT Bank Syariah Indonesia Tbk (BRIS), covering 01-07-2020 to 01-07-2023, was taken from https://finance.yahoo.com. The dataset has 728 rows (days) and six columns (Open, High, Low, Close, AdjClose, and Volume); only the "Close" column is used in this study.
• Data Separation: The last 28 days of the dataset are set aside as prediction data for the next 28 days. The remaining 700 days of data are divided into training data (600 days) and test data (100 days).
• Modeling: Ten model variations are built from 4-layer LSTM and GRU stack arrangements, namely GGGG, GGGL, GGLL, GLGL, GLLG, LGGL, LGLG, LLGG, LLLG, and LLLL, where G stands for GRU and L for LSTM. Each model is trained on the training data, which includes initializing the model, determining the loss function, selecting the optimizer (e.g., Adam), and determining the evaluation metric, the Mean Square Error (MSE).
• Evaluation: Once training is complete, the model is evaluated to measure how well it predicts stock prices. This evaluation is carried out on the previously separated test data. This experiment uses MSE as the evaluation metric to assess the quality of model predictions. Additionally, visualizations such as graphs comparing predictions with actual data can provide valuable insights.
• Output: The output is a graph that compares the actual data with the predicted data over time.
To determine the accuracy of the trained models, the predicted results are compared with the actual data using the MSE.
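The data separation step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `split_series` is hypothetical, the split is assumed to be strictly chronological, and the series used in the demo is a placeholder rather than real BRIS closing prices.

```python
def split_series(close, n_test=100, n_holdout=28):
    """Chronologically split a price series into train, test, and held-out parts.

    The last `n_holdout` values are set aside for the long-term forecast
    evaluation, the `n_test` values before them form the test set, and the
    rest is training data.
    """
    if len(close) <= n_test + n_holdout:
        raise ValueError("series too short for the requested split")
    holdout = close[-n_holdout:]
    remaining = close[:-n_holdout]
    train = remaining[:-n_test]
    test = remaining[-n_test:]
    return train, test, holdout


# Placeholder series with the same length as the dataset described above.
closes = [float(i) for i in range(728)]
train, test, holdout = split_series(closes)
print(len(train), len(test), len(holdout))
```

With 728 rows, this yields 600 training days, 100 test days, and 28 held-out days, matching the split described in the list above.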

A. LSTM-GRU
RNNs trained with backpropagation were the first deep learning models able to recall prior data and predict data one step ahead [48][49][50][51]. Adding layers can enhance accuracy, but doing so in an RNN can lead to vanishing gradients. As a result, the RNN can only handle short-term dependencies [52][53]. Because of this issue, LSTM cells [54][55] were created, which have several gates and can handle long-term dependencies. GRU, a cell with a simpler gate structure that can also handle long-term dependencies, is a further advancement [46][56]. Figure 2 depicts architectural advancements beginning with RNN, then LSTM, and finally GRU.
The stacked model processes the input sequence one time step at a time (usually from t = 1 to T, where T is the length of the input sequence). For each LSTM layer i, the hidden state $h_t^{(i)}$ and cell state $c_t^{(i)}$ are calculated as in (1) to (6), and for each GRU layer j, the hidden state $h_t^{(j)}$ is calculated as in (7) to (10). The LSTM hidden state is obtained from the output gate $o_t^{(i)}$ and the cell state:

$$h_t^{(i)} = o_t^{(i)} \odot \tanh\left(c_t^{(i)}\right)$$

The output of the last LSTM or GRU layer at the last time step T is the final result of the model, as in (11).
The pseudocode for LSTM-GRU stacks is a high-level algorithmic outline for constructing a deep neural network architecture that combines LSTM and GRU layers. It specifies the critical steps for building a stacked RNN, starting with the definition of hyperparameters and input data placeholders, followed by the creation of multiple LSTM and GRU layers with their respective hidden states. The final hidden states of these layers can be concatenated or combined as needed for downstream tasks. By stacking LSTM and GRU units, the model aims to capture complex sequential patterns, making it particularly useful for tasks involving sequential data analysis.
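For reference, the standard textbook LSTM and GRU cell updates are written out below. This is a sketch of the commonly used formulation, not necessarily the exact notation or numbering of this paper: the weight names ($W$, $U$, $b$), the layer input symbol $u_t$, and the GRU interpolation convention are assumptions, and layer superscripts are omitted for readability.

```latex
% Assumed stacking convention: the input u_t of a layer is the hidden state of
% the layer below it, and the first layer receives the raw input x_t.

% LSTM cell
\begin{align}
f_t &= \sigma\left(W_f u_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i u_t + U_i h_{t-1} + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c u_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o u_t + U_o h_{t-1} + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{align}

% GRU cell (one common convention for the final interpolation)
\begin{align}
z_t &= \sigma\left(W_z u_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r u_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h u_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align}
```

The LSTM keeps a separate cell state $c_t$ guarded by three gates, whereas the GRU merges memory and output into a single hidden state with two gates, which is why it has fewer parameters per unit.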
Traditional (single-type) LSTM and GRU models have several limitations compared to model stacks that combine LSTM and GRU. The main limitations are as follows:
• Limited ability to handle long-term information [57]: Although LSTM and GRU are designed to overcome the vanishing gradient problem in RNN models, they still have limitations in handling long-term information. These models can remember information from several previous time steps, but over very long periods they may still have difficulty.
• Higher computational cost: LSTM and GRU models are relatively computationally complex [58], mainly when used in deep or layered networks, which can result in longer training times and require greater computing resources [59].
• Susceptibility to overfitting: LSTM and GRU models are more susceptible to overfitting when used on relatively small datasets [60]. Because the number of parameters in these models is large, they can "memorize" the training data rather than learn general patterns.
• Not optimal for specific tasks: While LSTM and GRU are reasonable solutions for many tasks in time series modeling, some specialized tasks, such as text processing (NLP), require more specialized architectures, such as transformers [61].
To overcome these limitations, a stack of LSTM and GRU models can provide several advantages:
• Richer representation capabilities: With a stack of LSTM and GRU models, multiple LSTM and GRU layers can be used sequentially [61], allowing the model to represent the data better and describe more complex relationships in the time series.
• Hierarchical learning: The model stack can learn a hierarchy of information. The first layer can capture more basic patterns, while subsequent layers can capture increasingly abstract and complex patterns [61].
• Reduced risk of overfitting: When layers are added together with techniques such as dropout between layers, model stacks can help reduce the risk of overfitting, mainly if managed carefully [62].
• Flexible architectural combinations: Combining LSTM and GRU in various configurations in a model stack allows flexibility in designing the most appropriate architecture for a particular task [62].
However, it should be noted that stacked LSTM and GRU models also require careful tuning and attention to overfitting. The selection of an appropriate architecture and parameters will significantly influence the quality of model predictions.

B. Data Separation
The dataset is divided into training data (600 days), test data (100 days), and prediction data (28 days). Figure 3 shows the division of training data and test data as a history graph. The training procedure is conducted to create a model. Predictions were performed using training and test data to evaluate the performance of the resultant model, as shown in Figure 4. The prediction data (28 days) is held out and used only to evaluate prediction outcomes; it is not included in the training process. To forecast the following 28 days, for which no training data is available, we employ a separate training run for each prediction horizon from 1 day to 28 days ahead. The input data spans 7 days, whereas the target data spans 1 day. The forecast for the first day is based on one day of target data, which is one day after the training data input. The forecast for the second day uses one day of target data taken two days after the input training data, and so on until the prediction for the 28th day, which uses one day of target data taken 28 days after the input training data. Each training procedure is repeated for ten distinct 4-layer LSTM-GRU arrangements [63] to get the best outcomes. Figure 5 depicts the separation of input and target data for forecasts from one to 28 days.
The selection of the four epoch points (200, 500, 750, and 1000) provides a rich perspective on how the model develops its performance over time. However, in practice, the choice of the number of epochs must also be considered along with other factors such as learning rate, batch size, model complexity, and the characteristics of the data used.
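The input-target segmentation described above can be sketched as follows. This is one possible reading of the text, not the authors' code: the function name `make_windows` is hypothetical, and the windows are assumed to slide forward by one day at a time.

```python
def make_windows(series, horizon, n_input=7):
    """Pair each `n_input`-day window with the single value `horizon` days
    after the last day of that window."""
    inputs, targets = [], []
    last_start = len(series) - n_input - horizon
    for start in range(last_start + 1):
        end = start + n_input  # index one past the last input day
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# One dataset per forecast horizon (1 to 28 days ahead), built here from a
# placeholder series rather than real closing prices.
series = [float(i) for i in range(600)]
datasets = {h: make_windows(series, h) for h in range(1, 29)}
```

Under this reading, a separate model is trained on each of the 28 datasets, so the forecast for day h never depends on the model's own earlier predictions.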
The model is trained with the Adam optimizer, with a learning rate of 0.001, 50 nodes in each layer, and a batch size of 64. Figure 6 depicts the process from input to deep learning models with 10 variations, predictions, and MSE values produced for each model variant. Adam combines the concepts of momentum (which helps the optimizer move past shallow local minima) and RMSprop (which adapts the learning rate) in one algorithm. It uses exponential moving averages of the gradient (the first moment, as in momentum) and of the squared gradient (the second moment, as in RMSprop) to calculate weight updates. The effective step size can therefore differ for each parameter based on its gradient history. Both estimates are bias-corrected to compensate for their initialization at zero. Usually, small learning rate values (e.g., 0.001) are used to ensure stable convergence.
• Hidden Layer 50: This refers to the number of nodes (neurons) in each hidden layer of the neural network. This value reflects the complexity of the model. The more nodes, the greater the model's ability to capture complex patterns in the data, but more nodes can also increase the risk of overfitting if the training data is limited.
• Batch Size 64: This is the number of data samples used in each weight update iteration (mini-batch learning iteration). Larger batches can speed up training through more efficient computation, but they also require more memory. Too small a batch can cause unstable convergence. A batch size of 64 is a commonly used value.
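A single Adam update for one scalar parameter can be sketched as below. Only the learning rate of 0.001 comes from the text; the decay rates `beta1=0.9` and `beta2=0.999` and the constant `eps=1e-8` are the commonly used defaults and are assumptions here, as is the function name `adam_step`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    m and v are the running first- and second-moment estimates, and t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t)
```

Because the update divides by the root of the second moment, the size of each step is roughly bounded by the learning rate regardless of the gradient's scale.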

D. Evaluation Criteria
To assess model effectiveness, we employ the Mean Square Error (MSE). MSE is calculated as the sum of the squared errors between the predicted outcomes and the 28 previously hidden observations (actual data), divided by the sample size. A lower MSE value suggests improved performance [64]. The formulation for MSE is shown in Equation 1, where $\hat{y}_i$ are the predicted data, $y_i$ are the actual data (observations) that are held out, and n is the number of samples.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

A lower MSE value indicates that the model predicts stock prices more accurately, which means that the difference between model predictions and actual stock prices tends to be smaller. Conversely, a high MSE value indicates a significant mismatch between predicted and actual stock prices. MSE is a simple and easy-to-understand metric. Because errors are squared, MSE gives high weight to large errors, which is helpful in cases where outliers (large differences between predicted and actual values) must be considered. The use of MSE in evaluating forecasting models for the next 28 days helps to measure the quality of model predictions, to compare different models, and to update the model if necessary.
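The metric can be computed in a few lines. The sketch below is illustrative only: the function name `mean_squared_error` is chosen here, and the numbers in the usage line are toy values, not results from this study.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values: two exact predictions and one that is off by 2.
score = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In this toy case a single error of 2 contributes 4 to the sum, which shows how squaring makes large deviations dominate the score.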

III. Results and Discussions
The training procedure used 10 model versions and 4 epoch settings (200, 500, 750, and 1000), resulting in 40 prediction graphs with 120 MSE measures (one each for the training data, test data, and 28-day prediction data). Because of page limits, we provide only one graph of the projected outcomes (out of 40 graphs) for the training data phase, test data, and 28 days of prediction data (Figure 7). To make the 28-day forecast chart more visible, we enlarged that section (Figure 8). Figure 8 indicates that the 28-day forecast retains realistic fluctuations until day 28 and continues to follow the original data pattern, in stark contrast with general long-term prediction approaches, which tend to converge towards a specific value, with larger bias for longer forecast horizons. Tables 1 and 2 show the MSE values. Tables 1-2 and Figure 7 show that the best model for predicting training and test data is Var-10 with the LSTM-LSTM-LSTM-LSTM (LLLL) stack architecture, with MSE values of 1795.1927 and 1485.7672, respectively. Meanwhile, Var-7 with the LSTM-GRU-LSTM-GRU (LGLG) stack architecture is the best model for the 28-day prediction data, with an MSE of 63.4376.
Table 1 summarizes the MSE evaluation of all training procedures in the 200-500 epoch range. As an example, consider the GRU-LSTM-LSTM-GRU (GLLG) model, a stack of four sequential layers with two different types of memory cells, namely GRU and LSTM. At epoch 200, its MSE for the 28-day prediction is 90.8904, which indicates a relatively large error rate: the difference between the stock price predicted by the model and the actual stock price at each time point is relatively significant. An MSE of 90.8904 indicates that the GLLG stack model needs to be refined to improve the quality of stock price predictions. Careful evaluation and model adjustment are essential to overcome these limitations and achieve more accurate predictions.
Table 2 summarizes the MSE evaluation for epochs 750-1000. At epoch 750, the lowest MSE for the 28-day prediction is 63.4376, obtained by variant seven with a stack of LSTM, GRU, LSTM, and GRU. The MSE measures the average of the squared differences between model predictions and actual values. In this context, an MSE of 63.4376 means that the average squared difference between the predicted and the actual stock value for the next 28 days is approximately 63.44 (in the squared units of the stock price data). A lower MSE value indicates that the model predicts better because the difference between the prediction and the actual value is smaller on average. Therefore, the MSE value of 63.44 indicates that the model has fairly good prediction quality. An epoch is one pass through the entire training dataset. By the 750th epoch, the model has undergone many passes through the data and has made repeated adjustments to the weights and parameters used to make predictions. The LSTM-GRU-LSTM-GRU stack can give the model the ability to capture complex patterns in time series data. LSTM has the ability to remember information in the long term, while GRU is more efficient at handling information in the short term. This combination allows the model to combine the advantages of both. The prediction results for the next 28 days show that variant seven, with the LSTM-GRU-LSTM-GRU stack, has the potential to provide fairly good stock price predictions. However, the use of these predictions must be integrated into a careful investment strategy that pays attention to risk factors that may influence stock prices. Figure 9 to Figure 11 show the MSE values of the training process for training data, test data, and 28-day prediction data, respectively.
Table 3 presents the performance study of present models:

| Reference | Method | Reported result |
| --- | --- | --- |
| | Bi-LSTM and GRU models | MSE 0.0018 |
| [66] | CNN-LSTM-GRU hybrid algorithm | RMSE decreased by 14%, MAE reduced by 13.4%, R^2 3.9% |
| [67] | LSTM and GRU models | MSE 0.57 |
| [68] | LSTM and GRU | MAPE 97.37% |
| [69] | Two-layer stacked LSTM (TLS-LSTM); correlation analysis between different currency pairs | MSE 0.0015129 |
| [70] | Stacked-Bi-LSTM | RMSE 0.025 |
| Proposed model | LSTM-GRU-LSTM-GRU stack | MSE 63.44 |

The study in [31] proposes a new model for optimizing stock forecasting that incorporates a range of technical indicators, including investor sentiment indicators and financial data, and performs dimension reduction on the many factors influencing the stock price using the LASSO and PCA approaches before applying LSTM and GRU models. It finds that LSTM and GRU models can effectively predict stock prices and that the LASSO dimension reduction method performs better than PCA. In the study by [65], LSTM, bi-LSTM, GRU, and ordinary neural network (NN) modules are each designed sequentially to forecast the stock price.
The authors of [66] propose using deep learning for stock prediction and compare the performance of six deep learning algorithms in predicting stock closing prices on the Indonesian Stock Exchange. They propose a CNN-LSTM-GRU hybrid algorithm for stock price prediction, which outperforms the other methods in terms of accuracy. The study in [67] proposes a trading strategy designed for the Moroccan stock market based on two deep learning models, LSTM and GRU, to predict the close price for the short- and mid-term horizons, respectively. The proposed strategy outperforms benchmark indices in the Moroccan market; future work includes focusing on medium- and long-term predictions.
The study in [68] also uses LSTM and GRU. The authors propose eight new architectural models for stock price forecasting that identify joint movement patterns in the stock market by combining the LSTM and GRU models with four neural network block architectures, using grouped time-series data. The proposed models are evaluated using three accuracy measures. In [69], a TLS-LSTM neural network is used to forecast the trend of the Australian Dollar against the United States Dollar (AUD/USD) and to conduct a correlation analysis between different currency pairs. TLS-LSTM outperforms other models in Forex trend prediction, and the AUD/USD movement affects EUR/AUD and AUD/JPY. The study in [70] offers the Stacked Bi-LSTM (SBiLSTM) architecture, a modification of the conventional Deep Long-Short Term Memory (TDLM). Two time series from oilfield production are used to test the method, and the performance of the proposed SBiLSTM model is compared with that of multi-layer RNNs, Deep GRU, and Deep LSTM.

IV. Conclusions
Machine learning can deliver improved long-term prediction performance for PT Bank Syariah Indonesia Tbk (BRIS) shares, which is critical for investors when making stock market decisions. These predictions may also assist analysts in developing long-term financial strategy indicators. In this paper, we propose a separate training approach for 1-day to 28-day forecasts utilizing 10 variations of 4-layer LSTM-GRU deep learning stacks and a tailored input-target data segmentation technique. Using BRIS stock history data from 01-07-2020 to 01-07-2023 (728 days), the LSTM-LSTM-LSTM-LSTM (LLLL) stack gives the best model for predicting the training and test data. Furthermore, the LSTM-GRU-LSTM-GRU (LGLG) stack model gives the most accurate long-term forecast for the next 28 days.
The forecasts produced with the adjusted input-target data segmentation approach retain realistic variations and closely follow the observed data. By contrast, when the deep learning approach is used alone (without input-target data segmentation), long-term forecasts show little variation and tend towards a constant (convergent) value. Long-term predictive research with even better accuracy is still possible, either by applying different methodologies or by extending the techniques and procedures we have developed.
The LSTM-GRU-LSTM-GRU stack model is a complex model that can be very good at handling complex time-series data. However, managing and maintaining such models requires considerable computing resources and a deep understanding of time series modeling. Overall, the LSTM-GRU-LSTM-GRU stack model can be a useful tool for forecasting long-term stock prices.

Fig. 5. Illustration of training and target data separation for predictions ranging from 1 to 28 days

C. Modeling
Each training procedure is carried out for 10 variations of four-layer LSTM-GRU arrangements to get the best model performance: Var-01: GGGG, Var-02: GGGL, Var-03: GGLL, Var-04: GLGL, Var-05: GLLG, Var-06: LGGL, Var-07: LGLG, Var-08: LLGG, Var-09: LLLG, Var-10: LLLL. The letter L represents LSTM, and the letter G represents GRU. Each training procedure uses four epoch settings (200, 500, 750, and 1000). Choosing the number of epochs (passes through the entire training dataset) when training a neural network model is an important decision. The reasons for choosing these four epoch settings are as follows:
• Convergence Requirements: The number of epochs used in model training depends mainly on the complexity of the model, the volume of data, and the desired level of convergence. The more complex the model, the longer it takes to reach convergence. The four epoch points (200, 500, 750, and 1000) reflect an attempt to examine how the model behaves at various points in training, from early to more advanced stages.
• Performance Monitoring: During training, it is essential to monitor model performance on validation or test datasets to prevent overfitting. By using several different epoch points, we can examine how the model behaves over time. Seeing whether the model's performance continues to increase, reaches a peak, or even decreases at a certain point helps decide when to stop training or take other actions, such as reducing the learning rate or adjusting the model architecture.

Fig. 9. The MSE values of the whole training process for training data

In the study by [65], the performance of each separate model is compared with that of the suggested hybrid model. The proposed stock price prediction model is applied to NIFTY-50 stock market data, and the predicted values are shown along with the actual stock opening prices for (a) 100 days, (b) 300 days, (c) 500 days, and (d) 1000 days.
Pseudocode 1 is a pseudocode representation of stacking LSTM and GRU layers in a recurrent neural network (RNN). In it, $x_t$ is the input at each time step, $h_t^{(i)}$ is the hidden state of the i-th LSTM layer at time step t, $c_t^{(i)}$ is the cell state of the i-th LSTM layer at time step t, and $h_t^{(j)}$ is the hidden state of the j-th GRU layer at time step t.
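As a hedged illustration of how such a stack might be assembled, the sketch below maps a four-letter variant code to Keras layers. It is not the authors' implementation: the helper names `layer_plan` and `build_stack` are hypothetical, the single `Dense(1)` output layer and the `(7, 1)` input shape are assumptions based on the 7-day input and 1-day target described in the text, and `build_stack` requires TensorFlow to be installed.

```python
# The ten stack arrangements listed in the paper (L = LSTM, G = GRU).
VARIANTS = ["GGGG", "GGGL", "GGLL", "GLGL", "GLLG",
            "LGGL", "LGLG", "LLGG", "LLLG", "LLLL"]


def layer_plan(pattern):
    """Return (layer_name, return_sequences) for each letter of the pattern.

    Every recurrent layer except the last must return the full sequence so
    that the next recurrent layer receives one vector per time step.
    """
    names = {"L": "LSTM", "G": "GRU"}
    plan = []
    for idx, letter in enumerate(pattern):
        is_last = idx == len(pattern) - 1
        plan.append((names[letter], not is_last))
    return plan


def build_stack(pattern, n_input=7, units=50, learning_rate=0.001):
    """Build and compile a stacked LSTM/GRU regressor (assumed architecture)."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_input, 1)))
    for layer_name, return_sequences in layer_plan(pattern):
        layer_cls = getattr(keras.layers, layer_name)
        model.add(layer_cls(units, return_sequences=return_sequences))
    model.add(keras.layers.Dense(1))  # assumed single-value output head
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model
```

A call such as `build_stack("LGLG")` would then give the LSTM-GRU-LSTM-GRU arrangement, which could be fitted with `model.fit(X, y, epochs=750, batch_size=64)` on arrays shaped `(samples, 7, 1)` and `(samples, 1)`.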

• Probability Map Exploration: Trying several different epoch points also explores the range of the model's likely behavior. For example, at the initial epoch point (200), the model may not yet have converged sufficiently and may be biased towards the training data. At the midpoints (500 and 750), the model can approach convergence and begin to fit the validation data. At the endpoint (1000), one can see whether the model continues improving in performance or has reached a saturation point.
• Stability Evaluation: The stability of the model can also be assessed through these four epoch points. When a model shows highly fluctuating behavior at early points in training, this may indicate that the learning rate is too high or that the complexity of the model needs to be adjusted. Conversely, if the model shows good stability at specific points, this may indicate that the process has found a suitable training configuration.
• Testing and Generalization: Once training is complete at the endpoint (1000), the model can be tested on never-before-seen data to measure its generalization capabilities. If the model produces good results on the test data, this indicates that the training has been successful.
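The idea of inspecting a model at fixed epoch checkpoints can be illustrated with a deliberately simple stand-in. The sketch below fits a one-parameter linear model by gradient descent and records the MSE at epochs 200, 500, 750, and 1000; the function name, the toy data, and the linear model are all assumptions made for illustration and do not reproduce the LSTM-GRU experiments.

```python
def train_with_checkpoints(xs, ys, checkpoints=(200, 500, 750, 1000), lr=0.001):
    """Fit y = w * x by gradient descent and record the MSE at each checkpoint epoch."""
    w = 0.0
    n = len(xs)
    history = {}
    for epoch in range(1, max(checkpoints) + 1):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        if epoch in checkpoints:
            history[epoch] = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return history


# Toy data, roughly y = 2x; not taken from the study.
history = train_with_checkpoints([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
best_epoch = min(history, key=history.get)
```

Comparing the recorded values shows whether the error is still falling, has levelled off, or has started to rise, which is the kind of evidence the checkpoint comparison in the list above relies on.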

Table 1. The MSE of the whole training procedure in numerical form for epochs 200-500

Table 2. The MSE of the whole training procedure in numerical form for epochs 750-1000

Table 3. Performance study of present models