Forecasting Solar Activities Based on Sunspot Number Using Support Vector Regression (SVR)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
The most significant advance of the Industry 4.0 revolution is that computers and robots are all connected through the internet. Satellites, as one of the transmitters of the internet network, are at risk of destruction when a solar storm occurs. The level of activity on the sun can be gauged by observing sunspots, and future solar activity can be estimated by forecasting sunspot numbers. This research forecasts sunspot numbers using support vector regression (SVR) so that the adverse effects of solar storms on the earth can be minimized. The best SVR results were obtained on annual sunspot numbers using the RBF kernel. The resulting MSE, RMSE, and MAAPE are 35.32, 5.94, and 0.12, respectively. Based on the MAAPE value, the forecast is considered accurate. The forecasts for 2020 and 2021 indicate potential flares because the forecast sunspot numbers exceed twenty.


1. Introduction
The Industry 4.0 era enables most industrial activities to be carried out by computers, robots, and automated technology. The concept of the Industry 4.0 revolution was formulated to realize smarter factories [1]. The most significant advance of the Industry 4.0 revolution is the connection of robots and computers through the internet. One means of internet network access is the satellite. The recent generation of satellite technology achieves high performance, even though its early performance was poor. New-generation satellite internet service is good enough to compete with other forms of network access [2].
Satellites relay signals transmitted from the earth and are threatened by events in outer space, such as solar storms, which arise from intense activity on the sun. The level of solar activity can be identified by observing sunspots, because a complex sunspot configuration readily produces an unstable magnetic field that gives rise to flares and coronal mass ejections (CMEs) [3]. A solar storm may destroy satellites orbiting the earth, which would cripple all activities that rely on satellite internet [4]. Therefore, the level of solar activity should be identified so that the necessary preparation and management can take place. The sunspot number represents the number of dark spots that appear on the sun's surface. Intense solar activity is indicated by a large number of dark spots on the sun's surface [5]. The number of sunspots can be estimated empirically [6]. As the sunspot configuration becomes more complex, the magnetic field on the sun's surface becomes unstable, which causes flares as well as CMEs. Under the modified Zurich sunspot classification, sunspots are divided into classes A, B, C, D, E, F, and H [7].
Forecasting the sunspot number makes it possible to identify the intensity of future solar activity. Previous studies include the prediction of sunspot numbers to identify flares using a fuzzy time series Markov chain model [8], long-term sunspot number prediction based on empirical mode decomposition (EMD) analysis and an auto-regressive (AR) model using sunspot sample data from 1848 to 1992 [9], and sunspot time series prediction based on variational mode decomposition (VMD) and a backpropagation neural network with the firefly algorithm [10].
In addition to those methods, prediction can also be carried out using support vector regression (SVR). SVR has been used as a prediction method in previous studies, such as rainfall forecasting from several weather parameters [11], prediction of foodstuff availability in the face of the demographic bonus [12], and prediction of the inflation rate [13]. Another study forecast link load on a network [14] using SVR, auto-regressive (AR), and moving average (MA) methods; SVR gave the best result, with a lower error rate than the other two methods. SVR has also been used to summarize documents based on sentence position [14]. Therefore, this study forecasts solar activity based on the number of sunspots using SVR, in order to minimize the effect of solar storms on the earth.

2. Method
The prediction of solar activity based on the sunspot number used sunspot number data from the Sunspot Index and Long-term Solar Observations (SILSO). The sunspot number data was arranged as time-series data, which was then divided into two parts, one for training and one for testing.
Forecasting is a technique that predicts a future value by considering current and past data [15], [16]. It has been widely used in various sectors, from the forecast of water weather [17] to the forecast of energy consumption in the natural resource sector [18]. Forecasting is also applied in economics, such as the forecast of economic growth [19] and the forecast of the inflation rate [13]. The accuracy of a forecast requires special attention, since there is always a discrepancy between the actual and forecast values.
The tests began with the determination of the independent variable. Once the independent variable was obtained, the time series data was formed and divided into two parts. The first part was used for training to obtain the prediction model, while the second part was used to validate, or test, the resulting prediction system. The prediction method used in this study is SVR.
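The two-part division described above can be sketched as a chronological split, so that the model is tested only on years later than those it was trained on. The source states only that the data was divided into two; the function name `chronological_split`, the 0.8 training fraction, and the sample values are assumptions made for illustration.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into an earlier training part and a later testing part.

    train_fraction is a hypothetical choice; the source does not report the ratio.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(series) * train_fraction)
    # Keep the original order: no shuffling, so the test part follows the training part.
    return series[:cut], series[cut:]


# Hypothetical annual values for illustration only (not SILSO data).
example = [10.0, 25.0, 60.0, 110.0, 90.0, 55.0, 30.0, 12.0, 8.0, 20.0]
train, test = chronological_split(example, train_fraction=0.8)
```

Keeping the order intact matters for time series, because a shuffled split would let the model see values from years it is later asked to predict.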
SVR is a machine learning regression model based on the support vector machine (SVM) method. The primary concept of SVM is to construct a separating boundary known as a hyperplane. In SVM, the hyperplane separates objects into classes, whereas in SVR the hyperplane serves as the regression function [20]-[22]. The regression function of the SVR method is presented below.
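The equation itself is not present in this copy of the text. A commonly used form of the SVR regression function is given here as an assumption, not as the authors' exact notation: w is the weight vector, \varphi(x) is the feature mapping induced by the chosen kernel, and b is the bias term. The label (1) is inferred from the later reference to formulas 2, 3, and 4.

```latex
f(x) = w^{\top} \varphi(x) + b \tag{1}
```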
Predictor performance was measured by the error rate. A small error rate indicates that the system is suitable for forecasting. Various error measures can be applied to assess the quality of a prediction system. This study used mean square error (MSE), root mean square error (RMSE), and mean arctangent absolute percentage error (MAAPE) [23], [24]. MSE is the average of the squared differences between the predicted and actual values. Taking the square root of the MSE gives the RMSE. MAAPE is the average of the arctangent of the absolute percentage error. The MSE, RMSE, and MAAPE can be calculated using the formulas below.
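The formulas are not present in this copy of the text. The standard definitions of the three measures, written to match the verbal descriptions above, are as follows; the labels (2) to (4) are inferred from the later reference to formulas 2, 3, and 4.

```latex
\begin{align}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2 \tag{2}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2} \tag{3}\\
\mathrm{MAAPE} &= \frac{1}{n}\sum_{t=1}^{n}\arctan\left(\left|\frac{A_t - F_t}{A_t}\right|\right) \tag{4}
\end{align}
```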
Here, A_t is the actual value in period t, F_t is the forecast value in period t, and n is the total number of observations.
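As a minimal sketch, the three error measures can be computed with the Python standard library as shown below. The function names, the handling of a zero actual value, and the sample numbers are assumptions made for illustration; they are not taken from the source.

```python
import math


def _check(actual, forecast):
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")


def mse(actual, forecast):
    """Mean of the squared differences between actual and forecast values."""
    _check(actual, forecast)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(actual, forecast))


def maape(actual, forecast):
    """Mean arctangent absolute percentage error, in radians (0 to pi/2)."""
    _check(actual, forecast)
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # Assumed convention: the ratio is unbounded, so use the limit pi/2,
            # except when the forecast is also zero (treated here as no error).
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)


# Hypothetical values for illustration only.
actual = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 40.0]
example_mse = mse(actual, forecast)
example_rmse = rmse(actual, forecast)
example_maape = maape(actual, forecast)
```

Because the arctangent is bounded, MAAPE stays finite even when an actual value is close to zero, which is relevant for sunspot numbers near solar minimum.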

3. Results and Discussion
The initial step in the solar activity forecast study is data analysis. The input data, in the form of sunspot numbers, was arranged into a time series. Sample sunspot number data is presented in Table 1, and the time series sunspot numbers are shown in Table 2. The time-series data in Table 2 consists of the past sunspot number (x_{t-1}) and the current sunspot number (x_t). The data was used in the forecasting process with x_{t-1} as the input and x_t as the forecast target.
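The pairing of each past value with the current value can be sketched as follows. The function name `make_lagged_pairs` and the sample values are hypothetical; only the one-step lag structure (x_{t-1} as input, x_t as target) comes from the text.

```python
def make_lagged_pairs(series):
    """Return (inputs, targets) where inputs[i] is x_{t-1} and targets[i] is x_t."""
    if len(series) < 2:
        raise ValueError("at least two observations are needed to form one pair")
    inputs = list(series[:-1])   # every value except the last
    targets = list(series[1:])   # every value except the first
    return inputs, targets


# Hypothetical values for illustration only (not SILSO data).
values = [8.0, 20.0, 45.0, 90.0]
X, y = make_lagged_pairs(values)
```

A series of n observations yields n - 1 input and target pairs, since the first observation has no predecessor.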
The forecast of solar activity using SVR in this study consists of two processes. The first is training, that is, the formulation of the forecasting model. The second is model testing to determine its accuracy. The SVR model was built using several kernel types, namely linear, Gaussian, radial basis function (RBF), and polynomial.
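For orientation, the kernels named above can be written in their usual textbook forms, as in the sketch below. The hyperparameter names and defaults (`gamma`, `degree`, `coef0`) are assumptions, since the source does not report its settings. The Gaussian kernel is commonly written in the same form as the RBF kernel, with gamma = 1 / (2 * sigma^2); how the study distinguishes the two is not stated.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree; the defaults are hypothetical."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2); equals 1 when x == z and decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# With a single lagged input, each sample is a one-element vector.
k_same = rbf_kernel([30.0], [30.0])
k_far = rbf_kernel([30.0], [35.0])
```

The RBF kernel measures similarity by distance, so nearby sunspot numbers contribute strongly to a prediction while distant ones contribute almost nothing.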
The error rates were measured using formulas (2), (3), and (4). The error values for each data comparison and kernel are presented in Table 3. According to the obtained error rates, the RBF kernel gives the best results, with MSE, RMSE, and MAAPE values of 47.44, 21.78, and 0.29, respectively.
Once the minimum-error model was obtained, it was used to forecast sunspot numbers from 2013 to 2018; the results are displayed in Table 4. The resulting RMSE is 21.78. Relative to the data range of 4.2 to 203.3, the forecast is classified as accurate.
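One way to read the comparison of the RMSE with the data range is as a range-normalized RMSE. This is an assumed interpretation, since the source does not state the exact criterion; the sketch below simply divides the reported RMSE by the reported range.

```python
# Values reported in the text.
rmse_value = 21.78
data_min, data_max = 4.2, 203.3

# Range-normalized RMSE (an assumed reading of the accuracy criterion).
data_range = data_max - data_min
normalized_rmse = rmse_value / data_range

print(round(normalized_rmse, 3))  # prints 0.109
```

Under this reading, the typical forecast error is about 11% of the span between the smallest and largest sunspot numbers in the data.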
Sunspot numbers, whether small or large, reflect the level of activity on the sun, since sunspots are precursors of flares and CMEs. According to a previous study [5], the appearance of flares is related to the sunspot number: curve fitting between the flare index and the sunspot number gives a correlation coefficient of 0.878. Thus, flares appear when the sunspot number reaches its peak. A flare or CME with a tremendous explosion threatens technology and climate on the earth.
The forecast sunspot numbers for 2020 and 2021 are 30.51 and 75.12, respectively. These results show an increase over the sunspot number in 2019 and indicate the intensity of future solar activity. The increase in solar activity in 2020 and 2021 also raises the likelihood of a flare or CME. The lower limit of the sunspot number with the potential to cause a flare, CME, or solar storm is 20 [24]. Consequently, alertness is required, because the 2020 and 2021 forecasts both exceed a sunspot number of 20.
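The threshold comparison can be expressed as a short check. The forecast values and the lower limit of 20 come from the text; the variable names are hypothetical.

```python
# Lower limit of the sunspot number associated with potential flares, per the text.
FLARE_THRESHOLD = 20.0

# Forecast sunspot numbers reported in the text.
forecasts = {2020: 30.51, 2021: 75.12}

# Flag each year whose forecast exceeds the threshold.
flagged = {year: value > FLARE_THRESHOLD for year, value in forecasts.items()}
```

Both years are flagged, which matches the conclusion that 2020 and 2021 call for alertness.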

4. Conclusion
The forecast of solar activity based on sunspot numbers using SVR gives the best results with the RBF kernel, which has lower MSE, RMSE, and MAAPE than the other kernels. The obtained MSE, RMSE, and MAAPE values are 47.44, 21.78, and 0.29, respectively. Those error rates reflect the discrepancies between actual and forecast data, which are largest in 1990-1991, 2000-2002, and 2011. The RMSE value of 21.78 is categorized as an excellent result, so the forecast is classified as accurate. The forecast sunspot numbers for 2020 and 2021 are 30.51 and 75.12, respectively. Because these values are above 20, they indicate the potential for flares, CMEs, and solar storms.