Recognition of Handwritten Javanese Script using Backpropagation with Zoning Feature Extraction

Article history: Submitted 22 October 2021; Revised 5 November 2021; Accepted 21 November 2021; Published online 31 December 2021

Backpropagation is a supervised learning method, so its training process requires a target. During training, the resulting error is propagated back to the units in the layers below. Backpropagation can solve complicated problems because it consumes less memory than other algorithms. In addition, it can produce solutions with a low error rate in less execution time. In image pattern recognition, backpropagation can support cultural preservation in many places worldwide, including Indonesia, where it is used to recognize image patterns in handwritten Javanese script. This study concludes that zoning feature extraction combined with backpropagation can be used to recognize handwritten Javanese characters. The best accuracy, 77.00%, is attained with a network architecture of 64 input neurons and 40 hidden neurons, a learning rate of 0.003, a momentum of 0.03, and 5000 iterations.

Keywords: Backpropagation; Feature extraction; Image processing; Javanese script; Pattern recognition

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).

I. Introduction
The backpropagation method is one of the approaches used in Artificial Neural Networks (ANN), in which the network is organized into three layers: input, hidden, and output. Backpropagation can solve complicated problems because it consumes less memory than other algorithms and produces solutions with a low error rate in less time [1]. Moreover, this method is preferred because of its ability to recognize incomplete or weak input patterns. The backpropagation training process has three phases: the forward phase, the backward phase, and the weight modification phase. As a result, backpropagation is frequently used in machine learning for a variety of tasks, including classification [2], prediction [3], forecasting [4], and image pattern recognition [5].
Backpropagation in image pattern recognition can be used to preserve cultures in diverse parts of the world, for example through handwriting recognition of the regional scripts of a country, particularly in Asia. One related study discusses the identification and recognition of printed Chinese characters using projection and zoning feature extraction [6]. In another study, the 990 most commonly used syllables were used to recognize traditional Korean script, or Hangul [7]. In Thailand, a study identified Thai letters comprising 77 different character patterns [8]. In Japan, a study proposed a procedure to distinguish between Kanji and Kana writing styles for character recognition [9]. Moreover, because of the complexity of printed and handwritten Arabic letters, there is research on Arabic characters that synthesizes the essential aspects of the Arabic writing style [10].
Because of the variety of writing styles, handwriting recognition has been the subject of extensive research over the last few decades. This study therefore focuses on using backpropagation to recognize image patterns of handwritten Javanese script in Indonesia. Several similar studies have been conducted. One identified each character in the Javanese script using a horizontal and vertical image extraction method with a Fourier transform, which produced an accuracy of 59.5% [11]. Another study using a backpropagation ANN obtained an accuracy of 61% [12]. Research on the Hanacaraka Javanese script using a backpropagation ANN produced an accuracy of 74% [13].
None of those studies combined backpropagation with zoning, the feature extraction approach developed by Elima Hussain [14]. Therefore, this study discusses the recognition of handwritten Javanese script using backpropagation with zoning feature extraction. The results are expected to yield a higher accuracy than prior studies that applied backpropagation without zoning feature extraction.

II. Method
The research begins with data collection. The collected data then enter the second step, data pre-processing, which prepares the data for the feature extraction procedure [15]. In the third step, feature extraction, the features of each image are extracted and stored. Feature extraction is followed by normalization, testing with the selected algorithm, and evaluation and validation of the findings. Figure 1 depicts the progression of the research stages. The following subsections describe each step in more detail.

A. Data Collection
Images of Nglegena Javanese script characters were used in this study. The data were obtained by distributing forms to respondents. Respondents who had studied Javanese script were selected in three age groups: children aged 5 to 11, adolescents aged 12 to 25, and adults aged 26 to 45. Each age group has ten respondents, for a total of 30. Each respondent writes a total of 20 Nglegena Javanese characters, so 600 images were collected in all. Figure 2 shows a sample of a respondent's scanned Javanese script form.

B. Data Pre-processing
Before further processing, the raw image data are converted into images suitable for feature extraction [16]. The pre-processing steps are converting the image to a gray image (grayscaling), converting the gray image to a binary image (binarization), cleaning noise (noise removal), equalizing position (crop edge), and equalizing image size (resizing).
• Grayscaling is the initial level of image processing in this study. A weighted percentage of each pixel's R, G, and B values is summed to convert the color image to a gray image [17], as given in Equation (1).
• Binarization uses a threshold to decide whether each grayscale pixel value, which ranges from 0 to 255, becomes black or white [18]. The value chosen as the threshold is 128 [19]. This value is approximately half of 255, the gray value of a white pixel.
• Noise removal removes noise from marker ink and dirt that is scanned along with the characters. This noise removal process uses the Wiener median filtering method, as given in Equation (2) [20].
• Crop edge removes the unneeded area around each Javanese character. The character is cut out by searching for the highest and lowest x and y coordinates that contain black pixels [21].
• Resizing equalizes the image size to 120×120 pixels [22]. Figure 3 presents the difference between the raw image and the resized image.
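The pre-processing chain can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the study: the grayscale weights (0.299, 0.587, 0.114) are a commonly used choice and are not taken from the article, pixels darker than the threshold of 128 are assumed to be ink, nearest-neighbour interpolation is assumed for resizing, and the noise removal step is omitted. All function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Weighted sum of R, G and B per pixel (weights are an assumed common choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Return 1 for black (ink) pixels and 0 for white (background) pixels."""
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]


def crop_edge(binary_image):
    """Crop to the bounding box of the black pixels."""
    rows = [y for y, row in enumerate(binary_image) if any(row)]
    cols = [x for x in range(len(binary_image[0]))
            if any(row[x] for row in binary_image)]
    if not rows:  # blank image: nothing to crop
        return binary_image
    return [row[cols[0]:cols[-1] + 1]
            for row in binary_image[rows[0]:rows[-1] + 1]]


def resize_nearest(binary_image, size=120):
    """Nearest-neighbour resize to size x size pixels (interpolation is assumed)."""
    height, width = len(binary_image), len(binary_image[0])
    return [[binary_image[y * height // size][x * width // size]
             for x in range(size)]
            for y in range(size)]
```

Chaining `resize_nearest(crop_edge(binarize(to_grayscale(image))))` yields a 120×120 binary image in which 1 marks a black pixel.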

C. Feature Extraction
In this study, feature extraction is done using zoning, a method developed by Elima Hussain [14]. The 120×120 character image is divided into 16, 25, 36, or 64 zones. These numbers were chosen because the corresponding grids (4×4, 5×5, 6×6, and 8×8) divide the image size evenly. The zoning feature is the number of black pixels in each zone, which gives one numerical value per zone. Figure 4 shows an illustration of the zone division.
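As an illustration of the zoning idea, the sketch below counts black pixels per zone. It assumes a square binary image stored as a list of rows in which 1 marks a black pixel, and it scans zones row by row; the function name and the zone ordering are assumptions, not details from the article.

```python
def zoning_features(binary_image, zones_per_side):
    """Count black pixels (value 1) in each zone of a square binary image."""
    size = len(binary_image)
    if size % zones_per_side != 0:
        raise ValueError("image size must be divisible by zones_per_side")
    zone = size // zones_per_side  # side length of one zone in pixels
    features = []
    for zy in range(zones_per_side):
        for zx in range(zones_per_side):
            count = sum(
                binary_image[y][x]
                for y in range(zy * zone, (zy + 1) * zone)
                for x in range(zx * zone, (zx + 1) * zone)
            )
            features.append(count)
    return features
```

For a 120×120 image, `zones_per_side=4` gives 16 features from 30×30 zones, and `zones_per_side=8` gives 64 features from 15×15 zones.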

D. Normalization
Normalization is the process of scaling attribute values to fit within a given range [23]. Normalization is required because the feature values obtained in this study span a wide range. For example, a zone of 10 by 10 pixels has a black pixel count ranging from 0 to 100. Several normalization techniques exist; those used in this study are:
• Min-Max Normalization scales data from one range to another [24]. This method uses the formula stated in Equation (3) to transform the data.
• Z-Score Normalization subtracts the mean of the data from each value and divides the result by the standard deviation [25]. Equation (4) shows the formula for this method.
• Decimal Scaling Normalization divides each value by a power of 10 [26]. This approach is presented in Equation (5).
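The three techniques can be sketched as follows. These are the standard textbook forms and may differ in detail from Equations (3) to (5) of the article; in particular, the use of the population standard deviation and the rule for choosing the power of 10 (the smallest power that brings every absolute value below 1) are assumptions, as are the function names.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly from [min, max] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to new_min
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of 10 that makes every |value| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]
```

For black pixel counts between 0 and 100, `decimal_scaling` divides by 1000, so a count of 100 maps to 0.1.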

E. Classification
Backpropagation ANN is the classification algorithm employed in this study. Backpropagation is a supervised learning method because the training procedure requires a target. It takes its name from the way the resulting error is propagated back to the units in the layers below during training [27]. The network architecture used in this study is a multi-layer network consisting of an input layer, a hidden layer, and an output layer. In the base configuration, the input layer has 16 neurons, each holding one value obtained from feature extraction. The number of neurons in the hidden layer is varied to find the best results [8]. The output layer has 20 neurons, matching the 20 Javanese characters to be recognized. The architecture employed in this study is illustrated in Figure 5. The pseudocode of backpropagation is given in the accompanying code snippet.
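The sketch that follows is not the pseudocode from the article. It is a minimal illustration of the three training phases (forward, backward, and weight modification) for a network with one hidden layer, assuming sigmoid activations, a squared-error loss, per-sample weight updates, and a classical momentum term. The class name `TinyMLP` and all parameter names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One-hidden-layer network trained by backpropagation with momentum."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                    for _ in range(n_out)]
        # Previous weight changes, kept for the momentum term.
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        """Forward phase: compute hidden and output activations."""
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_o]
        return hidden, output

    def train_step(self, x, target, lr, momentum):
        hidden, output = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Backward phase: error terms of the output layer, then the hidden layer.
        delta_o = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        delta_h = [h * (1.0 - h) * sum(delta_o[k] * self.w_o[k][j]
                                       for k in range(len(delta_o)))
                   for j, h in enumerate(hidden)]
        # Weight modification phase: gradient step plus momentum.
        for k, row in enumerate(self.w_o):
            for j in range(len(row)):
                self.dw_o[k][j] = lr * delta_o[k] * hb[j] + momentum * self.dw_o[k][j]
                row[j] += self.dw_o[k][j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.dw_h[j][i] = lr * delta_h[j] * xb[i] + momentum * self.dw_h[j][i]
                row[i] += self.dw_h[j][i]

    def loss(self, samples):
        """Mean over samples of the summed squared output error."""
        return sum(sum((t - o) ** 2 for t, o in zip(target, self.forward(x)[1]))
                   for x, target in samples) / len(samples)
```

A 16-input, 20-output configuration with 20 hidden neurons would be created as `TinyMLP(16, 20, 20)`, with each target given as a one-hot list of length 20 (the one-hot encoding is also an assumption).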

F. Testing
During the testing phase, the collected data are separated into training data and testing data using the k-fold cross-validation approach. The study employed k values of 5, 10, and 15 [28]. With k = 10, for example, the 600 images are separated into ten groups of 60 images each. Nine groups are used as training data, while the remaining group is used as test data. Figure 6 presents the process of partitioning the dataset using the k-fold cross-validation approach.
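A minimal sketch of this partitioning, assuming the samples are shuffled once with a fixed seed before being cut into folds (the article does not say whether or how the data were shuffled); the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle once before splitting
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With `k_fold_indices(600, 10)`, every split has 540 training indices and 60 test indices, and each image appears in exactly one test fold.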

G. Evaluation
The confusion matrix approach is used to evaluate the backpropagation classification algorithm. This evaluation approach is commonly applied to two-class problems, but it can be extended to multi-class classification. Table 1 shows the confusion matrix for 20 classes [29]. Accuracy is calculated with the formula presented in Equation (6) [30]:

$$\text{Accuracy} = \frac{\sum \text{correct}}{\sum \text{data}} \times 100\% \tag{6}$$

where Σ correct is the number of image data that are classified correctly, and Σ data is the number of available image data.
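The accuracy calculation can be sketched from a multi-class confusion matrix as follows. The layout, with rows as actual classes and columns as predicted classes, is an assumption, as are the function names; correctly classified samples lie on the diagonal.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts samples of actual class a predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy_percent(matrix):
    """Correctly classified samples (the diagonal) over all samples, in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total
```

For the 20 Javanese characters, `n_classes` would be 20 and class labels would be integers from 0 to 19.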

III. Results and Discussions
The backpropagation parameters used in the training process include the architecture, number of neurons, activation function, learning rate, momentum, maximum number of iterations, and learning algorithm. Table 2 lists the specifications for these parameters.
The experiments were carried out in several stages. The first stage identifies which normalization method gives the best results. Based on these findings, the second stage determines the best learning rate and momentum values. The third stage determines the number of neurons in the hidden layer. The fourth stage varies the number of input neurons to find the optimal number of zones. The last stage varies the value of K in K-Fold Cross-Validation.

A. Determination of the Best Normalization Results
Each experiment at this stage uses a fixed architecture with 16 neurons in the input layer, 20 neurons in the hidden layer, and 20 neurons in the output layer, with a maximum of 5000 iterations. The number of zones equals the number of input neurons, so 16 zones are used in this experiment. The number of zones, and with it the size of the input layer, is varied in the fourth stage. Each experiment covers a range of parameter values. Four versions of the data are compared: raw data, data from Min-Max Normalization, data from Z-Score Normalization, and data from Decimal Scaling Normalization. All experiments up to and including the fourth stage use K-Fold Cross-Validation with a K value of 10. Table 3 shows the findings of the first experiment.
According to Table 3, data normalized using the Z-score Normalization approach had the highest overall accuracy. The range of values produced using Z-score Normalization data is 56.33% to 60.00%. This value range is higher than the accuracy results produced from raw data, data adjusted using the Min-Max approach, and data standardized using Decimal Normalization.

B. Determination of the Best Learning Rate and Momentum Values
Each experiment has the same architecture and settings as the previous experiment. The data used have been normalized with the Z-Score Normalization method. Table 4 shows the results of determining the best learning rate and momentum values. The best accuracy is obtained with a learning rate of 0.003 and a momentum of 0.03. With these two values, the accuracy of recognizing Javanese characters reaches 60.00%. The next stage, determining the number of hidden neurons, is carried out using this learning rate and momentum.

C. Determination of the Number of Hidden Neurons
The third stage determines the number of neurons in the hidden layer. The numbers of neurons tested are 20, 30, and 40. The learning rate is 0.003, the momentum is 0.03, and the number of iterations is 5000. Table 5 shows the results of the experiment. The maximum accuracy, 64.00%, is achieved with 40 neurons in the hidden layer. The accuracy across these network architectures is still low. Increasing the number of input neurons can help improve backpropagation test results [31]. In this scenario, increasing the number of input neurons means increasing the number of zones, which provides more precise information to the network.

D. Determination of the Number of Input Neurons
In this experiment, the backpropagation architecture has 40 hidden neurons. According to the zone partition plan, 16, 25, 36, and 64 input neurons are used. The learning rate and momentum are kept at their best values, 0.003 and 0.03. Table 6 shows the results of determining the number of input neurons. Accuracy improves as the number of input neurons is increased up to 64. When the number of input neurons is increased further to 100, the accuracy drops to 73.00%. This confirms that the highest accuracy, 77.00%, is achieved with 64 input neurons. The relatively low accuracy may be caused by a lack of diversity in the handwritten Javanese script patterns used as training data.

E. Determination of the Number of K in K-Fold Cross-Validation
The last experiment determines the value of K in K-Fold Cross-Validation. The value of K is set to 5, 10, and 15 in turn. Table 7 shows the results for each K value.
According to Table 7, the test with a K value of 5 has an accuracy of 75.17%, whereas the test with a K value of 15 has an accuracy of 76.67%. Both results are lower than the accuracy achieved with a K value of 10, which reaches 77.00%.

F. Evaluation
Based on the previous experimental stages, the maximum accuracy is obtained using a network architecture that comprises 64 input neurons, 40 hidden neurons, and 20 neurons in the output layer. The learning rate is 0.003, the momentum is 0.03, and the number of iterations is 5000. The test uses K-Fold Cross-Validation with a K value of 10. The confusion matrix in Figure 7 is used to analyze the classification results from the testing stage.
The accuracy for the 20 classes is calculated with Equation (6) by adding all the correctly predicted data and dividing by the number of data tested. The resulting accuracy is 462/600 × 100% = 77.00%, with a neuron architecture of 64-40-20.

IV. Conclusion
Building the backpropagation architecture involves multiple stages, including identifying the appropriate learning rate and momentum values, the number of hidden neurons, and the number of input neurons. The network architecture with 64 input neurons, 40 hidden neurons, and 20 output neurons achieves the highest accuracy of 77.00%. The learning rate is 0.003, the momentum is 0.03, and the number of iterations is 5000. The recognition accuracy is affected by increasing the number of input neurons (in this case, adding zones) in the backpropagation architecture. Future research could add more variation in handwritten Javanese script patterns to the backpropagation training data and extend the research data to Javanese script written as words or sentences.

Author contribution
All authors contributed equally as main contributors to this paper. All authors read and approved the final paper.