Melanoma Classification based on Simulated Annealing Optimization Neural Network

Article history: Submitted 15 December 2021; Revised 22 December 2021; Accepted 28 December 2021; Published online 31 December 2021

Abstract: Technology development in image processing and artificial intelligence has led to high demand for smart systems, especially in the health sector. Cancer is one of the diseases with the highest mortality worldwide. Melanoma is a cancer commonly caused by high exposure to UV light. The earlier melanoma is identified, the higher the patient's chance of recovery. Therefore, this study proposes melanoma detection based on a BPNN optimized by a simulated annealing algorithm. This research utilizes the PH2 dermoscopic image dataset, which contains 200 color digital images in BMP format. The data is processed using color feature extraction to identify the characteristics of each image according to the target data. The color features consist of the channel means of the RGB, HSV, CIE Lab, YCbCr, and XYZ color spaces. The evaluation results showed that BPNN-SA increased the accuracy of skin cancer classification compared to the original BPNN, with an overall average accuracy of 84.03%.

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).


I. Introduction
Cancer appears because of the uncontrollable growth of abnormal cells in the human body. Cancer can occur in many parts of the body, depending in part on a person's lifestyle habits. The Global Cancer Observatory (Globocan) reported 18.1 million cancer cases around the world in 2018, with more than 9.1 million of them categorized as mortality cases [1]. Moreover, the same report provider shows that in 2020 total cancer cases increased to 19.3 million, of which about 10 million were classified as deaths [2]. From these reports, it can be concluded that cancer cases are increasing and spreading globally every year.
One cancer that commonly arises in countries with a high UV (ultraviolet) index is skin cancer [2]. Skin cancer is often caused by high exposure of the skin to ultraviolet light, such as direct sunlight. Based on invasion time and the level of damage to the body, skin cancer can be divided into two types: melanoma and nonmelanoma. Melanoma is categorized as a malignant skin cancer that can threaten human life [3]. The invasion state can be identified from the appearance of pigment cells called melanocytes in the form of dark skin lesions. However, the lesion color can differ between cases depending on the number of changed pigment cells [4]. Unfortunately, benign skin lesions can look similar to malignant ones. Thus, it is crucial to identify early whether a skin lesion is melanoma or benign, as misclassification of skin cancer can lead to severe clinical outcomes.
Many researchers have conducted studies based on advanced technology in the image processing and artificial intelligence fields to identify cancer occurrence in its early stages; the earlier the cancer is identified, the higher the patient's chance of recovery. Gautam [5] proposed melanoma classification based on the local binary pattern (LBP) and its variants. The evaluation was performed with several machine learning methods, such as decision tree (DT), K-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF). The results show that the RF method achieved the best performance with 80.3% accuracy.
Moreover, other research proposed by Zghal and Derbel [6] utilized the PH2 dermoscopic image dataset as the source for developing a Computer-Aided Diagnosis (CAD) system. Their method used the Asymmetry, Border, Color, and Diameter (ABCD) rules for feature extraction from the dataset. Another classification of skin cancer based on the PH2 dataset proposed combining color and texture features [7]. The features consist of five color spaces (RGB, HSV, Lab, XYZ, and YCbCr) and were used as input for three different classifiers: K-nearest neighbor, support vector machine, and neural network.
Furthermore, some research applies optimization methods to improve recognition performance. Using the same data collection, one study proposed skin cancer segmentation based on Fuzzy C-Means clustering and skin cancer detection using an ANN integrated with the Differential Evolution (DE) algorithm as the training optimization method [8]. The method used multiple features, such as Red Green Blue (RGB), Local Binary Pattern (LBP), and Gray Level Co-occurrence Matrix (GLCM). The evaluation achieved 97.4% accuracy, indicating that optimizing an ANN with the DE algorithm can detect skin cancer effectively.
This study proposes melanoma detection based on color feature extraction and a neural network optimized through the simulated annealing (SA) algorithm. SA can find a global solution using a randomized approach. Moreover, an adaptive neuro-fuzzy inference system (ANFIS) optimized by SA outperformed other optimization methods such as hyper-box (HB), backpropagation (BP), and genetic algorithm (GA), with 96.28% accuracy [9]. The proposed color feature extraction consists of several color spaces: RGB, HSV, CIE Lab, YCbCr, and XYZ. The proposed classifier is a backpropagation neural network (BPNN) in which the weight of each synapse is optimized using simulated annealing (SA). The proposed method is evaluated on the PH2 dataset. This paper is organized in four sections: the second section describes the proposed methodology and theoretical foundation, the third section explains the results of the proposed method, and the last section concludes the work.

II. Methods
This study proposes skin lesion detection based on color feature extraction and a neural network optimized through the simulated annealing (SA) algorithm, named BPNN-SA. The research utilizes the PH2 Dermoscopic Image Dataset provided by the Dermatology Service of Hospital Pedro Hispano in Matosinhos, Portugal, together with the Universidade do Porto and Técnico Lisboa [10]. The dataset contains 200 images of skin lesions, separated into 160 benign images (common nevi and atypical nevi) and 40 melanoma images. The images are stored in BMP format at 768×560 pixels. Figure 1 shows samples of the PH2 dataset.

RGB is a color space used in many digital devices, such as smartphones, cameras, televisions, and computers [11]. RGB consists of three layers, each representing the color red, green, or blue. The color space uses an 8-bit system for color determination, so pixel values in each layer range from 0 to 255; the lower the value, the darker the color. Beyond representing color in digital devices, RGB is often used as a color feature in studies such as fruit classification [12], image segmentation [13], and object detection [14].
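As an illustration of the color feature idea, the mean of each RGB channel yields a three-element feature vector per image. The following is a minimal sketch, not the authors' code, assuming the image is already loaded as an H×W×3 NumPy array in R, G, B channel order:

```python
import numpy as np

def mean_rgb(image):
    """Mean intensity of each RGB channel: a 3-element feature vector."""
    # Flatten spatial dimensions, keep the 3 channels, average each channel.
    return image.reshape(-1, 3).mean(axis=0)

# Toy 2x2 image: two pure-red pixels and two pure-blue pixels.
img = np.array([[[255, 0, 0], [0, 0, 255]],
                [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
print(mean_rgb(img))  # mean R = 127.5, mean G = 0.0, mean B = 127.5
```

The same averaging applied after converting the image into the other four color spaces would produce the fifteen features used in this study.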
HSV stands for hue, saturation, and value. Unlike RGB, where each layer represents one primary color, each component of HSV has a specific function in image representation: hue determines the base color of the image, saturation represents the color's dominance, and value represents the brightness level. HSV is also often used as a color feature [15] [16].
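For example, Python's standard `colorsys` module converts normalized RGB values to HSV. This small sketch, not part of the original study, shows an orange pixel mapping to a hue near 30 degrees with full saturation and brightness:

```python
import colorsys

# colorsys expects channel values normalized to [0, 1].
r, g, b = 255 / 255, 128 / 255, 0 / 255   # an orange pixel
h, s, v = colorsys.rgb_to_hsv(r, g, b)

print(round(h * 360))  # hue in degrees, approximately 30 (orange)
print(s, v)            # saturation and value are both 1.0 here
```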
YCbCr is a color space composed of three components: Y, Cb, and Cr. Each component has a different function: Y is the luma, or color brightness, Cb is the blue-difference chroma component, and Cr is the red-difference chroma component. This color space is commonly used in digital video processing [17].
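As a sketch of the conversion (using the common ITU-R BT.601 full-range formulation, which may differ from the exact variant used in the study), RGB maps to YCbCr through a linear transform plus an offset of 128 on the chroma components:

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion (illustrative)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# White has maximal luma and neutral chroma (Cb = Cr = 128).
print(rgb_to_ycbcr(255, 255, 255))
```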
The Commission Internationale de l'Éclairage (CIE) proposed the XYZ and Lab color spaces. XYZ was proposed in the 1920s and is still used as a graphics standard today [18]. Meanwhile, the Lab color space was proposed by CIE in 1976 and was designed to approximate human color vision [19]. Lab values range from negative to positive, each representing a different color.
Afterward, the extracted features were divided using k-fold cross-validation to generate training data and testing data. Cross-validation evaluates the capability and robustness of the proposed model on unseen data. The training data was used as a reference to train the Backpropagation Neural Network (BPNN). BPNN is a multi-layer perceptron-based artificial neural network; it has a structure similar to an MLP, with input, hidden, and output layers, as shown in Figure 3.
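The fold-generation step can be sketched in plain Python (a hypothetical helper, not the authors' implementation): the samples are split into k folds, and each fold serves once as the test set while the remaining samples form the training set.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k folds; each fold serves once as test set."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        folds.append((train, test))
    return folds

folds = k_fold_indices(200, 10)        # the PH2 dataset has 200 images
print(len(folds[0][0]), len(folds[0][1]))  # 180 training, 20 testing per fold
```

In practice, the indices would be shuffled before splitting so that benign and melanoma images are distributed across folds.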
The difference lies in the learning process, where BPNN propagates both forward and backward [21]. BPNN is also known as backward propagation of errors, as it uses a gradient function to identify the class of the data [22]. The result of BPNN training is a trained network model, from which the weight of each synapse is extracted. The extracted weights are then optimized using the simulated annealing (SA) algorithm. SA is a metaheuristic search method that solves a problem iteratively based on a specific objective function. The algorithm is inspired by the annealing process in the metallurgy industry [23]. SA imitates the crystallization of liquid metal: the process begins by heating the metal until it reaches the desired temperature, followed by gradual, controlled cooling until the metal settles into its optimal form [24]. The trained weights are taken as the initial input to simulated annealing. The search generates random perturbations that move the "particle" as the initial temperature cools down; a move is accepted if the "particle" reaches a lower energy state, and occasionally even when it does not, with a probability governed by the current temperature. The search continues until the defined iteration count is exceeded or the error-rate bound is satisfied. The final output is an optimized set of weights, which is assigned to the previous network model to replace its original weights. Finally, the resulting model is evaluated on the test data to determine the performance of the proposed model.
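The weight-optimization step described above can be sketched as a generic simulated annealing loop. This is an illustrative toy, not the authors' implementation: the objective here is a simple convex bowl standing in for the BPNN's error function, and the step size, cooling rate, and iteration count are assumed values.

```python
import math
import random

def simulated_annealing(loss, w, t0=1.0, cooling=0.95, steps=500):
    """Minimize loss(w) by random perturbation with temperature-controlled
    acceptance: worse moves pass with probability exp(-delta / T)."""
    random.seed(0)                       # fixed seed for reproducibility
    best, best_loss = list(w), loss(w)
    current, current_loss = list(w), best_loss
    t = t0
    for _ in range(steps):
        # Random interference: nudge one randomly chosen weight.
        candidate = list(current)
        i = random.randrange(len(candidate))
        candidate[i] += random.gauss(0, 0.1)
        delta = loss(candidate) - current_loss
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, current_loss = candidate, current_loss + delta
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        t *= cooling                     # gradual, controlled cooling
    return best, best_loss

# Toy objective standing in for the network error: minimum at (1, -2).
loss = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
w_opt, l_opt = simulated_annealing(loss, [0.0, 0.0])
print(l_opt < loss([0.0, 0.0]))  # the optimized loss improves on the start
```

In the proposed method, the vector `w` would hold all synapse weights of the trained BPNN and `loss` would be the network's classification error on the training set.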

III. Experiment Result and Discussion
The skin lesion identification based on BPNN with simulated annealing optimization has been conducted. The PH2 dataset was used as the reference for model development. The classification features were extracted from the PH2 images using the RGB, HSV, YCbCr, XYZ, and Lab color spaces, producing fifteen features for each skin lesion image. From these features, cross-validation was applied to generate the training and testing sets, with the number of folds ranging from 2 to 10. Cross-validation evaluates the model's capability in handling unseen data. Table 1 shows the initial configuration of the BPNN.
After the BPNN was configured, the training set obtained from cross-validation was fed into the neural network model for training. The network produced by the training process was then extracted to obtain the weight of each synapse. These weight sets were used as the initial state of the optimization model based on the simulated annealing (SA) algorithm. Table 2 shows the specification of the simulated annealing.

Fig. 3. BPNN structure with one hidden layer [20]
The optimization aims to obtain optimized weights for the BPNN model. Once the optimized weights were found, they were assigned to the BPNN to replace the original weights. The evaluation was conducted by feeding the testing set into the optimized model, and the proposed model was compared to the original BPNN to show the difference made by the optimization algorithm. Table 3 presents the accuracy of the original BPNN for each fold setting of cross-validation. The ninth fold achieved the highest accuracy at 83.83%, while the tenth fold had the lowest at 68%. Overall, the original BPNN identified skin cancer with a total average accuracy of 79.51%. The evaluation results of the proposed optimized model can be seen in Table 4. The results show that the fold settings outperformed the corresponding performance of the original BPNN. In BPNN-SA, the sixth fold achieved the highest accuracy at 88.38%, while the third fold was the lowest at 81.81%. In addition, BPNN-SA reached an overall average accuracy of 84.03%. This can be attributed to the capability of simulated annealing to search for the global minimum of the fitness function. Moreover, simulated annealing uses a randomized approach to generate the global solution, which theoretically has a broader probability of finding the best solution [9]. The comparison between the original BPNN and BPNN-SA can be seen in Figure 4. The graph shows that BPNN-SA outperforms the original BPNN in almost every fold, especially the sixth and tenth folds, where the difference from the original BPNN result is substantial.
This occurs because BPNN-SA uses simulated annealing as its search function, which relies on a randomized approach with a broader chance of finding the best solution, whereas the original BPNN relies on gradient-based search, which tends to become trapped in local minima [25]. This result indicates that the simulated annealing algorithm is capable of improving the performance of BPNN in classifying skin cancer.

IV. Conclusion
The proposed improved skin cancer classification using BPNN and simulated annealing has been carried out. This research utilizes the PH2 dermoscopic image dataset, containing 200 color digital images in BMP format. The data is processed using color feature extraction to identify the characteristics of each image according to the target data; the features consist of the channel means of the RGB, HSV, CIE Lab, YCbCr, and XYZ color spaces. The experiment used k-fold cross-validation to evaluate model robustness toward unseen data. The BPNN model was first trained on the training set; the trained weights were then taken as the initial weights for simulated annealing, which searched for the optimal weights for the BPNN model. The evaluation showed that the BPNN-SA method increased the accuracy of skin cancer classification compared to the original BPNN, with an overall average accuracy of 84.03%.