Deep Learning for Multi-Structured Javanese Gamelan Note Generator

ABSTRACT

convey, and laras refers to the scales used in that song. Dynamics, on the other hand, emphasizes the variety, balance, and dynamic nature of a song's musical components [2] [3].
Song structure is a particular element of karawitan, an art form that uses music as a symbolic medium of representation [4] [6]. A song aims to be musically complex, to entertain the audience, and to convey a range of social, moral, cultural, and spiritual values [5]. Performing the musical techniques of a composition incorrectly can cost it its aesthetic value and unique character. Performing Javanese gamelan well therefore requires an understanding of both the rules of gamelan and the emotional atmosphere conveyed by the piece being performed.
However, playing Javanese gamelan presents several challenges, especially in determining the playing pattern [3]. Assistance is therefore needed to make this cultural practice easier to learn for future generations [3] [4]. The aim of this study is to use technology to simplify the process of playing Javanese gamelan.
The size of a gendhing (song) is determined by the number of gatra in each gongan and the total number of gongan in the song [1] [2]. Gendhing are divided into three subtypes: ageng (big), sedheng (middle), and alit (small). This study focuses on gendhing alit, which comprises sampak, srepeg, ayak-ayakan, lancaran, bubaran, ketawang, and ladrang [2]. This categorization is based on the arrangement of the ricikan struktural instrument group, which includes the kenong, kethuk, kempyang, kempul, and gong. The arrangement of the ricikan struktural instruments is an important notational factor that determines the composition of a musical piece [2] [6]. The kenong, kethuk, and kempul mark divisions within the song, while the gong marks the end of the song.
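As a rough illustration of this size calculation, the minimal Python sketch below counts gatra and balungan notes from the gongan layout. It assumes that every gongan has the same number of gatra and that each gatra holds four balungan notes; the helper name gendhing_size is hypothetical. The example uses the lancaran layout described with Figure 3 (4 gatra per gongan, usually four gongan).

```python
# Minimal sketch (hypothetical helper): size of a gendhing from its layout.
# Assumption: every gongan has the same number of gatra, and each gatra
# holds four balungan notes.
NOTES_PER_GATRA = 4


def gendhing_size(gatra_per_gongan: int, num_gongan: int) -> dict:
    """Return the total number of gatra and balungan notes in a gendhing."""
    total_gatra = gatra_per_gongan * num_gongan
    return {
        "total_gatra": total_gatra,
        "total_balungan_notes": total_gatra * NOTES_PER_GATRA,
    }


# Lancaran: 4 gatra per gongan, usually four gongan.
print(gendhing_size(gatra_per_gongan=4, num_gongan=4))
# {'total_gatra': 16, 'total_balungan_notes': 64}
```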
In addition to the ricikan struktural, there are two other groups of Javanese gamelan instruments: a) ricikan balungan, which play the basic melody of a song, such as the slenthem, demung, saron, and peking; and b) ricikan garap, which, like the ricikan struktural, provide musical accompaniment and handle the decorative variations of a song, such as the rebab, gender barung, gender penerus, bonang barung, bonang penerus, gambang, siter, and suling [2].
The arrangement of a Javanese gamelan piece depends not only on the composer's artistic expression but also on standard notational conventions. Because complete notation for all gamelan instruments is not always provided, performers must memorize the patterns associated with each song structure. Javanese gamelan notation generally contains only the primary melody, so gamelan musicians need a high level of expertise to play all the instruments. This is difficult for inexperienced musicians, who need complete notation for every instrument. Figure 1 illustrates the structure of a Javanese gamelan composition. As depicted in Figure 1, gamelan sheet music displays only the balungan notation (notes) and omits the notation of the other two instrument groups, ricikan struktural and ricikan garap. Gamelan players typically use this notation to perform karawitan, together with other information about the piece, such as the song structure type, rhythm type, laras, and pathet; laras and pathet refer to the scale and mode of the song, respectively. Figure 2 illustrates the ricikan struktural instruments used in a composition [1], [2], [8]. These instruments include the gong ageng, gong suwuk, kenong, kempul, kethuk, and kempyang, as shown in Figure 2.
The position of these instruments within a song distinguishes the different types of song structure. The gong ageng marks the longest cycle of a song, while the gong suwuk is used in all song structures except ketawang and ladrang, where it is replaced by the kempul. The kenong divides the flow of the gendhing into musical phrases of equal length. The kempul, a smaller gong, often interlocks with the kenong in forms such as lancaran, ketawang, and ladrang. The balungan carries the melody notes of each song, which are divided into lines; each line contains several gatra, and each gatra consists of several notes.
Rhythm (irama) refers to the tempo and rhythm of gamelan music. There are five types of irama: Irama Lancar, Irama Tanggung, Irama Dadi, Irama Wilet, and Irama Rangkep. A song can be presented in different irama [5]; for example, Lancaran Manyar Sewu can be performed in both Irama Lancar and Irama Tanggung. The irama therefore has a significant impact on how a song is performed.
Current discussions of gendhing patterns focus mainly on artistic and ethnomusicological perspectives. For example, studies have examined the kempul pattern in gendhing alit in Klenengan music [6], the kenong pattern in the aesthetics of karawitan styles [7], and the role of the ricikan struktural as an indicator in gendhing formation [8]. However, the relationship between gamelan music and technology, especially Deep Learning (DL), has received little attention. The purpose of this study is to use DL to help novice gamelan musicians understand the ricikan struktural components. This work falls within the field of music generation.
The integration of DL with the art of music has led to music generators capable of creating new and unique compositions [9]. In recent years, music composition has advanced considerably thanks to deep learning techniques such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM).
The CNN is a type of deep learning model that has been applied to music composition, for example to create new music from MIDI data [10] or from other symbolic representations of music [11]. Its use in music is a relatively recent development; CNNs have been most widely applied to image classification [12], where they are designed to detect and extract identifiable patterns and features from visual data [13]. Similar methods are used to train these networks to recognize patterns and features in musical sequences. In previous research on music generation, CNNs proved reliable for extracting the semantic features of music [14] and for multiple feature extraction [15].
CNN is often integrated with other deep learning techniques, such as LSTM, to generate complex and sophisticated musical compositions [13]. The LSTM network is a variant of the Recurrent Neural Network (RNN) that can effectively capture long-term temporal dependencies in time-series data, including musical sequences. LSTM has been widely used for music generation because it is well suited to learning patterns from sequential music data [16] [17].
Combining CNN and LSTM networks captures both short-term and long-term musical patterns, resulting in more authentic and coherently structured music [13] [18]. CNN-LSTM can perform temporal analysis while extracting abstract features [19], and it has outperformed standard machine learning algorithms in stability, accuracy, and prediction [20] [21] [22]. In music generation, a convolutional LSTM has outperformed a plain LSTM, producing more pronounced waveforms and clearer melodies [18]. The combination draws on the CNN's ability to extract effective features from music sequence data and on the LSTM's ability not only to discover dependencies in time-series data but also to automatically detect the mode best suited to the data when building new sequences [23].
Previous research on music generation using the CNN-LSTM combination, however, has been limited to generating new melodies from MIDI files, for Turkish pop music in a particular style [13] and for modern music [18]. This study applies the same approach to generate notation for several instruments based on the structural variations of Javanese gamelan songs, using a notation-based music dataset. Unlike previous research, this study uses a dataset with more readable notation, represented as numerical notes in text format, and focuses on generating musical accompaniment for multiple instruments. In the context of gamelan music, CNN and LSTM are used here to create compositions that follow the rules and conventions of traditional gamelan music. The CNN extracts important features from the network inputs, such as balungan notation, rhythm, and gatra information. The LSTM then uses its ability to model temporal dependencies to generate the notation of several ricikan struktural instruments as accompaniment to the melodic notation of the balungan instrument.
Accordingly, this study addresses the following issues:
• Complete notation, especially for the ricikan struktural instruments, is very helpful for novice gamelan players.
• The notation patterns of the ricikan struktural instruments vary, and because these patterns define the structure of a song, novice players would find it easier to play a gamelan song if its notation were organized by song structure.
This study aims to automatically generate notation for several ricikan struktural instruments, namely kenong, kethuk, kempyang, kempul, and gong, using CNN-LSTM. The features used are the main melodic notation of the balungan instrument, rhythm, and gatra information. The main contributions of this study are as follows:
• A dataset of Javanese gamelan music based on symbolic notation.
• The use of numerical notes as a simplified representation of the musical input data.
• A method that generates musical accompaniment for several instruments, including kenong, kethuk, kempyang, kempul, and gong, by incorporating song characteristics such as song structure, gatra, and rhythm.
• A means of helping the general public understand the various song structure patterns and their notation for the ricikan struktural instrument group.
The remaining sections of this paper are organized as follows: Section I presents the introduction and related work. Section II describes the methodology, including the details of the dataset and the proposed model. Section III presents the experiments and results. Finally, Section IV concludes the paper.

II. Method
The objective of this study is to use CNN-LSTM to create an automatic notation generator for the ricikan struktural instruments. The method uses a CNN for feature extraction and an LSTM as the notation generator. This section details the steps for implementing the proposed method.

A. Dataset
The music dataset consists of symbol-based data, specifically numerical notes, drawn from a collection of songs available at http://www.gamelanbvg.com. The data extracted from each composition include the song's notation and its distinctive features, such as gatra details, rhythmic patterns, and song structure. In addition, annotations of certain ricikan struktural instruments, prepared by a gamelan specialist from Soewidiatmaka Gamelan, were incorporated into the dataset.
A total of 35 songs were used in this study, divided into seven song structures with five songs each. The notation of the ricikan struktural instruments and of the balungan was arranged according to the gatra of each song. A gatra typically contains four balungan notes. The resulting dataset contains approximately 600 gatra across the 35 songs, as shown in Figure 4. Of these songs, 28 were used for training and validation (an 80%/20% split), and 7 were used for testing. Table 1 lists the songs in the dataset together with their song structure type, laras (scale), pathet (mode), and rhythm.
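The split described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it holds out one song per song structure for testing (the evaluation selects one unseen song per structure) and assumes that the 80%/20% training/validation split is drawn at random over the gatra of the remaining 28 songs. The song records, the dummy gatra, and the function split_by_song are hypothetical placeholders.

```python
import random
from collections import defaultdict

STRUCTURES = ["sampak", "srepeg", "ayak-ayakan", "lancaran",
              "bubaran", "ketawang", "ladrang"]


def split_by_song(songs, val_fraction=0.2, seed=0):
    """Hold out one song per structure for testing, then split the gatra of
    the remaining songs into training and validation samples.

    songs: list of (title, structure, gatra_list) tuples (hypothetical format).
    Assumption: the train/validation split is random at the gatra level.
    """
    rng = random.Random(seed)
    by_structure = defaultdict(list)
    for song in songs:
        by_structure[song[1]].append(song)

    test_songs, train_pool = [], []
    for structure in sorted(by_structure):
        group = list(by_structure[structure])
        rng.shuffle(group)
        test_songs.append(group[0])      # one unseen song per structure
        train_pool.extend(group[1:])

    samples = [gatra for _, _, gatras in train_pool for gatra in gatras]
    rng.shuffle(samples)
    n_val = round(len(samples) * val_fraction)
    return samples[n_val:], samples[:n_val], test_songs


# Synthetic placeholders: 7 structures x 5 songs, 17 dummy gatra per song.
songs = [(f"{s}_{i}", s, [(f"{s}_{i}", k) for k in range(17)])
         for s in STRUCTURES for i in range(5)]
train, val, test = split_by_song(songs)
print(len(train), len(val), len(test))  # 381 95 7
```

Splitting by song before splitting by gatra keeps every gatra of a test song out of training, which is what makes the held-out songs unseen.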

B. Preprocessing Data
The input data consist of balungan notation, rhythm type, song structure type, and gatra information, while the output data consist of the notation of the ricikan struktural instruments: kenong, kethuk, kempyang, kempul, gong ageng, and gong suwuk. Both input and output data are preprocessed with one-hot encoding [39], which converts each categorical value into a binary vector; Figure 5 shows the result. Before being fed into the CNN-LSTM network, each input component (the balungan notation arranged by gatra, the rhythm, the song structure, and the gatra information for each note) is one-hot encoded. The encoded vectors are then combined into an input sequence, which is fed into the CNN-LSTM network.
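A minimal Python sketch of this encoding step is shown below. The vocabularies are assumptions for illustration: the note symbols ("." for a rest and the digits 1 to 7), the five irama listed in the introduction, the seven gendhing alit structures, and a hypothetical limit of 16 gatra positions. The exact vocabularies and the encoding of the gatra information are not specified here, and the example gatra is illustrative, not taken from the dataset.

```python
# Minimal sketch of one-hot encoding for the model input.
# All vocabularies below are illustrative assumptions.
NOTE_VOCAB = [".", "1", "2", "3", "4", "5", "6", "7"]  # "." = rest (assumed)
RHYTHMS = ["lancar", "tanggung", "dadi", "wilet", "rangkep"]
STRUCTURES = ["sampak", "srepeg", "ayak-ayakan", "lancaran",
              "bubaran", "ketawang", "ladrang"]
MAX_GATRA = 16  # hypothetical maximum number of gatra positions


def one_hot(value, vocab):
    """Return a binary vector with a single 1 at the index of value."""
    vec = [0] * len(vocab)
    vec[vocab.index(value)] = 1
    return vec


def encode_gatra(notes, rhythm, structure, gatra_index):
    """Encode one gatra as a sequence of per-note feature vectors.

    Each time step concatenates the one-hot note with one-hot rhythm,
    song structure, and gatra position (shared by all notes in the gatra).
    """
    context = (one_hot(rhythm, RHYTHMS)
               + one_hot(structure, STRUCTURES)
               + one_hot(gatra_index, list(range(MAX_GATRA))))
    return [one_hot(note, NOTE_VOCAB) + context for note in notes]


sequence = encode_gatra(["2", "1", "2", "6"], "lancar", "lancaran", 0)
print(len(sequence), len(sequence[0]))  # 4 36
```

Concatenating the encoded gatra of a song gives the input sequence; under the same assumptions, each output instrument could be encoded per note with its own vocabulary.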

C. CNN-LSTM
This section describes the structure of the CNN-LSTM model in detail. Figure 6 shows the steps of this study. The proposed CNN-LSTM model consists of three main components: a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a fully connected layer.
• The CNN obtains a feature representation of the input music sequence, which consists of balungan notation divided into gatra, rhythm, song structure, and gatra information for each note. It consists of a 1D convolutional layer with 32 filters, a kernel size of 2, and 'same' padding, followed by a ReLU activation and a 1D max-pooling layer.
• The LSTM models the temporal dependencies between the extracted features and generates the accompaniment sequences. It consists of a single LSTM layer with 128 hidden units, followed by a dropout layer with a rate of 0.2 to reduce overfitting.
• The fully connected output layer uses a sigmoid activation for each ricikan struktural instrument to predict the musical accompaniment.
For comparison, the performance of the CNN-LSTM model was evaluated against standalone CNN and LSTM models. The architectural details of each model are shown in Figure 7.
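To make the CNN stage concrete, the dependency-free Python sketch below runs a 1D convolution with 'same' padding, a ReLU, and 1D max pooling on a small input, and counts the trainable parameters of the convolution and LSTM layers. It is a sketch under assumptions, not the authors' implementation: the pooling size of 2, the TensorFlow/Keras-style 'same' padding (for a kernel of size 2, one zero step is appended on the right), the Keras-style LSTM parameter count with one bias per gate, the assumed input of 36 features per time step, and the random weights are all illustrative choices.

```python
import random


def conv1d_same(x, kernels, biases):
    """1D convolution (cross-correlation, stride 1) with 'same' padding.

    x: T time steps, each a list of C_in features.
    kernels: F filters, each a list of K taps of C_in weights.
    Padding follows the TensorFlow/Keras 'same' convention: K - 1 zero steps
    in total, with the extra step on the right when K - 1 is odd.
    """
    c_in, k_size = len(x[0]), len(kernels[0])
    pad_left = (k_size - 1) // 2
    pad_right = (k_size - 1) - pad_left
    zero = [0.0] * c_in
    padded = [zero] * pad_left + list(x) + [zero] * pad_right
    out = []
    for t in range(len(x)):
        row = []
        for kernel, bias in zip(kernels, biases):
            acc = bias
            for k in range(k_size):
                acc += sum(w * v for w, v in zip(kernel[k], padded[t + k]))
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool1d(x, pool=2):
    """Non-overlapping max pooling over time ('valid' padding)."""
    return [[max(x[i + j][c] for j in range(pool)) for c in range(len(x[0]))]
            for i in range(0, len(x) - pool + 1, pool)]


def conv1d_params(kernel_size, c_in, filters):
    return kernel_size * c_in * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and one bias.
    return 4 * (units * (input_dim + units) + units)


rng = random.Random(0)
C_IN, FILTERS, KERNEL = 36, 32, 2            # C_IN is an assumed feature size
x = [[rng.random() for _ in range(C_IN)] for _ in range(4)]  # one 4-note gatra
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(C_IN)] for _ in range(KERNEL)]
           for _ in range(FILTERS)]
features = max_pool1d(relu(conv1d_same(x, kernels, [0.0] * FILTERS)))
print(len(features), len(features[0]))                                  # 2 32
print(conv1d_params(KERNEL, C_IN, FILTERS), lstm_params(FILTERS, 128))  # 2336 82432
```

With 'same' padding the convolution keeps the four time steps of a gatra, and pooling halves them before the LSTM, whose input size equals the 32 filters. Frameworks that keep two bias vectors per gate (as PyTorch does) report a slightly larger LSTM parameter count.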

D. Evaluation
The first evaluation examines how effectively the proposed CNN-LSTM model predicts musical accompaniment notes for the various ricikan struktural instruments, compared with the CNN and LSTM models. Model predictions were compared with the ground truth labels (the original notation by the gamelan composer), and accuracy, precision, and recall were computed from this comparison [40]. The second evaluation applies the second scenario, with different song structures, by selecting for each song structure a single song that is not included in the training data. The notation generated for each of these songs is then compared with the original using a music analysis measure, note distance. This evaluation phase is intended to give a detailed assessment of the model's ability to predict musical accompaniment.

III. Result And Discussion
This section evaluates the performance of the proposed CNN-LSTM model and assesses the generated results, with the goal of providing accompaniment notation for the different ricikan struktural instruments. The evaluation was divided into two scenarios: intensive experiments with the same song structure and experiments with different song structures.
In the first scenario, several intensive experiments were conducted to evaluate the overall performance of the model on data of the same song structure type. The goal is to see how well the model performs when the song structure is held constant.
In the second scenario, by contrast, the model was evaluated on data with different song structure types. This scenario assesses the adaptability and generalizability of the model across song structures, that is, its ability to generate accurate accompaniment notes for the ricikan struktural instruments across different forms.

A. Quantitative Analysis
Table 2 summarizes the quantitative performance of each model in the two scenarios. The CNN-LSTM model outperforms the LSTM and CNN models in both scenarios, whether the song structures are the same or different, achieving higher accuracy, precision, and recall. Higher accuracy indicates better overall performance, higher precision indicates fewer false positives, and higher recall indicates fewer false negatives.
The high performance of the CNN-LSTM model in the first scenario (accuracy = 91.9, precision = 92.3, recall = 91.8; Table 2) is reflected in its generated notation for the ricikan struktural instruments, which is closer to the original than the output of the CNN and LSTM models; this is discussed in more detail in the Music Generation Result section. The differences in accuracy between the three models are nevertheless small, with the CNN-LSTM advantage ranging from 0.2 to 1.2 points. In the first scenario, the CNN-LSTM model achieved an accuracy of 91.9, while the CNN and LSTM models achieved 91.2 and 91.5, respectively. Performance is generally higher in the first scenario, owing to the homogeneity of its data.
The CNN-LSTM model benefits from combining the CNN architecture, which excels at feature extraction, with the LSTM architecture, which excels at modeling temporal dependencies. This combination enables the model to handle both micro- and macro-level musical patterns, yielding more precise and expressive musical accompaniment.

B. Music Generation Result
This section evaluates the generated notation using music analysis tools, on the test data for each song structure from the second scenario (different song structures). The goal is to assess how closely the generator output resembles the composition provided by the gamelan composer. The evaluation criterion is note distance, a metric that quantifies the similarity between the generator's output notation (2) and the gamelan composer's original notation (1). This distance, also referred to as the exact distance, uses a binary representation, as written in (1).
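The Python sketch below illustrates one plausible reading of an exact (binary) note distance; it is an assumption and may differ from equation (1) in how the per-note terms are aggregated. Each aligned position contributes 0 when the generated note equals the original and 1 otherwise, and the contributions are summed, with a normalized variant that divides by the sequence length. The function names are hypothetical, and the kempul lines in the example are illustrative rather than dataset values. The mismatch positions correspond to the kind of differing notes highlighted in Figure 10.

```python
def mismatch_positions(original, generated):
    """Indices where the generated note differs from the original note."""
    if len(original) != len(generated):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(original, generated)) if a != b]


def note_distance(original, generated):
    """Exact distance: sum of 0/1 terms, 1 for each mismatched position."""
    return len(mismatch_positions(original, generated))


def normalized_note_distance(original, generated):
    """Fraction of mismatched positions (0.0 means identical notation)."""
    return note_distance(original, generated) / len(original) if original else 0.0


# Illustrative kempul lines for one gongan ("." = no stroke); not dataset values.
original = [".", ".", ".", ".", ".", "5", ".", ".", ".", "2", ".", ".", ".", "6", ".", "."]
generated = [".", ".", ".", ".", ".", "5", ".", ".", ".", "5", ".", ".", ".", "6", ".", "."]
print(mismatch_positions(original, generated))  # [9]
print(note_distance(original, generated), normalized_note_distance(original, generated))  # 1 0.0625
```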
The proposed CNN-LSTM approach was compared with CNN and LSTM by calculating the note distance for each instrument in each song structure. In addition, an in-depth analysis examined the relationship between the input parameters (balungan notation, song structure, rhythm, and gatra information) and the output notation generated for the ricikan struktural instruments (kenong, kethuk, kempyang, kempul, gong suwuk, and gong ageng).
Table 3 shows the note distance values for each ricikan struktural instrument across the song structures. The CNN-LSTM approach produced notation with the lowest note distance values compared with LSTM and CNN. A lower note distance indicates greater similarity to the gamelan composer's original notation. These results indicate that the CNN-LSTM model outperforms both the LSTM and CNN models overall, as it exploits the strengths of both CNN and LSTM. The kempyang is present only in the ketawang and ladrang song structures and has no notation in the other structures. Table 3 shows that the kethuk, kempyang, and gong ageng have a note distance of 0, meaning that the notation generated by all three models across the song structures matches the gamelan composer's original notation. This is due to the fixed notation pattern of each of these instruments within a song structure: the kethuk always uses the notation (+), which represents a hit, and the kempyang always uses (-), which also represents a hit. These instruments have no tonal variation. In addition, the gong ageng marks the end of the song, so its notation pattern is constant. Figure 8 shows the notation patterns of the kethuk and kempyang for each song structure. In Table 3, both the kenong and kempul show variation in note distance. For the CNN-LSTM model, the kenong tends to have note distances close to 0, indicating that the generated notation closely resembles the original; the kenong notation pattern appears more consistent across song structures than the kempul pattern. The note distance values for the kempul, on the other hand, vary considerably. A value of 0 means that the generated notation matches the original. Sampak, however, tends to have higher note distance values than the other song structures, because in sampak the kempul notation does not always match the balungan notation. Such variations are intentional and are often introduced by gamelan composers to add diversity to the music.
Figure 9 and Figure 10 show the notation generated by the CNN-LSTM, LSTM, and CNN models for several ricikan struktural instruments in the sampak and bubaran song structures. These figures show the relationship between the input components (balungan notation, song structure, rhythm, and gatra information) and the output notation of the ricikan struktural instruments. The following observations can be made:
• The notation for instruments such as the kenong, kempul, gong suwuk, and gong ageng is derived from the balungan notation within each gatra, but each instrument takes its notes from different positions. For example, in srepeg the kenong notes are taken from the 4th note of each gatra, whereas in ketawang the last note of each even gatra is used.
• Song structure and rhythm determine the notation pattern of all instruments (kenong, kethuk, kempyang, kempul, gong suwuk, and gong ageng) within each song form.
• Gatra information determines the position of the notation for instruments such as the gong suwuk, gong ageng, kenong, and kempul.
The situation differs in Figure 10, where many of the kempul notes generated by the CNN-LSTM, LSTM, and CNN models do not match the original. In Figure 10, notation identical to the original is not highlighted, while differing notation is highlighted in yellow for CNN-LSTM, green for LSTM, and blue for CNN. In the Sampak Tlutur Slendro Manyura test data, the generated notation differs for the kempul and gong suwuk. The kempul and gong suwuk notation is usually derived from the balungan notation, but the composer sometimes substitutes notes that differ from the balungan. For example, in the 3rd gatra of the first line, balungan note 5 becomes note 2 on the kempul. The CNN and LSTM outputs differ from the original here, whereas the proposed CNN-LSTM reproduces the original notation. This is consistent with Table 3, where the kempul note distance for the sampak structure is smaller for CNN-LSTM than for the CNN and LSTM models.
The notation results in Table 3, Figure 9, and Figure 10 show that the CNN-LSTM model generates notation that is closer to the original (the notation created by gamelan experts). This is attributable to the CNN's ability to extract important features from the model input, combined with the LSTM's ability to predict music notation from previously learned patterns. However, Table 3 and Figure 10 still show some notation that differs from the original; this may reflect gamelan notation rules, particularly for the kempul, that are not yet used as features in the compared models. The results can be useful in education, especially for novice gamelan players learning the ricikan struktural instruments, because gamelan songs are usually notated with the melody only. The notation pattern of the ricikan struktural instruments can be identified from the title of a Javanese gamelan song, because the title names the song structure (as in Lancaran Manyar Sewu), which determines that pattern. This study is also useful for the art of gamelan: an automatic generator of ricikan struktural notation can be used to compose accompaniment automatically for the melody notation of a Javanese gamelan song.
A limitation of this research is that it generates only the notation of the ricikan struktural instruments; this notation still needs to be combined with that of other instruments, such as the ricikan garap, which decorate the song, and the kendang, which controls the rhythm. Further investigation is needed to improve the results, especially into the rules governing the kempul and gong suwuk and their relationship to individual songs in Javanese gamelan, because some generated notation patterns for these two instruments still do not match the original.

IV. Conclusion
This study concludes that the CNN-LSTM, LSTM, and CNN models can effectively generate musical notes for multiple ricikan struktural instruments in Javanese gamelan. Experimental results show that CNN-LSTM outperforms LSTM and CNN in accuracy, precision, recall, and the quality of the generated notation. This superiority can be attributed to combining the strengths of CNN and LSTM.
The more homogeneous data scenario yields higher accuracy, because the consistent distribution of the data leads to more consistent pattern generation. Note distance, which measures the difference between the generated notation and the composer's gamelan notation, shows that all three generator models (CNN-LSTM, LSTM, and CNN) produce notation matching the original for instruments such as the kethuk, kempyang, and gong ageng. In contrast, instruments such as the kenong, kempul, and gong suwuk show larger differences.
A small note distance indicates a consistent notation pattern for a ricikan struktural instrument that follows the balungan notation, whereas a large note distance indicates pattern variation that sometimes departs from the balungan notation. This shows that Javanese gamelan does not always follow standardized pattern rules; gamelan composers sometimes change the notation of these instruments as a variation in playing gamelan music.
Although not all generated notation matches the original exactly, this music generation method can still be used to supplement the notation of Javanese gamelan songs based on song characteristics such as song structure type, rhythm, melody (balungan) notation, and gatra information.
This study benefits novice gamelan players, especially in playing the ricikan struktural, by providing an automatic notation generator for these instruments. The generator can be used to compose accompaniment automatically for Javanese gamelan songs, complementing their melody notation, and can also be applied in gamelan art. This study focuses on generating ricikan struktural notation in Javanese gamelan; the ricikan garap and kendang instruments will be explored in future work. Future studies should examine the rules of the kempul and gong suwuk and how they relate to individual songs, as some generated notation patterns still differ from the original, especially for these two instruments. In addition, the wide variety of Javanese gamelan styles offers opportunities for further study.

Fig. 2. Ricikan struktural in Javanese gamelan
Figure 3 is an example of the detailed structure of the gendhing lancaran form. Lancaran is a form of gendhing with 4 gatra, or 16 balungan notes, in each gongan, and a lancaran composition usually has four gongan. The pattern rules for lancaran are as follows: (1) the kenong occurs on the last note of each gatra (the dhong gedhe), and its note always matches that of the dhong gedhe; (2) the kempul occurs on the second note of each gatra (the dhong cilik), and there are only three kempul notes, because the first gatra has no kempul note; (3) the kethuk (+) is played on the odd notes of each gatra; (4) the gong suwuk is played at the end of the fourth gatra.
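The four lancaran rules above can be written as a small rule-based sketch in Python. It is an illustration of the stated rules, not the paper's CNN-LSTM model: the function name lancaran_pattern is hypothetical, "." marks positions where an instrument is silent, and two details not spelled out in the rules are assumed, namely that the kempul and gong suwuk take the balungan note at their position, and that "odd notes" means the 1st and 3rd notes of each gatra. The example balungan is illustrative.

```python
REST = "."  # assumed marker for "no stroke"


def lancaran_pattern(gongan):
    """Apply the lancaran rules to one gongan of 4 gatra x 4 balungan notes.

    Returns one 16-position line per ricikan struktural instrument.
    """
    if len(gongan) != 4 or any(len(gatra) != 4 for gatra in gongan):
        raise ValueError("a lancaran gongan has 4 gatra of 4 notes each")
    parts = {"kethuk": [], "kempul": [], "kenong": [], "gong_suwuk": []}
    for g, gatra in enumerate(gongan):               # g = 0..3
        for pos, note in enumerate(gatra, start=1):  # pos = 1..4 within gatra
            # (3) kethuk (+) on the odd notes of each gatra
            parts["kethuk"].append("+" if pos % 2 == 1 else REST)
            # (2) kempul on the 2nd note (dhong cilik), except in the first gatra
            parts["kempul"].append(note if pos == 2 and g > 0 else REST)
            # (1) kenong on the last note (dhong gedhe), same note as the balungan
            parts["kenong"].append(note if pos == 4 else REST)
            # (4) gong suwuk at the end of the fourth gatra
            parts["gong_suwuk"].append(note if g == 3 and pos == 4 else REST)
    return parts


balungan = [["5", "3", "5", "2"], ["5", "3", "5", "2"],
            ["6", "5", "3", "2"], ["3", "5", "6", "5"]]
for name, line in lancaran_pattern(balungan).items():
    print(f"{name:>10}: {' '.join(line)}")
```

Rules of this kind cover the fixed patterns well; the composer-introduced variations discussed in Section III are exactly what such a rule table cannot capture.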

Fig. 7. Architecture of (a) LSTM and (b) CNN for note generator for Multi-instrument

• Accuracy measures the overall correctness of a model as the proportion of correctly predicted examples. Higher accuracy indicates better performance.
• Precision is the ratio of true positives to the sum of true positives and false positives. Higher precision means fewer false positives; a false positive occurs when the model predicts a positive outcome but the actual outcome is negative.
• Recall evaluates a model's ability to detect all positive cases, as the ratio of true positives to the sum of true positives and false negatives. A lower false negative rate yields a higher recall; a false negative occurs when the model predicts a negative outcome but the actual outcome is positive.
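The three metrics can be computed from binary predictions as in the Python sketch below. It assumes that the sigmoid outputs are thresholded at 0.5 and that all instrument outputs are flattened into one list of binary labels (micro-averaging); the threshold and averaging scheme are assumptions, and the example labels and probabilities are illustrative.

```python
def threshold(probs, cutoff=0.5):
    """Turn sigmoid outputs into binary predictions (cutoff is an assumption)."""
    return [1 if p >= cutoff else 0 for p in probs]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for flattened binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1, 0, 1, 1, 0, 0]
y_pred = threshold([0.9, 0.2, 0.4, 0.7, 0.6, 0.1])
print(y_pred)  # [1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Here there are two true positives, one false positive, one false negative, and two true negatives, so accuracy is 4/6 and precision and recall are both 2/3.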

Fig. 8. Pattern of kethuk and kempyang notation for each song structure

Fig. 10. Notation of Sampak Tlutur Slendro Manyura. Colored notes are generated notes that differ from the gamelan composer's original notation (yellow: CNN-LSTM, green: LSTM, blue: CNN).

Table 1. List of songs in the dataset used in this study

Table 2. Accuracy, precision, and recall of CNN-LSTM, LSTM, and CNN

Table 3. Note distance values of the three models (CNN-LSTM, CNN, and LSTM)