Timbre Style Transfer for Musical Instruments Acoustic Guitar and Piano using the Generator-Discriminator Model

Widean Nagari, Joan Santoso, Esther Irawati Setiawan

Abstract


Music style transfer is a technique for creating new music by combining the content of an input song with the style of a target song to produce a result that listeners can enjoy. This research addresses timbre style transfer, a branch of music style transfer, using a generator-discriminator model. This approach has been applied in several music style transfer studies to train a machine learning model that replaces the instrument sounds in one song with those of another. This work focuses on finding the best layer configuration in the generator-discriminator model for the timbre style transfer task. The dataset used for this research is the MAESTRO dataset. The metrics used in the testing phase are Contrastive Loss, Mean Squared Error, and Perceptual Evaluation of Speech Quality. Based on the results of the trials, the best model in this research was the one trained on column vectors of the mel-spectrogram. Suitable hyperparameters for the training process are a learning rate of 0.0005, a batch size of at least 64, and a dropout rate of 0.1. The results of the ablation study show that the best layer configuration consists of 2 Bi-LSTM layers, 1 Attention layer, and 2 Dense layers.
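
The best configuration reported above could be expressed roughly as follows. This is a minimal sketch assuming a TensorFlow/Keras implementation; the abstract only fixes the layer counts, the dropout rate (0.1), and the learning rate (0.0005), so the layer widths (128/256 units), the number of mel bins (80), and the sequence length are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the best generator configuration reported in the
# abstract (2 Bi-LSTM layers, 1 Attention layer, 2 Dense layers), assuming
# a TensorFlow/Keras implementation. Layer widths, mel-bin count, and
# sequence length are illustrative assumptions, not paper values.
import tensorflow as tf
from tensorflow.keras import layers, models

N_MELS = 80    # assumed number of mel bins per spectrogram column vector
SEQ_LEN = 128  # assumed number of time frames per training example

def build_generator():
    # Input: a sequence of mel-spectrogram column vectors.
    inp = layers.Input(shape=(SEQ_LEN, N_MELS))
    # Two stacked Bi-LSTM layers over the time axis.
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inp)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Dropout(0.1)(x)  # dropout rate reported in the abstract
    # One attention layer (self-attention over the frame sequence).
    x = layers.Attention()([x, x])
    # Two dense layers; the last maps back to mel-spectrogram columns.
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(N_MELS)(x)
    return models.Model(inp, out)

generator = build_generator()
# Learning rate reported in the abstract; the reconstruction loss here is a
# stand-in, since adversarial training would pair this generator with a
# discriminator in the usual GAN fashion.
generator.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
                  loss="mse")
```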

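The three evaluation metrics named in the abstract could be computed along the following lines. This is a hedged sketch: PESQ uses the third-party pesq package, the contrastive loss follows the pairwise form of Hadsell, Chopra, and LeCun, and the margin value of 1.0 is an assumption.

```python
# Sketches of the three evaluation metrics named in the abstract. PESQ uses
# the third-party `pesq` package (pip install pesq); the margin of 1.0 in
# the contrastive loss is an assumed value.
import numpy as np
from pesq import pesq

def contrastive_loss(d, y, margin=1.0):
    # d: Euclidean distance between two embeddings;
    # y: 1 for a similar pair, 0 for a dissimilar pair.
    return y * d**2 + (1 - y) * np.maximum(margin - d, 0.0)**2

def mse(a, b):
    # Mean squared error between two (mel-)spectrograms of equal shape.
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

# PESQ compares a reference waveform with a generated/degraded one, e.g. at
# 16 kHz in wideband mode:
#   score = pesq(16000, reference_wave, generated_wave, "wb")
```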

DOI: http://dx.doi.org/10.17977/um018v7i12024p101-116


Copyright (c) 2024 Knowledge Engineering and Data Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
