A Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) Approach for Identifying Potential Villages in Buleleng Regency

Dina Nur Amalina, Achmad Fauzan

Abstract


Buleleng Regency, in Bali Province, has diverse village potential, including agricultural production and tourist attractions, but this potential has not been fully developed. Clustering villages by their characteristics helps identify and prioritize those that need special attention, with the aim of promoting equitable village development and reducing poverty. This study clusters the villages of Buleleng Regency by their potential using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) method. The data are village potential data obtained from the Buleleng Regency Statistics Office (BPS) for all districts and from the Statistical Service Information System. The variables cover population, communication, tourism, trade, health, religion, social affairs, and public welfare. Parameter tuning selected a minimum cluster size of 5 and a minimum samples value of 2, which produced two main clusters: the first comprises six villages and the second 118 villages. A further 24 villages were labeled as noise (outliers). The first cluster exhibits higher village potential than the second. Based on these results, it is recommended that the government prioritize the second cluster when designing and implementing targeted programs and policies to reduce poverty by developing village potential.
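To make the role of the minimum samples parameter more concrete, the following is a minimal sketch of the mutual reachability distance on which HDBSCAN builds its cluster hierarchy. It is not the authors' code. The four two-dimensional points are hypothetical stand-ins for standardized village features, the function names are illustrative, and the convention of counting a point as its own first neighbor is an assumption that differs between implementations.

```python
import math


def core_distance(points, i, min_samples):
    """Distance from points[i] to its min_samples-th nearest neighbor.

    Assumption: the point itself counts as the first neighbor
    (distance 0); some implementations exclude it.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_samples - 1]


def mutual_reachability(points, i, j, min_samples):
    """Largest of the two core distances and the direct distance."""
    return max(
        core_distance(points, i, min_samples),
        core_distance(points, j, min_samples),
        math.dist(points[i], points[j]),
    )


# Hypothetical standardized features for four villages (not real data).
villages = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]

# min_samples = 2 is the value reported in the abstract.
core = [core_distance(villages, i, 2) for i in range(len(villages))]
mr_near = mutual_reachability(villages, 0, 1, 2)  # two close villages
mr_far = mutual_reachability(villages, 0, 3, 2)   # an isolated village

print(core)
print(mr_near, mr_far)
```

The isolated point has a much larger core distance than the three neighboring points, so it joins the hierarchy only at a large distance threshold. Points of this kind are the ones a density-based method tends to label as noise, while the minimum cluster size sets how many points a group needs before it is kept as a cluster.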





DOI: http://dx.doi.org/10.17977/um018v7i22024p187-199



Copyright (c) 2024 Knowledge Engineering and Data Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
