Parallelization of Partitioning Around Medoids (PAM) in K-Medoids Clustering on GPU

Adhi Prahara, Dewi Pramudi Ismi, Ahmad Azhari

Abstract


K-medoids clustering is categorized as partitional clustering. K-medoids offers better result when dealing with outliers and arbitrary distance metric also in the situation when the mean or median does not exist within data. However, k-medoids suffers a high computational complexity. Partitioning Around Medoids (PAM) has been developed to improve k-medoids clustering, consists of build and swap steps and uses the entire dataset to find the best potential medoids. Thus, PAM produces better medoids than other algorithms. This research proposes the parallelization of PAM in k-medoids clustering on GPU to reduce computational time at the swap step of PAM. The parallelization scheme utilizes shared memory, reduction algorithm, and optimization of the thread block configuration to maximize the occupancy. Based on the experiment result, the proposed parallelized PAM k-medoids is faster than CPU and Matlab implementation and efficient for large dataset.


Full Text:

PDF

References


Y. Zou and B. Liu, “Survey on clustering-based image segmentation techniques,” in Proceedings of the 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2016, Sep. 2016, pp. 106–110, doi: 10.1109/CSCWD.2016.7565972.

N. Dhanachandra and Y. J. Chanu, “A Survey on Image Segmentation Methods using Clustering Techniques,” Eur. J. Eng. Res. Sci., vol. 2, no. 1, p. 15, Jan. 2017, doi: 10.24018/ejers.2017.2.1.237.

A. Saxena et al., “A review of clustering techniques and developments,” Neurocomputing, vol. 267, pp. 664–681, Dec. 2017, doi: 10.1016/j.neucom.2017.06.053.

A. Prahara, I. T. R. Yanto, and T. Herawan, “Histogram thresholding for automatic color segmentation based on k-means clustering,” in Advances in Intelligent Systems and Computing, 2017, vol. 549 AISC, pp. 344–354, doi: 10.1007/978-3-319-51281-5_35.

S. Wazarkar and B. N. Keshavamurthy, “A survey on image data analysis through clustering techniques for real world applications,” J. Vis. Commun. Image Represent., vol. 55, pp. 596–626, Aug. 2018, doi: 10.1016/j.jvcir.2018.07.009.

D. K. Tasoulis, V. P. Plagianakos, and M. N. Vrahatis, “Unsupervised clustering of bioinformatics data,” in European Symposium on Intelligent Technologies, Hybrid Systems and their implementation on Smart Adaptive Systems, Eunite, 2004, pp. 47–53.

J. D. MacCuish and N. E. MacCuish, Clustering in bioinformatics and drug discovery. CRC Press, 2010.

P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data: Recent Advances in Clustering, Springer Berlin Heidelberg, 2006, pp. 25–71.

C. K. Reddy and B. Vinzamuri, “A Survey of Partitional and Hierarchical Clustering Algorithms.,” Data Clust. Algorithms Appl., vol. 87, 2013.

P. Arora, Deepali, and S. Varshney, “Analysis of K-Means and K-Medoids Algorithm for Big Data,” in Physics Procedia, Jan. 2016, vol. 78, pp. 507–512, doi: 10.1016/j.procs.2016.02.095.

E. Schubert and P. J. Rousseeuw, “Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Oct. 2019, vol. 11807 LNCS, pp. 171–187, doi: 10.1007/978-3-030-32047-8_16.

P. O. Olukanmi, F. Nelwamondo, and T. Marwala, “PAM-lite: Fast and accurate k-medoids clustering for massive datasets,” in Proceedings - 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa, SAUPEC/RobMech/PRASA 2019, May 2019, pp. 200–204, doi: 10.1109/RoboMech.2019.8704767.

H. Song, J.-G. Lee, and W.-S. Han, “PAMAE: Parallel k-Medoids Clustering with High Accuracy and Efficiency,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1087–1096, doi: 10.1145/3097983.3098098.

Ying-ting Zhu, Fu-zhang Wang, Xing-hua Shan, and Xiao-yan Lv, “K-medoids clustering based on MapReduce and optimal search of medoids,” in 2014 9th International Conference on Computer Science & Education, Aug. 2014, pp. 573–577, doi: 10.1109/ICCSE.2014.6926527.

A. Martino, A. Rizzi, and F. M. Frattale Mascioli, “Efficient approaches for solving the large-scale k-medoids problem: Towards structured data,” in Studies in Computational Intelligence, Nov. 2019, vol. 829, pp. 199–219, doi: 10.1007/978-3-030-16469-0_11.

A. Prahara, D. P. Ismi, A. I. Kistijantoro, and M. L. Khodra, “Parallelized k-means clustering by exploiting instruction level parallelism at low occupancy,” in Proceedings - 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2017, Feb. 2018, vol. 2018-January, pp. 30–34, doi: 10.1109/ICITISEE.2017.8285516.

X. Wang, “A Survey of Clustering Algorithms Based on Parallel Mechanism,” Apr. 2018, pp. 119–122, doi:10.2991/cmsa-18.2018.28.

Y. Jiang and J. Zhang, “Parallel K-Medoids clustering algorithm based on Hadoop,” in 2014 IEEE 5th International Conference on Software Engineering and Service Science, Jun. 2014, pp. 649–652, doi: 10.1109/ICSESS.2014.6933652.

M. O. Shafiq and E. Torunski, “A Parallel K-Medoids Algorithm for Clustering based on MapReduce,” in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Dec. 2016, pp. 502–507, doi: 10.1109/ICMLA.2016.0089.

X. Yue, W. Man, J. Yue, and G. Liu, “Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce,” pp. 1–8, Aug. 2016, Accessed: Jun. 21, 2020. [Online]. Available: http://arxiv.org/abs/1608.06861.

D. Rajendran, S. Jangiti, S. Muralidharan, and M. Thendral, “Incremental MapReduce for K-Medoids Clustering of Big Time-Series Data,” in Proceedings of the 2nd International Conference on Trends in Electronics and Informatics, ICOEI 2018, Nov. 2018, pp. 1143–1146, doi: 10.1109/ICOEI.2018.8553756.

Y. Zhao, B. Chen, and M. Li, “Parallel K-Medoids Improved Algorithm Based on MapReduce,” in Proceedings - 2018 6th International Conference on Advanced Cloud and Big Data, CBD 2018, Nov. 2018, pp. 18–23, doi: 10.1109/CBD.2018.00013.

R. Wu, B. Zhang, and M. Hsu, “Clustering billions of data points using GPUs,” in Proc. Combined Workshops on UnConventional High Performance Computing Workshop Plus Memory Access Workshop, UCHPC-MAW ’09, Co-located with the 2009 ACM Int. Conf. on Computing Frontiers, CF’09, 2009, pp. 1–5, doi: 10.1145/1531666.1531668.

E. Zhou, S. Mao, M. Li, and Z. Sun, “PAM spatial clustering algorithm research based on CUDA,” in International Conference on Geoinformatics, Sep. 2016, vol. 2016-September, doi: 10.1109/GEOINFORMATICS.2016.7578971.

Y. Li, K. Zhao, X. Chu, and J. Liu, “Speeding up k-Means algorithm by GPUs,” J. Comput. Syst. Sci., vol. 79, no. 2, pp. 216–229, Mar. 2013, doi: 10.1016/j.jcss.2012.05.004.

K. R. Kurte and S. S. Durbha, “High resolution disaster data clustering using Graphics Processing Units,” in 2013 IEEE International Geoscience and Remote Sensing Symposium - IGARSS, Jul. 2013, pp. 1696–1699, doi: 10.1109/IGARSS.2013.6723121.

K. J. Kohlhoff, M. H. Sosnick, W. T. Hsu, V. S. Pande, and R. B. Altman, “CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms,” Bioinformatics, vol. 27, no. 16, pp. 2321–2322, Aug. 2011, doi: 10.1093/bioinformatics/btr386.

L. Kaufman and P. J. Rousseeuw, Clustering by Means of Medoids. Faculty of Mathematics and Informatics, 1987.

R. T. Ng and Jiawei Han, “CLARANS: a method for clustering objects for spatial data mining,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 5, pp. 1003–1016, Sep. 2002, doi: 10.1109/TKDE.2002.1033770.

J. Fung and S. Mann, “Using graphics devices in reverse: GPU-based Image Processing and Computer Vision,” in 2008 IEEE International Conference on Multimedia and Expo, Jun. 2008, pp. 9–12, doi: 10.1109/ICME.2008.4607358.

L. Kaufman, P. J. R. Leonard Kaufman, and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.

H.-S. Park and C.-H. Jun, “A simple and fast algorithm for K-medoids clustering,” Expert Syst. Appl., vol. 36, no. 2, pp. 3336–3341, Mar. 2009, doi: 10.1016/j.eswa.2008.01.039.




DOI: http://dx.doi.org/10.17977/um018v3i12020p40-49

Refbacks

  • There are currently no refbacks.


Copyright (c) 2020 Knowledge Engineering and Data Science

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Flag Counter

Creative Commons License


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

View My Stats