Hierarchical clustering algorithm-dendrogram using Euclidean and Manhattan distance

Mukhtar Mukhtar, Majid Khan Majahar Ali, Faula Arina, Agung Satrio Wicaksono, Aulia Ikhsan, Weksi Budiaji, Syarif Abdullah, Dinda Dwi Anugrah Pertiwi, Robby Zidny, Yuvita Oktarisa, Royan Habibie Sukarna

Abstract


This paper presents the results of an experiment on the drying process of seaweed. Data can be clustered in many ways, for example with partitioning methods or with the Hierarchical Clustering Algorithm (HCA). HCA represents its result as a binary tree, the dendrogram, which visualizes how the data are grouped. We compared the four main linkage methods used in HCA: 1) single linkage, 2) complete linkage, 3) average linkage, and 4) Ward's linkage. Clustering validation is widely recognized as a crucial issue that strongly affects the effectiveness of clustering algorithms, and it is commonly divided into internal and external validation. Internal validation, which relies only on the data and the clustering itself, is particularly important in data science. The main goal of this article is to evaluate empirically the behavior of a representative set of internal clustering validation indices, namely Connectivity, Dunn, and Silhouette. HCA is applied with two distance functions, Euclidean and Manhattan, to analyze the entanglement function and internal validity.
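The agglomerative procedure behind the dendrogram, with the four linkages and the two distance functions named above, can be sketched in plain Python. This is a minimal, naive O(n³) illustration, not the authors' code: the function names, the toy points and the merge-record layout (two cluster ids, merge height, size of the new cluster, similar in spirit to a SciPy linkage matrix) are assumptions. Cluster distances are updated with the Lance-Williams formulas; Ward's update is applied to squared Euclidean distances, and this sketch accepts Ward only with the Euclidean metric because Ward's criterion is defined in Euclidean geometry.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))

METRICS = {"euclidean": euclidean, "manhattan": manhattan}
LINKAGES = ("single", "complete", "average", "ward")

def agglomerative(points, metric="euclidean", method="single"):
    """Naive O(n^3) agglomerative clustering (hypothetical helper).

    Returns one (cluster_a, cluster_b, height, size) record per merge;
    original points are clusters 0..n-1 and each merge creates id n, n+1, ...
    """
    if method not in LINKAGES:
        raise ValueError(f"unknown linkage: {method}")
    if method == "ward" and metric != "euclidean":
        # Ward's criterion is defined on squared Euclidean distances.
        raise ValueError("this sketch supports ward only with euclidean distance")
    dist = METRICS[metric]
    size = {i: 1 for i in range(len(points))}  # active cluster id -> member count
    d = {}  # (smaller id, larger id) -> distance between active clusters
    for i, j in combinations(range(len(points)), 2):
        v = dist(points[i], points[j])
        d[(i, j)] = v * v if method == "ward" else v  # Ward works on squared distances
    merges, new = [], len(points)
    while len(size) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        dij = d.pop((i, j))
        ni, nj = size.pop(i), size.pop(j)
        for k, nk in size.items():
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Lance-Williams update: distance from merged cluster (i + j) to k.
            if method == "single":
                v = min(dik, djk)
            elif method == "complete":
                v = max(dik, djk)
            elif method == "average":
                v = (ni * dik + nj * djk) / (ni + nj)
            else:  # ward
                v = ((ni + nk) * dik + (nj + nk) * djk - nk * dij) / (ni + nj + nk)
            d[(k, new)] = v  # k < new because new ids only grow
        size[new] = ni + nj
        height = math.sqrt(dij) if method == "ward" else dij
        merges.append((i, j, height, ni + nj))
        new += 1
    return merges

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (6, 5)]  # hypothetical toy data, not the seaweed measurements
    for metric in METRICS:
        for method in LINKAGES:
            if method == "ward" and metric == "manhattan":
                continue  # not supported in this sketch
            print(metric, method, agglomerative(pts, metric, method))
```

Running the same points under both metrics yields two merge sequences, and therefore two dendrograms, whose merge heights and leaf orders can then be compared.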
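The three internal indices named above can be computed directly from a set of points and a cluster labeling. This is a hedged, plain-Python sketch using common textbook definitions, not the authors' implementation. Dunn is taken as the smallest distance between points in different clusters divided by the largest within-cluster diameter (higher is better). Silhouette is Rousseeuw's mean silhouette width, the average of (b - a)/max(a, b) over all points, with points in singleton clusters scored 0 (range -1 to 1, higher is better). Connectivity follows the form used by the R package clValid: a penalty of 1/j is added whenever a point's j-th nearest neighbour lies in a different cluster (lower is better). The function names, the assumed default of 10 neighbours and the toy data are hypothetical.

```python
import math

def _pairwise(points, dist):
    """Full n x n distance matrix."""
    return [[dist(p, q) for q in points] for p in points]

def dunn(points, labels, dist=math.dist):
    """Min distance between clusters / max cluster diameter (higher is better)."""
    d = _pairwise(points, dist)
    n = len(points)
    between = min(d[i][j] for i in range(n) for j in range(n) if labels[i] != labels[j])
    within = max(d[i][j] for i in range(n) for j in range(n) if labels[i] == labels[j])
    return between / within if within > 0 else math.inf

def silhouette(points, labels, dist=math.dist):
    """Mean silhouette width in [-1, 1] (higher is better); needs >= 2 clusters."""
    d = _pairwise(points, dist)
    n = len(points)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # a singleton cluster contributes s(i) = 0
        a = sum(d[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / n

def connectivity(points, labels, n_neighbors=10, dist=math.dist):
    """Add 1/j when the j-th nearest neighbour is in another cluster (lower is better)."""
    d = _pairwise(points, dist)
    n = len(points)
    score = 0.0
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for rank, j in enumerate(neighbours[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                score += 1.0 / rank
    return score

def manhattan(p, q):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(p, q))

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (6, 5)]  # hypothetical toy data
    labels = [0, 0, 1, 1]                   # e.g. a 2-cluster cut of a dendrogram
    for name, dist in (("euclidean", math.dist), ("manhattan", manhattan)):
        print(name, dunn(pts, labels, dist), silhouette(pts, labels, dist),
              connectivity(pts, labels, n_neighbors=2, dist=dist))
```

Passing either distance function lets the same labeling be scored under Euclidean and Manhattan geometry.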
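Entanglement measures how well the leaves of two dendrograms, for example one built with Euclidean and one with Manhattan distance, line up when they are drawn face to face as a tanglegram. A minimal sketch, assuming the normalized definition used by the R package dendextend (the abstract does not say which implementation was used): each label's position in the two leaf orders is compared, the absolute differences are raised to a power L (1.5 by default there) and summed, and the total is divided by the value obtained when one order is the exact reverse of the other. The result is 0 for identical leaf orders and 1 for fully reversed ones. The function name and the leaf orders below are hypothetical.

```python
def entanglement(order1, order2, L=1.5):
    """Normalized entanglement between two leaf orders of the same labels.

    0 means identical order; 1 is the value for a fully reversed order.
    """
    if sorted(order1) != sorted(order2):
        raise ValueError("both dendrograms must have the same leaf labels")
    pos1 = {label: k for k, label in enumerate(order1)}
    # Sum of |position difference|^L over all labels.
    observed = sum(abs(k - pos1[label]) ** L for k, label in enumerate(order2))
    n = len(order1)
    # Normalizing constant: one order is the reverse of the other.
    worst = sum(abs(k - (n - 1 - k)) ** L for k in range(n))
    return observed / worst if worst > 0 else 0.0

if __name__ == "__main__":
    euclid_leaves = ["a", "b", "c", "d"]     # hypothetical leaf orders
    manhattan_leaves = ["b", "a", "c", "d"]
    print(round(entanglement(euclid_leaves, manhattan_leaves), 3))  # prints 0.161
```

Lower values indicate that the two distance functions produce dendrograms with more similar leaf arrangements.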

Keywords


Hierarchical Clustering Algorithm (HCA); Dendrogram; Euclidean and Manhattan distance.






DOI: http://dx.doi.org/10.62870/tjst.v20i1.23187



Copyright (c) 2024 Teknika: Jurnal Sains dan Teknologi

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Teknika: Jurnal Sains dan Teknologi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.