IMPROVEMENT OF DATA ANALYSIS BASED ON K-MEANS ALGORITHM AND AKMCA

Zeeshan Ali Khan; Manjari Singh; Rajesh Boghey

Zeeshan Ali Khan Technocrats Institute of Technology Excellence, MP, India
Manjari Singh Technocrats Institute of Technology Excellence, MP, India
Rajesh Boghey Technocrats Institute of Technology Excellence, MP, India

Keywords: Data Mining, Supervised Learning, Unsupervised Learning, K-means Clustering, Smart Data Analysis, VSM, Error Rate

Abstract

This work improves data analysis using the k-means algorithm and an advanced k-means clustering algorithm (AKMCA). Data mining aims to extract information from a large data set and transform it into a usable structure. Exploratory data analysis and data mining applications rely heavily on clustering, which makes it a central technique in data mining. Clustering groups a set of objects so that those in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). There are various types of cluster models, such as connectivity models, distribution models, centroid models, and density models. The basic clustering technique, and the most widely used algorithm, is K-means clustering. The algorithm makes use of the density number concept: the set of points with a high density number is extracted from the original data set as a new training set, and a point from this high-density set is chosen as the initial cluster centre.
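
The density-based choice of initial centres described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. The text does not define the density number, so the sketch assumes it is the count of other points within a fixed radius, treats points with at least average density as the high-density set, and picks further centres with a farthest-first rule. The names `density_numbers`, `initial_centres`, and `radius` are hypothetical.

```python
import math


def density_numbers(points, radius):
    """Assumed definition: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def initial_centres(points, k, radius):
    """Pick k initial centres from the high-density subset of `points`."""
    dens = density_numbers(points, radius)
    mean_density = sum(dens) / len(dens)
    # High-density set (assumption): points with at least average density.
    dense = [(d, p) for d, p in zip(dens, points) if d >= mean_density]
    if len(dense) < k:
        raise ValueError("not enough high-density points for k centres")
    # First centre: the densest point (ties keep the original order).
    dense.sort(key=lambda item: -item[0])
    candidates = [p for _, p in dense]
    centres = [candidates[0]]
    # Remaining centres (assumption): farthest from the centres chosen so far.
    while len(centres) < k:
        nxt = max(candidates, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(nxt)
    return centres


# Two tight groups plus one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
centres = initial_centres(points, k=2, radius=1.5)
print(centres)
```

In this toy data set the isolated point has a density number of zero, so it is excluded from the high-density set and cannot be selected as an initial centre.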

K-means, a partition-based clustering algorithm, is widely used in many fields because of its efficiency and simplicity. It divides a data set into a specified number of clusters by assigning each point to its nearest cluster centre. However, it is well known that the K-means algorithm can produce suboptimal results depending on the initial cluster centres chosen, and numerous efforts have been made to improve its performance. The advanced k-means clustering algorithm (AKMCA) is used in data analysis to obtain useful knowledge for various optimisation and classification problems, and it can be applied to processing massive amounts of raw and unstructured data. Knowledge discovery provides the tools needed to automate the entire data analysis and error reduction process; their efficacy is investigated through experimental analysis of various datasets. The paper presents a detailed experimental analysis and a comparison of the proposed work with existing k-means clustering algorithms. Furthermore, it provides a clear and comprehensive understanding of the k-means algorithm and its various research directions.
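
The sensitivity of K-means to its initial centres can be illustrated with a small sketch. The code below is a plain implementation of the standard k-means iteration (assign each point to the nearest centre, then move each centre to the mean of its points), not AKMCA. The function name `kmeans` and the toy data are assumptions made for illustration only.

```python
import math


def kmeans(points, centres, max_iter=100):
    """Standard k-means from given initial centres; returns (centres, SSE)."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[idx].append(p)
        # Update step: each centre moves to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        new_centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    # Sum of squared distances from each point to its nearest centre.
    sse = sum(min(math.dist(p, c) for c in centres) ** 2 for p in points)
    return centres, sse


# Four corners of a wide rectangle: the natural clusters are left and right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0), (4, 0)])  # one centre per side
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])   # centres split top and bottom
print(sse_good, sse_bad)
```

With the second initialisation the iteration converges immediately to a top/bottom split whose squared error is much larger than that of the left/right split, which is the kind of suboptimal result that careful initial-centre selection aims to avoid.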

