Notice of Retraction: High dimensional data clustering

M. Pavithra, R.M.S. Parvathi

Abstract


Notice of Retraction

-----------------------------------------------------------------------
After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of APTIKOM's Publication Principles.

We hereby retract the content of this paper. Reasonable effort should be made to remove all past references to this paper.

The presenting author of this paper has the option to appeal this decision by contacting ij.aptikom@gmail.com.

-----------------------------------------------------------------------

Clustering becomes difficult in high-dimensional data due to the increasing sparsity of the data, as well as the increasing difficulty in distinguishing distances between data points. The proposed methods, called the “kernel trick” and “Collective Neighbour Clustering”, take as input measures of correspondence between pairs of data points. Real-valued hubs are exchanged between data points until a high-quality set of patterns and corresponding clusters gradually emerges [2]. We validate our theory by demonstrating that hubness is a high-quality measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that main hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster patterns [4]. Experimental results demonstrate the good performance of our proposed algorithms in multiple settings, mainly in the presence of large quantities of overlapping noise. The proposed methods are designed mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes [6]. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself [10]. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on the implications of our model for future research. High-dimensional data arise naturally in many domains and have regularly presented a great challenge for traditional data-mining techniques, both in terms of effectiveness and efficiency [7].
Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data [8]. Instead of attempting to avoid the curse of dimensionality by observing a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of some inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest neighbour lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise [9].
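The hubness notion described above can be illustrated with a minimal sketch. It counts how often each point appears in the k-nearest-neighbour lists of the other points (the k-occurrence count) and reports the skewness of those counts. This is not the retracted paper's algorithm: the function names `k_occurrences`, `skewness`, and `make_gaussian_data`, the Euclidean distance, the Gaussian test data, and the choice k = 5 are all assumptions made for illustration.

```python
import math
import random


def k_occurrences(points, k):
    """Return, for each point, how often it appears in other points' k-NN lists."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Euclidean distances from point i to every other point (self excluded).
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()  # ties are broken by the smaller index
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; large positive values indicate a few strong hubs."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def make_gaussian_data(n, dim, seed=0):
    """Hypothetical test data: n points drawn from a standard Gaussian in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    k = 5  # assumed neighbourhood size
    for dim in (3, 100):
        data = make_gaussian_data(200, dim)
        counts = k_occurrences(data, k)
        # The point with the highest k-occurrence count is the strongest hub
        # and a natural candidate for a cluster prototype.
        hub = max(range(len(data)), key=counts.__getitem__)
        print(f"dim={dim} max N_k={max(counts)} "
              f"skewness={skewness(counts):.2f} top hub index={hub}")
```

Comparing the printed skewness for the low- and high-dimensional data gives a rough feel for how the distribution of k-occurrence counts changes with dimensionality. The brute-force neighbour search is quadratic in the number of points, so it is suitable only for small illustrative data sets.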


References


C Aggarwal, J Han, J Wang, P Yu. A Framework for Projected Clustering of High Dimensional Data Streams. In VLDB Conference, 2014.

C Aggarwal, J Han, J Wang, P Yu. On High Dimensional Projected Clustering of Data Streams. Data Mining and Knowledge Discovery Journal. 2015; 10(3): 251-273.

L Yu, H. Liu. Efficiently Handling Feature Redundancy in High-Dimensional Data. Proc. Ninth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’03). 2003: 685-690.

Gnanabaskaran A, Duraiswamy K. An Efficient Approach to Cluster High Dimensional Spatial Data Using K-Mediods Algorithm. European Journal of Scientific Research. 2011; 49(4): 617-624.

Lance P, Ehtesham H, Huan L. Subspace Clustering for High Dimensional Data: A Review. ACM SIGKDD Explorations Newsletter. 2014; 6(1): 90-105.

Naveen K, Naveen G, Veera R. Partition Algorithms, A Study and Emergence of Mining Projected Clusters in High-Dimensional Dataset. International Journal of Computer Science and Telecommunications. 2011; 2(4): 34-37.

Sembiring R, Zain J, Abdullah E. Clustering High Dimensional Data using Subspace and Projected Clustering Algorithms. International Journal of Computer Science & Information Technology. 2010; 2(4): 162-170.

Singh V, Sahoo L, Kelkar A. Mining Subspace Clusters in High Dimensional Data. International Journal of Recent Trends in Engineering and Technology. 2012; 3(1): 118,112.

Mohamed B, Shergrui W. Mining Projected Clusters in High-Dimensional Spaces. IEEE Transactions on Knowledge and Data Engineering. 2009; 21(4): 507-522.

David L. Donoho. High Dimensional Data Analysis: The Curses and Blessings of Dimensionality. American Math. Society Conference: Mathematical Challenges of the 21st Century, Los Angeles, CA, August, 6-11, 2010.

Karthikeyan P, Saravanan P, Vanitha E. High Dimensional Data Clustering using FAST Cluster Based feature selection. Journal of engineering research and application. 2014; 4: 65-71.

Guangtao Wang, Qinbao Song, Baowen Xu, Yuming Zhou. Selecting feature subset for high dimensional data via the propositional FOIL rules. Elsevier 2012.

Hua-Liang Wei, Stephen A Billings. Feature subset selection and Ranking for Data Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 1, 2010.




DOI: https://doi.org/10.11591/APTIKOM.J.CSIT.82



Copyright (c) 2019 APTIKOM Journal on Computer Science and Information Technologies



ISSN: 2722-323X, e-ISSN: 2722-3221


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.