Notice of Retraction: Data Mining Itemset of Big Data Using Pre-Processing Based on MapReduce Framework with ETL Tools

Padmanathan Anantharaman, H.V. Ramakrishan

Abstract


Notice of Retraction

-----------------------------------------------------------------------
After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of APTIKOM's Publication Principles.

We hereby retract the content of this paper. Reasonable effort should be made to remove all past references to this paper.

The presenting author of this paper has the option to appeal this decision by contacting ij.aptikom@gmail.com.

-----------------------------------------------------------------------

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, which can force IT organizations into costly upgrades of databases and data warehouse hardware appliances. At the same time, an enormous amount of data is generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities. This data is termed Big Data and has its own characteristics and challenges. Frequent Itemset Mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model addresses the problem of large datasets, but it incurs a large communication cost, which reduces execution efficiency. This paper proposes a new technique, ClustBigFIM, which applies k-means pre-processing to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: k-means clustering generates clusters from huge datasets, and Apriori and Eclat then mine frequent itemsets from the generated clusters using the MapReduce programming model. The results show that applying k-means clustering as a pre-processing step before BigFIM increases the execution efficiency of ClustBigFIM.
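The cluster-then-mine idea described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it assumes transactions are encoded as binary item vectors, uses a plain k-means with deterministic seeding, mines each cluster with Apriori only (Eclat is omitted), and replaces the MapReduce job with an ordinary loop in which each cluster stands in for one map task. Support is counted inside each cluster, not over the whole dataset. All function names (`kmeans`, `apriori`, `cluster_then_mine`) are hypothetical.

```python
from itertools import combinations


def kmeans(vectors, k, iterations=10):
    """Plain k-means; seeds centroids with the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def cluster_then_mine(transactions, k, min_support):
    """Cluster transactions with k-means, then run Apriori inside each cluster."""
    items = sorted({i for t in transactions for i in t})
    vectors = [[1 if i in t else 0 for i in items] for t in transactions]
    assignment = kmeans(vectors, k)
    results = {}
    for c in range(k):  # in a MapReduce setting, each cluster could be one map task
        members = [t for t, a in zip(transactions, assignment) if a == c]
        results[c] = apriori(members, min_support)
    return results


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"}, {"beer", "chips"},
        {"bread", "milk", "eggs"}, {"beer", "chips", "salsa"},
        {"bread", "milk"}, {"beer", "chips"},
    ]
    for cluster, itemsets in cluster_then_mine(baskets, k=2, min_support=2).items():
        print(cluster, {tuple(sorted(s)): n for s, n in itemsets.items()})
```

Because each cluster holds similar transactions, the candidate sets generated inside a cluster tend to be smaller than those generated over the full dataset, which is the efficiency argument the abstract makes. The sketch does not measure that effect.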






DOI: https://doi.org/10.11591/APTIKOM.J.CSIT.103



Copyright (c) 2019 APTIKOM Journal on Computer Science and Information Technologies



ISSN: 2722-323X, e-ISSN: 2722-3221


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.