Two-Way Adaptive Clustering Algorithm (TWACA) for Clustering Large Weather Datasets

P. Vijayameena*, K. Kodeeswari**
*, ** M.Tech Scholar, Department of Information Technology, Dr. Sivanthi Aditanar College of Engineering, Thiruchendur, Tamilnadu
Periodicity: June - August 2014
DOI : https://doi.org/10.26634/jcom.2.2.3232

Abstract

Clustering is the unsupervised classification of data items into homogeneous groups called clusters. Data linkage is the task of identifying data items that refer to the same entity across different data sources. De-duplicating a single data set and linking several data sets are important tasks in the data preparation step of many data mining processes. Data linkage is traditionally performed across tables in order to cluster the data, and this traditional approach takes a long time to cluster the data in a data set. The proposed technique, the Two-Way Adaptive Clustering Algorithm (TWACA), allows many such operators to be active in parallel. TWACA is optimized to produce initial results quickly and can hide irregular delays in data arrival by reactively scheduling background processing. The main aim of this paper is to optimize the clustering operation and execution of a query. Depending on the given query, only the required columns of the weather data sets are linked and clustered. The selected data are then clustered and displayed together with the clustering time. Compared to the traditional method, TWACA clusters the data from the data set more rapidly. The execution details of both methods are stored in a log file (i.e., a .txt file) for comparison, and this log file is used to view the data in SAP Crystal Reports. TWACA is therefore an effective solution for providing fast query responses to users, even in the presence of slow and bursty remote sources.
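
The query-driven flow described in the abstract (select only the columns a query names, cluster them, record the clustering time, and append the timing to a text log) can be illustrated with a minimal sketch. This is not the authors' TWACA implementation: plain k-means is used as a stand-in clustering step, and the record fields, the function names (`project`, `kmeans`, `cluster_query`), and the log format are all assumptions made for illustration.

```python
import time

# Hypothetical weather records; field names and values are illustrative only.
RECORDS = [
    {"station": "A", "temp_c": 20.0, "humidity": 60.0},
    {"station": "B", "temp_c": 21.0, "humidity": 62.0},
    {"station": "C", "temp_c": 35.0, "humidity": 30.0},
    {"station": "D", "temp_c": 36.0, "humidity": 28.0},
]


def project(records, columns):
    """Keep only the numeric columns named in the query."""
    return [[float(r[c]) for c in columns] for r in records]


def kmeans(points, k, iterations=10):
    """Plain k-means (a stand-in, not TWACA), seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_query(records, columns, k, log_path=None):
    """Cluster only the queried columns and report the elapsed time."""
    start = time.perf_counter()
    labels = kmeans(project(records, columns), k)
    elapsed = time.perf_counter() - start
    if log_path is not None:
        # Append one line per run so that runs can be compared later.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"columns={','.join(columns)} k={k} seconds={elapsed:.6f}\n")
    return labels, elapsed


labels, elapsed = cluster_query(RECORDS, ["temp_c"], k=2)
```

Passing a `log_path` appends one timing line per run to a text file, which mirrors the log-based comparison the abstract mentions; the parallel operators and reactive background scheduling of TWACA are outside the scope of this sketch.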

Keywords

Data Linkage, Clustering, Data Set, Two-Way Adaptive Clustering.

How to Cite this Article?

Vijayameena, P., Kodeeswari, K. (2014). Two-Way Adaptive Clustering Algorithm (TWACA) for Clustering Large Weather Datasets. i-manager’s Journal on Computer Science, 2(2), 31-34. https://doi.org/10.26634/jcom.2.2.3232
