A Proficient Approach for Facsimile Detection

M. Sreelekha*, K. Bhaskar Naik**
* PG Scholar, Department of Computer Science and Engineering, SreeVidyanikethan Engineering College, JNTU Ananthapur, Tirupati, India.
** Assistant Professor, Department of Computer Science and Engineering, SreeVidyanikethan Engineering College, JNTU Ananthapur, Tirupati, India.
Periodicity: September - November 2016
DOI : https://doi.org/10.26634/jcom.4.3.8285

Abstract

Nowadays, the accuracy of databases is critically important: it is fundamental to maintaining a database in the current IT-based economy, and many organizations rely on their databases to carry out day-to-day operations. Consequently, duplicate detection, also known as entity resolution, facsimile recognition, and by various other names, has been studied extensively, with much of that work focusing on pair selection to improve both efficiency and recall. Duplicate detection is the process of recognizing multiple representations of the same real-world entity. Among indexing algorithms, progressive duplicate detection is a novel approach that sorts the given dataset by a defined sorting key and compares records that fall within a sliding window. To obtain results even faster than traditional approaches, a new algorithm is proposed that combines progressive techniques with scalable ones, so that duplicates are found progressively and in parallel. The algorithm is shown to maximize the efficiency of duplicate detection within a limited execution time without losing effectiveness.
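The windowing step described in the abstract can be illustrated with a minimal sketch. The function names, sorting key, window size, and string-similarity measure below are illustrative assumptions rather than the authors' implementation; a progressive variant would additionally order the comparisons so that likely duplicates are reported early.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Simple character-based similarity; real systems would use
    # domain-specific measures (e.g., Levenshtein distance, TF-IDF).
    return SequenceMatcher(None, a, b).ratio()

def sorted_neighborhood_duplicates(records, key, window=5, threshold=0.9):
    """Sort records by a key and compare each record only with its
    neighbours inside a sliding window (illustrative sketch)."""
    ordered = sorted(records, key=key)
    duplicates = []
    for i, rec in enumerate(ordered):
        # Compare only against the next (window - 1) records.
        for j in range(i + 1, min(i + window, len(ordered))):
            other = ordered[j]
            if similarity(key(rec), key(other)) >= threshold:
                duplicates.append((rec, other))
    return duplicates

# Usage with hypothetical customer records.
people = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jon Smith"},
    {"id": 3, "name": "Mary Jones"},
    {"id": 4, "name": "Marie Jones"},
]
print(sorted_neighborhood_duplicates(people, key=lambda r: r["name"], threshold=0.8))
```

Because each record is compared only with nearby records after sorting, the number of comparisons grows roughly linearly with the dataset size instead of quadratically, which is what makes the window-based approach scalable.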

Keywords

Pay-as-you-go, Duplicate Detection, Data Cleansing, Progressiveness, Parallelism.

How to Cite this Article?

Sreelekha, M., and Naik, K.B. (2016). A Proficient Approach for Facsimile Detection. i-manager’s Journal on Computer Science, 4(3), 11-18. https://doi.org/10.26634/jcom.4.3.8285
