Abstract

Database accuracy is increasingly important: it is crucial for maintaining a database in the current IT-based economy, and many organizations rely on their databases to carry out day-to-day operations. Consequently, much research has been devoted to duplicate detection, also known as entity resolution among other names, focusing mainly on pair selection to increase both efficiency and recall. Duplicate detection is the process of recognizing multiple representations of the same real-world entity. Among the indexing algorithms, progressive duplicate detection is a novel approach that sorts the dataset on a defined sorting key and compares records within a window. To obtain results even faster than the traditional approaches, a new algorithm is proposed that combines progressive approaches with scalable ones to find duplicates progressively and in parallel. The algorithm maximizes the efficiency of finding duplicates within a limited execution time without losing effectiveness.

Keywords
Pay-as-you-go, Duplicate Detection, Data Cleansing, Progressiveness, Parallelism.
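The abstract describes the sorted-neighborhood idea behind progressive duplicate detection: sort once on a key, then compare close neighbors before distant ones, so likely duplicates surface early. The following is a minimal illustrative sketch of that idea in Python, not the paper's actual algorithm; the sorting key, the `is_duplicate` similarity test, and the sample records are all hypothetical assumptions.

```python
import difflib

def sorting_key(record):
    # Hypothetical sorting key: concatenate short prefixes of the
    # record's attribute values, as in sorted-neighborhood indexing.
    return "".join(str(v)[:3].lower() for v in record.values())

def progressive_sorted_neighborhood(records, max_window, is_duplicate):
    # Sort the dataset once on the key, then compare records at growing
    # neighbor distances, so the most promising pairs (immediate
    # neighbors in sort order) are checked and reported first.
    ordered = sorted(records, key=sorting_key)
    for distance in range(1, max_window):      # distance 1 = adjacent records
        for i in range(len(ordered) - distance):
            a, b = ordered[i], ordered[i + distance]
            if is_duplicate(a, b):
                yield a, b                     # emit each duplicate as found

def is_duplicate(a, b, threshold=0.85):
    # Hypothetical similarity test on the 'name' field only.
    return difflib.SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

people = [
    {"name": "Jon Smith",  "city": "Austin"},
    {"name": "John Smith", "city": "Austin"},
    {"name": "Mary Jones", "city": "Boston"},
]
for a, b in progressive_sorted_neighborhood(people, max_window=3, is_duplicate=is_duplicate):
    print("possible duplicate:", a["name"], "<->", b["name"])
```

Because the generator yields matches as soon as they are found, results arrive pay-as-you-go: interrupting the scan after any amount of time still returns the duplicates detected so far, which is the progressiveness property the abstract refers to. The parallel, scalable variant proposed in the paper is not reproduced here.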
