Plagiarism Detection with Optical Character Recognition

Srinidhi Jeganmohan*, Nivetha Nagendran**, Gayathri R.***, S. Gurusubramani****
*-**** Department of Computer Science, Sri Sairam Engineering College, Chennai, Tamil Nadu, India.
Periodicity: March - May 2021


There is an increasing tendency to copy another person's content directly from information available on the Internet. Plagiarism is the use of someone else's ideas or written work without their knowledge or acknowledgment; this is intellectual theft and is a crime. To address this issue, it is necessary to maintain a plagiarism reporting system that keeps track of text recycling. The objective of this paper is to develop an application in which texts are compared to detect plagiarism, even when the text is uploaded in image format. Text is extracted from images with the help of Optical Character Recognition (OCR). Similarity is calculated through the machine learning techniques of word-to-vector conversion and cosine similarity. The dataset for comparison is drawn from text scripts on the Internet and from manuscripts of journals or student essays. The key aim of this paper is to discourage academic plagiarism among the student community and to encourage the practice of original writing.
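
The comparison step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the OCR stage has already produced plain text, and it substitutes a simple bag-of-words count for the word-to-vector conversion. The function names to_vector, cosine_similarity, and similarity are hypothetical and chosen only for this illustration.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Map text to a sparse bag-of-words vector (word -> count).

    Assumption: a plain word count stands in for the paper's
    word-to-vector conversion, whose details are not given here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Return the cosine of the angle between two sparse count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(submitted_text, reference_text):
    """Score a submitted text against one reference text, from 0.0 to 1.0."""
    return cosine_similarity(to_vector(submitted_text), to_vector(reference_text))
```

A score near 1.0 indicates that the two texts use nearly the same words in the same proportions, and a score of 0.0 indicates no shared words. Any threshold for flagging a submission would have to be chosen empirically, since the abstract does not state one.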


Plagiarism, Optical Character Recognition (OCR), Cosine Similarity, Word to Vector, Python.

How to Cite this Article?

Jeganmohan, S., Nagendran, N., Gayathri, R., and Gurusubramani, S. (2021). Plagiarism Detection with Optical Character Recognition. i-manager's Journal on Computer Science, 9(1), 15-20.

