Swin-Transformer Based Recognition of Diabetic Retinopathy Grade

Sanjay Gandhi Gundabatini*, Sai Sindhu Manne**, Sunkara Likhit Babu***, Vangapandu Bhargava Rao****, Sanka Tejaswi*****
*-***** Department of Computer Science and Engineering, Vasireddy Venkatadri Institute of Technology, Guntur, Andhra Pradesh, India.
Periodicity: January - June 2025
DOI : https://doi.org/10.26634/jpr.12.1.21927

Abstract

Diabetic Retinopathy (DR), a common complication of diabetes, is a leading cause of blindness worldwide. Early detection and accurate staging are essential for effective management and vision preservation. This study applies the Swin Transformer, a deep learning architecture with a hierarchical design and a shifted-window self-attention mechanism, to build an automated tool for DR stage assessment. Trained on the APTOS 2019 Blindness Detection dataset, the system accurately identifies subtle retinal lesions such as microaneurysms as well as more pronounced features such as hemorrhages. Enhanced preprocessing, including image augmentation and normalization, improves the model's generalizability. Results indicate that this approach outperforms traditional Convolutional Neural Networks (CNNs) in accuracy, computational efficiency, and scalability, achieving a test accuracy of 99.57% and a test loss of 0.0220.
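
As a concrete illustration of the pipeline described above, the following is a minimal sketch of a Swin-Transformer-based DR grade classifier for APTOS-style fundus images. It assumes PyTorch with the timm and torchvision libraries; the model variant, image size, augmentations, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: Swin-Transformer DR grading on APTOS-style fundus images.
# Assumes PyTorch + timm + torchvision; settings below are illustrative only,
# not the paper's reported configuration.
import torch
import torch.nn as nn
import timm
from torchvision import transforms

NUM_CLASSES = 5  # APTOS 2019 grades: 0 = No DR ... 4 = Proliferative DR

# Preprocessing: resizing, light augmentation, and normalization stand in for
# the image enhancement/normalization steps described in the abstract.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hierarchical Swin backbone with shifted-window attention; the classifier
# head is replaced to output the five DR grades.
model = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=NUM_CLASSES
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of (fundus images, grade labels)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)            # (batch, 5) grade scores
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The shifted-window design restricts self-attention to local windows and shifts the window partition between successive layers, which provides cross-window connections while keeping computation roughly linear in image size; this is what makes the hierarchical Swin backbone practical for full-resolution retinal images compared with global-attention Vision Transformers.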

Keywords

Diabetic Retinopathy, Swin Transformer, Automated Staging, Retinal Analysis, Deep Learning Technology.

How to Cite this Article?

Gundabatini, S. G., Manne, S. S., Babu, S. L., Rao, V. B., and Tejaswi, S. (2025). Swin-Transformer Based Recognition of Diabetic Retinopathy Grade. i-manager’s Journal on Pattern Recognition, 12(1), 26-34. https://doi.org/10.26634/jpr.12.1.21927
