Contrastive Vision Transformer Combined with Hyperparameter Fine-Tuning and Interpretable AI for Glaucoma Assessment

General Information

ISSN: 2301-3699 (Print); 2972-3973 (Online)
Frequency: Bimonthly
Managing Editor: Ms. Inez Chan
DOI: 10.18178/joig
Abstracting/Indexing: Scopus (Since 2021), CNKI, Google Scholar, Crossref, etc.
APC: 500 USD
Average Days to Accept: 116 days
Acceptance Rate: 38%
E-mail: editor@joig.net

JOIG 2026 Vol.14(3):493-505
doi: 10.18178/joig.14.3.493-505

R. Roopalakshmi*, Ayush Amarnath Bhakat, and Sambhav Nath Jain

School of Computer Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, 576104, India
Email: roopalakshmi.r@manipal.edu (R.R.); ayush.mitmpl2022@learner.manipal.edu (A.A.B.); sambhav.mitmpl2022@learner.manipal.edu (S.N.J.)
*Corresponding author

Manuscript received October 13, 2025; revised November 20, 2025; accepted March 2, 2026; published June 17, 2026.

Abstract—Glaucoma, often called the 'silent thief of sight', is a leading cause of irreversible blindness that affects around 80 million people worldwide, so its early and reliable detection is critical for preventing permanent vision loss. Although existing techniques for automated glaucoma screening employ Vision Transformer (ViT)-based deep learning models, they primarily rely on conventional supervised training and single-dataset evaluation, which results in limited feature discrimination, suboptimal generalization across heterogeneous images, and poor clinical interpretability. To address these limitations, this study proposes a novel contrastive learning-optimized ViT framework that integrates supervised contrastive pre-training with systematic hyperparameter optimization to learn more discriminative retinal features, followed by fine-tuning for glaucoma classification. In addition, a unified preprocessing and patch-based representation strategy is introduced to mitigate domain shifts across multiple imaging devices and acquisition protocols. Unlike prior studies that use single benchmarks, this framework is validated in a comprehensive multi-dataset setting that combines six public fundus datasets (including G1020, ORIGA, REFUGE, and PAPILA) to assess real-world generalization. Experimental results demonstrate consistent and statistically significant improvements over Convolutional Neural Network (CNN) and baseline ViT models, with accuracy of up to 87.91% and performance gains of 3-16% across accuracy, precision, recall, and F1-score. Further, Layer-wise Relevance Propagation (LRP) is employed to generate clinically interpretable heatmaps, which confirm that the model focuses on anatomically meaningful regions such as the optic disc and optic nerve head. These findings indicate that the proposed framework provides a robust, explainable, and generalizable solution for automated glaucoma screening and highlight its potential for clinical deployment.
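The abstract names supervised contrastive pre-training but does not state the loss used. The sketch below assumes the standard supervised contrastive (SupCon) objective of Khosla et al. (2020) in its "L_out" form: embeddings are L2-normalized, each anchor is pulled toward the other samples with the same label, and pushed away from all remaining samples. The function names, the pure-Python implementation, and the default temperature of 0.1 are illustrative assumptions, not the authors' configuration; in practice the same computation would run on batched ViT projection-head outputs on a GPU.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project an embedding onto the unit hypersphere, as SupCon assumes."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss (L_out form, Khosla et al., 2020).

    Assumption: this is the standard SupCon objective, not necessarily the
    exact variant used in the paper. For each anchor i, positives are the
    other samples sharing its label; the denominator sums over every sample
    except i. The loss is averaged over anchors with at least one positive.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    # Pairwise cosine similarities divided by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)] for i in range(n)]

    total, anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # no same-class partner in this batch; skip the anchor
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sim[i][a] for a in others)
        log_denom = m + math.log(sum(math.exp(sim[i][a] - m) for a in others))
        # Mean negative log-probability assigned to this anchor's positives.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch has no anchor with a positive pair")
    return total / anchors


if __name__ == "__main__":
    # Toy batch of 2-D embeddings: label 1 = glaucoma, label 0 = normal.
    emb = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.3]]
    print(supcon_loss(emb, [1, 1, 0, 0]))  # classes already clustered
    print(supcon_loss(emb, [1, 0, 1, 0]))  # labels disagree with geometry
```

In the usage example, the first call returns a much smaller loss than the second, because same-class embeddings already point in similar directions; minimizing this objective during pre-training is what encourages class-separable features before the classification head is fine-tuned.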

Keywords—Glaucoma detection, vision transformers, contrastive learning, medical imaging, explainable AI, layer-wise relevance propagation, deep learning, convolutional neural network

Cite: R. Roopalakshmi, Ayush Amarnath Bhakat, and Sambhav Nath Jain, "Contrastive Vision Transformer Combined with Hyperparameter Fine-Tuning and Interpretable AI for Glaucoma Assessment," Journal of Image and Graphics, Vol. 14, No. 3, pp. 493-505, 2026.

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
