JOIG 2026 Vol.14(1):121-140
doi: 10.18178/joig.14.1.121-140

Optimizing CNN Architectures for In-field Grape Disease Diagnosis on Mobile Embedded Systems

Shital Karande 1,2,* and Bindu Garg 1,*
1. Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Pune, India
2. Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India
Email: shital.jadhav@bharatividyapeeth.edu (S.K.); brgarg@bvucoep.edu.in (B.G.)
*Corresponding author

Manuscript received June 26, 2025; revised July 25, 2025; accepted September 28, 2025; published February 27, 2026.

Abstract—Agricultural practice increasingly relies on computer vision to monitor visual patterns on leaves and accurately diagnose plant diseases. Diagnosis carried out in the field adds complexity, which the proposed segmentation algorithm removes. The proposed approach fuses multi-color thresholding with GrabCut and is suitable for lightweight mobile devices. Convolutional Neural Networks (CNNs) share the same mathematical principles but come in several architectural variations, because their building blocks process input image data in a serial, parallel, or sliced manner. This paper evaluates a grape disease dataset with recent CNN models, ranging from simple sequential models to efficient parallel ones. Different architectural design patterns are explored to give a bird's-eye view of how deep learning is used for image classification. Eight architectures based on different principles, such as parallel layers, skip connections, depthwise separable convolution, and densely as well as sparsely connected layers, are evaluated. Training and testing are carried out on commodity hardware, and the models are also tested on portable devices. The state-of-the-art Visual Geometry Group (VGG) 16 architecture achieves a test accuracy of 0.98, whereas Inception, with its parallel multi-layer filters, achieves a comparatively lower accuracy of 0.90. ResNet, which learns from previous layers through skip connections, is one of the most promising pretrained models, with an accuracy of 0.9916. DenseNet improves on ResNet and reduces model size through novel connectivity patterns. The recent Vision Transformer (ViT) gives the best result among these architectures, with an accuracy of 0.9942. This paper proposes an eleven-layer architecture for efficient classification of grape crop diseases. Variants of this architecture with input image sizes of 128 and 160 are evaluated, and the 160 variant proves more suitable when compared with the other deep architectures.
The proposed model performs best, reaching an accuracy, F1-score, and recall of 1.0 for the majority of classes. It is lightweight, with a size of 3.6 MB, and is suitable for embedded systems.
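The abstract links model size to design principles such as depthwise separable convolution and reports a 3.6 MB model. The following is a minimal Python sketch of the parameter arithmetic behind such claims, not the authors' code. The layer shape (3×3 kernel, 64 input channels, 128 output channels) is hypothetical, and the size conversion assumes every weight is stored as a 4-byte float32 with 1 MB = 10^6 bytes, ignoring file-format overhead.

```python
# Illustrative parameter counting; the layer shape used below is hypothetical,
# not a layer from the paper's eleven-layer model.


def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights (plus optional biases) of a k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """A k x k depthwise conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def float32_size_mb(num_params: int) -> float:
    """Approximate size assuming 4 bytes per float32 weight and 1 MB = 10**6 bytes (no format overhead)."""
    return num_params * 4 / 1e6


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))  # prints: 73856 8960 8.24
    # Rough upper bound on float32 weights that fit in a 3.6 MB file (assumption: no overhead).
    print(round(3.6e6 / 4))  # prints: 900000
```

Under these assumptions the separable layer needs roughly 8× fewer parameters than the standard one, and a 3.6 MB float32 file holds at most about 0.9 million weights; quantized or compressed storage would change that estimate.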
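The reported recall and F1-score of 1.0 are per-class figures. As a hedged illustration of how such per-class metrics relate to overall accuracy, the sketch below computes precision, recall, and F1-score from a confusion matrix. The four-class matrix and its counts are invented for the example and are not the paper's results.

```python
from typing import List, Tuple


def per_class_metrics(cm: List[List[int]]) -> Tuple[List[float], List[float], List[float], float]:
    """Return (precision, recall, f1, accuracy) from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and whose predicted class is j.
    """
    n = len(cm)
    precision, recall, f1 = [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # true class c, predicted as another class
        fp = sum(cm[row][c] for row in range(n)) - tp         # predicted as c, true class differs
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)      # harmonic mean of precision and recall
    correct = sum(cm[c][c] for c in range(n))
    total = sum(sum(row) for row in cm)
    return precision, recall, f1, correct / total


if __name__ == "__main__":
    # Hypothetical 4-class confusion matrix (rows = true class, columns = predicted class).
    cm = [
        [50, 0, 0, 0],
        [0, 48, 2, 0],
        [0, 1, 49, 0],
        [0, 0, 0, 50],
    ]
    precision, recall, f1, accuracy = per_class_metrics(cm)
    for c, (p, r, f) in enumerate(zip(precision, recall, f1)):
        print(f"class {c}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
    print(f"accuracy={accuracy:.3f}")
```

With these invented counts, classes 0 and 3 reach a recall and F1-score of 1.0 while overall accuracy is 0.985, which shows how per-class scores of 1.0 can coexist with an overall accuracy below 1.0.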

Keywords—Convolutional Neural Network (CNN), Grab Cut, plant disease, embedded systems, parallel models, DenseNet

Cite: Shital Karande and Bindu Garg, "Optimizing CNN Architectures for In-field Grape Disease Diagnosis on Mobile Embedded Systems," Journal of Image and Graphics, Vol. 14, No. 1, pp. 121-140, 2026.

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC-BY-4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
