JOIG 2026 Vol.14(2):278-293
doi: 10.18178/joig.14.2.278-293

Video Anomaly Classification Using Convolutional Neural Network with Bidirectional Long Short-Term Memory Using Spatio-temporal Adaptive Transformer

Divya Uluvaru Hoovayya 1,*, Josephine Prem Kumar 2, and Heena Kousar 1
1. Department of Computer Science and Engineering, East Point College of Engineering and Technology, Visvesvaraya Technological University, Belagavi, Karnataka, India
2. Department of Computer Science and Engineering, Cambridge Institute of Technology, Visvesvaraya Technological University, Belagavi, Karnataka, India
Email: divyauhgopal@gmail.com (U.H.D.); d_prem_k@yahoo.com (J.P.M.); hkheenakousar73@gmail.com (H.K.)
*Corresponding author

Manuscript received October 6, 2025; revised December 8, 2025; accepted January 26, 2026; published April 28, 2026.

Abstract—In smart cities, surveillance systems are extensively deployed to monitor public areas and critical infrastructure. These continuous video streams are a valuable source for analyzing real-time activities and supporting tasks such as object detection, behavior analysis, and incident monitoring. Among these applications, detecting unusual or suspicious events is particularly crucial for ensuring safety and enabling timely responses to threats. Because of its importance in enhancing urban security, video anomaly detection has emerged as a central research focus in the broader field of intelligent video analysis. This research therefore proposes an integrated framework, Convolutional Neural Network with Bidirectional Long Short-Term Memory using Spatio-Temporal Adaptive Transformer (CNN-BiLSTM-STAT), for classifying video anomalies in surveillance camera footage. The proposed CNN-BiLSTM-STAT was implemented and evaluated using four datasets: the University of Central Florida Crime (UCF-Crime) dataset, the Real-Life Violence Situations (RLVS) dataset, the Extreme Dataset for Violence Detection (XD_Violence), and the Real-World Fight 2000 dataset (RWF-2000). In the preprocessing phase, image resizing and label encoding are applied to prepare the input data. The integrated CNN-BiLSTM-STAT classifier then assigns video anomalies to multiple classes. The experimental results show that the proposed CNN-BiLSTM-STAT method attains an accuracy of 99.91% on the UCF-Crime dataset, higher than existing methods such as Recurrent Neural Networks with LSTM (RNN-LSTM) and MobileNet.

Keywords—bidirectional long short-term memory, convolutional neural network, spatio-temporal adaptive transformer, surveillance cameras, video anomalies
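
The preprocessing phase named in the abstract (image resizing and label encoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbour interpolation, the 224 x 224 target size, the helper names `resize_frame_nearest` and `encode_labels`, and the example class names are all assumptions chosen for illustration.

```python
import numpy as np

def resize_frame_nearest(frame: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize an (H, W, C) frame using nearest-neighbour sampling.

    The interpolation method is an assumption; the abstract only says
    that frames are resized.
    """
    in_h, in_w = frame.shape[:2]
    # For each output row/column, pick the source index it maps back to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    # Broadcasting rows (out_h, 1) against cols (out_w,) selects a grid.
    return frame[rows[:, None], cols]

def encode_labels(labels):
    """Map class-name strings to integer ids (alphabetical order)."""
    classes = sorted(set(labels))
    class_to_id = {name: i for i, name in enumerate(classes)}
    ids = np.array([class_to_id[name] for name in labels], dtype=np.int64)
    return ids, classes

# Hypothetical input: one 480x640 RGB frame and four clip labels.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
resized = resize_frame_nearest(frame, 224, 224)  # assumed target size

clip_labels = ["Fighting", "Normal", "Robbery", "Normal"]  # illustrative names
label_ids, class_names = encode_labels(clip_labels)
```

Here `resized` has shape `(224, 224, 3)`, and `label_ids` holds one integer per clip that a classifier can consume in place of the class-name strings.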

Cite: Divya Uluvaru Hoovayya, Josephine Prem Kumar, and Heena Kousar, "Video Anomaly Classification Using Convolutional Neural Network with Bidirectional Long Short-Term Memory Using Spatio-temporal Adaptive Transformer," Journal of Image and Graphics, Vol. 14, No. 2, pp. 278-293, 2026.

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC-BY-4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
