JOIG 2023 Vol.11(4): 367-375
doi: 10.18178/joig.11.4.367-375

Spatiotemporal Pyramidal CNN with Depth-Wise Separable Convolution for Eye Blinking Detection in the Wild

Nguy Thi Lan Anh 1, Nguyen Gia Bach 2, Nguyen Thi Thanh Tu 1, Eiji Kamioka 2, and Phan Xuan Tan 2,*
1. School of Engineering Pedagogy, Hanoi University of Science and Technology, Hanoi, Vietnam;
Email: anh.ntl202019@sis.hust.edu.vn (N.T.L.A.), tu.nguyenthithanh@hust.edu.vn (N.T.T.T.)
2. Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan; Email: mg21501@shibaura-it.ac.jp (N.G.B.), kamioka@shibaura-it.ac.jp (E.K.)
*Correspondence: tanpx@shibaura-it.ac.jp (P.X.T.)

Manuscript received March 2, 2023; revised April 27, 2023; accepted May 8, 2023.

Abstract—Eye blinking detection in the wild plays an essential role in applications such as deception detection and driving fatigue detection. Although numerous attempts have been made, most have encountered difficulties: the extracted eye images have different resolutions as the distance between the face and the camera changes, and a lightweight detection model is required to keep inference time short enough for real-time use. This research addresses two problems: how the eye blinking detection model can learn efficiently from eye images of different resolutions captured in diverse conditions, and how to reduce the size of the detection model for faster inference. For the first problem, we propose upsampling or downsampling the input eye images to the same resolution, and then determine which interpolation method yields the highest detection performance. For the second problem, although a recent spatiotemporal convolutional neural network used for eye blinking detection has a strong capacity to extract both spatial and temporal characteristics, it still has a large number of network parameters, which leads to a long inference time. Therefore, this paper considers replacing the conventional convolution layers inside each branch with Depth-wise Separable Convolution as a feasible solution.
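To illustrate why depth-wise separable convolution reduces model size, the following minimal sketch compares the parameter count of a standard 3D convolution layer with that of its depth-wise separable counterpart (a per-channel depth-wise convolution followed by a 1×1×1 point-wise convolution). The channel counts and the 3×3×3 kernel are illustrative assumptions only; they are not taken from the network described in the paper, and the function names are hypothetical.

```python
def conv3d_params(c_in, c_out, kernel, bias=True):
    """Parameter count of a standard 3D convolution layer."""
    kt, kh, kw = kernel
    weights = c_in * c_out * kt * kh * kw
    return weights + (c_out if bias else 0)


def depthwise_separable_conv3d_params(c_in, c_out, kernel, bias=True):
    """Parameter count of a depth-wise 3D convolution (one filter per
    input channel) followed by a 1x1x1 point-wise convolution."""
    kt, kh, kw = kernel
    depthwise = c_in * kt * kh * kw + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Assumed layer shape, chosen only for illustration (not from the paper).
C_IN, C_OUT, KERNEL = 64, 128, (3, 3, 3)

standard = conv3d_params(C_IN, C_OUT, KERNEL)
separable = depthwise_separable_conv3d_params(C_IN, C_OUT, KERNEL)

print("standard 3D convolution:  ", standard)
print("depth-wise separable:     ", separable)
print("reduction factor: %.1fx" % (standard / separable))
```

For this assumed layer, the separable form needs roughly one twentieth of the parameters. The saving in an actual network depends on its channel widths and kernel sizes, and fewer parameters do not by themselves guarantee a proportional drop in inference time.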

Keywords—eye blinking, interpolation, facial landmarks, depth-wise separable convolution, 3D spatiotemporal Convolutional Neural Network (CNN), pyramid bottleneck block network

Cite: Nguy Thi Lan Anh, Nguyen Gia Bach, Nguyen Thi Thanh Tu, Eiji Kamioka, and Phan Xuan Tan, "Spatiotemporal Pyramidal CNN with Depth-Wise Separable Convolution for Eye Blinking Detection in the Wild," Journal of Image and Graphics, Vol. 11, No. 4, pp. 367-375, December 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.