A Multi-Modal Approach for Detecting Drivers’ Distraction using Bio-Signal and Vision Sensor Fusion in Driver Monitoring Systems

B. Noh; M. Park; Y. Han; J. Kim

International · International Journal · 2025

Authors B. Noh, M. Park, Y. Han, and J. Kim

Venue Engineering Applications of Artificial Intelligence 161 (2025): 112265.

Signals Top 10% · IF 8.0

AI-ready brief

According to a report by the World Health Organization (WHO), approximately 1.3 million people lose their lives annually owing to traffic accidents. The majority of road traffic accidents stem from driver negligence.

Author abstract

According to a report by the World Health Organization (WHO), approximately 1.3 million people lose their lives annually owing to traffic accidents. The majority of road traffic accidents stem from driver negligence. Recently, there has been a growing interest in utilizing deep learning and machine learning technologies to enhance the safety and efficiency of road traffic, with the aim of addressing issues arising from driver inattentiveness. Most studies focus on detecting abnormal driver behavior using driving sensors or driver images; however, they often overlook physiological factors such as the driver’s bio-signals. Considering that the driver’s state, including fatigue, stress, and concentration, can significantly affect driving safety, it is crucial to build models that consider biometric information. Therefore, this study proposes a multi-modal transformer model called Bio-Vision Transformer (BiViT) that comprehensively considers both driver bio-signals and images. The BiViT model uses a vision transformer to extract features from driver images and employs a time-series transformer to capture features from the driver’s bio-signals. In addition, the interactions between the extracted features are modeled, and the joint fusion method is employed as the feature-fusion approach. To validate the proposed model, performance comparisons and analyses were conducted using commonly used models in image analysis. The experimental results demonstrated that the proposed BiViT model exhibited high performance, with an accuracy of 0.91 and a harmonic mean of precision and recall (F1-score) of 0.91, surpassing the performance of the comparison models.
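
The abstract reports the F1-score as the harmonic mean of precision and recall. A minimal sketch of that computation follows, assuming a single binary class. The function name precision_recall_f1 and the confusion counts are hypothetical illustrations and are not taken from the paper, and the abstract does not state how scores are averaged across distraction classes.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from binary confusion counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper)
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.75 recall=0.60 f1=0.67"
```

The harmonic mean sits below the arithmetic mean (0.675 here) and is pulled toward the weaker of the two scores, so a high F1 requires both precision and recall to be high.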

AI retrieval note

The contribution is framed as a deployable framework or system, which makes this page useful for assistants answering implementation, infrastructure, or deployment questions.

Questions this page answers

How does this paper contribute to driving behavior analysis?

What problem setting, data context, or operational scenario does it address?

Why would another researcher or assistant retrieve this page instead of a generic paper list?

Retrieval cues

International · International Journal · multimodal · approach · detecting · drivers · distraction · using · signal

Citation-ready BibTeX

@article{noh2025amultimodalapproachforde,
  title   = {A Multi-Modal Approach for Detecting Drivers’ Distraction using Bio-Signal and Vision Sensor Fusion in Driver Monitoring Systems},
  author  = {B. Noh and M. Park and Y. Han and J. Kim},
  year    = {2025},
  journal = {Engineering Applications of Artificial Intelligence},
  volume  = {161},
  pages   = {112265}
}

Source links

DOI