M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro
Computer Science, Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition (cs.CV), Multimedia (cs.MM), Sound (cs.SD), Audio and Speech Processing (eess.AS)
Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2 VISAPP: VISAPP, 869-876, 2024 , Rome, Italy
2024-03-14 00:00:00
This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to the AVCAffe's single-task baseline, M\&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
PDF: M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment.pdf
Empowered by ChatGPT