Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

Yutian Liu, Wenjun Ke, Jianguo Wei
Computer Science, Computer Vision and Pattern Recognition (cs.CV)
2024-03-04
Handwritten mathematical expression recognition (HMER) is a challenging OCR task due to the complex layouts of mathematical expressions, and it suffers from issues including over-parsing and under-parsing. To address these issues, previous methods use historical attention weights to improve the attention mechanism. However, this approach has limited effect on under-parsing, since it cannot correct erroneous attention on image regions that should only be parsed at subsequent decoding steps. When this happens, the attention module incorporates future context into the current decoding step, confusing the alignment process. To address this issue, we propose an attention guidance mechanism that explicitly suppresses attention weights in irrelevant regions and enhances those in appropriate regions, thereby inhibiting access to information outside the intended context. Depending on the type of attention guidance, we devise two complementary approaches to refine attention weights: self-guidance, which coordinates the attention of multiple heads, and neighbor-guidance, which integrates attention from adjacent time steps. Experiments show that our method outperforms existing state-of-the-art methods, achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the CROHME 2014 / 2016 / 2019 datasets.
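To make the two guidance variants described in the abstract concrete, below is a minimal NumPy sketch of the general idea: self-guidance damps regions on which the attention heads disagree, and neighbor-guidance blends in the previous step's attention to stabilise alignment. The function names, the consensus-by-mean formulation, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_guidance(attn):
    """Coordinate multiple heads: use their agreement (here, the mean
    across heads — an assumed formulation) to suppress regions that
    only a minority of heads attend to.

    attn: (heads, positions) attention weights for one decoding step.
    """
    consensus = attn.mean(axis=0, keepdims=True)        # (1, positions)
    guided = attn * consensus                           # damp low-consensus regions
    return guided / guided.sum(axis=-1, keepdims=True)  # renormalise per head

def neighbor_guidance(attn_t, attn_prev, alpha=0.5):
    """Refine the current step's attention using the adjacent (previous)
    decoding step; `alpha` is a hypothetical mixing weight.

    attn_t, attn_prev: (positions,) attention weights at steps t and t-1.
    """
    guided = attn_t * (alpha + (1.0 - alpha) * attn_prev)
    return guided / guided.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    heads_attn = softmax(np.array([[1.0, 2.0, 3.0],
                                   [3.0, 2.0, 1.0]]))
    refined = self_guidance(heads_attn)         # still (2, 3), rows sum to 1
    prev = softmax(np.array([0.5, 1.5, 0.5]))
    step = neighbor_guidance(heads_attn[0], prev)
```

Both refinements keep the attention weights a valid distribution (non-negative, summing to one over image positions), which matches the abstract's goal of suppressing irrelevant regions without discarding the alignment structure.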
PDF: Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition.pdf