Research Paper ML Hub

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) / 2021

Re-Attention Is All You Need: Memory-Efficient Scene Text Detection via Re-Attention on Uncertain Regions

Hsiang-Chun Chang, Hung-Jen Chen, Yuming Shen, Hong-Han Shuai, Wen-Huang Cheng

Computer Vision · Multimodal Learning · Popular and Landmark Papers

Scene text detection plays an important role in vision-based robot navigation, pointing robots to many potential landmarks such as nameplates, information signs, and floor buttons in elevators. Recently, segmentation-based methods for scene text detection have received increasing attention, since segmentation results can efficiently predict text of various shapes, including the irregular text found in most scene text images. However, two kinds of text remain unsolved: 1) tiny and 2) blurry instances. Moreover, annotations for tiny/blurry text are usually ignored during training, even though such text can still offer visual cues that help robots understand the world. In this paper, we therefore propose a new approach that effectively detects both clear and blurry text. Specifically, we propose a re-attention module that adds no learnable parameters: it first predicts text regions as candidate regions and then applies the same network to detect within those candidate regions again, reducing the required memory. Furthermore, to prevent errors from the first detection from propagating into the re-attended area, we propose a new fusion module that learns to integrate the predictions on the re-attended regions with the first-pass prediction. Experimental results show that the proposed method outperforms state-of-the-art methods on four challenging datasets.
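The two-pass idea in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: `detect` is a toy stand-in for the shared segmentation network, the thresholds `low`/`high` defining the "uncertain" candidate region are illustrative, and the fusion here is a simple average, whereas the paper learns the fusion instead.

```python
import numpy as np

def detect(img):
    """Toy stand-in for the shared segmentation network: returns a
    text-probability map the same size as the input (not the real model)."""
    return img / (img.max() + 1e-8)

def re_attend(img, low=0.3, high=0.7):
    """Two-pass detection sketch: re-run the SAME network (no extra
    parameters) on the uncertain candidate region, then fuse both maps."""
    first = detect(img)                          # first-pass probability map
    uncertain = (first > low) & (first < high)   # candidate (uncertain) pixels
    if not uncertain.any():
        return first
    # Bounding box of the uncertain region to re-attend to.
    ys, xs = np.where(uncertain)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    second = detect(img[y0:y1, x0:x1])           # second pass, same weights
    fused = first.copy()
    # Illustrative averaging fusion; the paper's fusion module is learned.
    fused[y0:y1, x0:x1] = 0.5 * (fused[y0:y1, x0:x1] + second)
    return fused
```

In practice the crop would typically be upscaled before the second pass so tiny/blurry text occupies more pixels; that step is omitted here to keep the sketch self-contained.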

3 citations · 0 influential

Full paper

Read the original paper

A direct open-access PDF is not yet available in the database. Use the source page or the learning resources below to open the complete paper from the publisher or index.