
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) / 2021

Attention Is All You Need In Speech Separation

Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong


Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short- and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and less memory-demanding than the latest speech separation systems with comparable performance.
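The multi-scale idea behind the SepFormer is a dual-path scheme: the encoded sequence is split into chunks, an intra-chunk transformer models short-term dependencies within each chunk, and an inter-chunk transformer models long-term dependencies across chunks. The sketch below illustrates this pattern in PyTorch under simplifying assumptions; it is not the authors' implementation. In particular, chunking here is non-overlapping, and the class names (`DualPathTransformerBlock`, `chunk`), model dimensions, and layer counts are illustrative choices rather than the paper's hyperparameters.

```python
# Minimal sketch of the dual-path (multi-scale) transformer idea, with
# assumed hyperparameters; not the SepFormer reference implementation.
import torch
import torch.nn as nn


class DualPathTransformerBlock(nn.Module):
    """Intra-chunk transformer (short-term dependencies) followed by an
    inter-chunk transformer (long-term dependencies)."""

    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()

        def make_encoder():
            layer = nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model,
                batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=n_layers)

        self.intra = make_encoder()  # attends within each chunk
        self.inter = make_encoder()  # attends across chunks

    def forward(self, x):
        # x: (batch, n_chunks, chunk_len, d_model)
        b, s, k, d = x.shape
        # Intra-chunk pass: fold the chunk index into the batch dimension,
        # so attention runs over positions inside each chunk.
        x = self.intra(x.reshape(b * s, k, d)).reshape(b, s, k, d)
        # Inter-chunk pass: fold the within-chunk position into the batch,
        # so attention runs over chunks at the same relative position.
        x = x.transpose(1, 2).reshape(b * k, s, d)
        x = self.inter(x).reshape(b, k, s, d).transpose(1, 2)
        return x


def chunk(x, chunk_len=250):
    """Split an encoded sequence (batch, time, d_model) into
    non-overlapping chunks (batch, n_chunks, chunk_len, d_model)."""
    b, t, d = x.shape
    pad = (-t) % chunk_len  # zero-pad time axis to a multiple of chunk_len
    x = nn.functional.pad(x, (0, 0, 0, pad))
    return x.reshape(b, -1, chunk_len, d)


encoded = torch.randn(1, 1000, 64)   # stand-in for a conv encoder's output
block = DualPathTransformerBlock()
out = block(chunk(encoded))
print(out.shape)                     # torch.Size([1, 4, 250, 64])
```

Because each attention pass only ever sees sequences of length `chunk_len` or `n_chunks`, the quadratic cost of attention applies to short sequences, which is also why the model tolerates aggressive downsampling of the encoded representation.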

756 citations · 91 influential citations
