Directory

Journals and conferences on speech
CCF-A NeurIPS   AAAI   IJCAI   ACMMM
CCF-B ICASSP   COLING   SpeechCom   TSLP   TASLP   JSLHR   TMM   TOMCCAP   ICME
CCF-C INTERSPEECH   ICPR
Other ICLR
Hybrid & General ASR

2022

1 Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection pdf
2 Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition pdf
3 Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR pdf
4 Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition pdf
5 Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character pdf
6 Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition pdf
7 Human and Automatic Speech Recognition Performance on German Oral History Interviews pdf
8 Recent Progress in the CUHK Dysarthric Speech Recognition System pdf
9 The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition pdf
10 Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR pdf
11 Ask2Mask: Guided Data Selection for Masked Speech Modeling pdf
12 The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO convolutional recurrent Network for Multi Channel Speech Enhancement and Speech Recognition pdf
13 Non-Autoregressive ASR with Self-Conditioned Folded Encoders pdf
14 MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition pdf
15 Conversational Speech Recognition By Learning Conversation-level Characteristics pdf
16 The RoyalFlush System of Speech Recognition for M2MeT Challenge pdf
17 Visual Speech Recognition for Multiple Languages in the Wild pdf
18 Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments pdf
19 Wav2Vec2.0 on the Edge: Performance Evaluation pdf
20 4-bit Conformer with Native Quantization Aware Training for Speech Recognition pdf
21 A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings pdf
22 Chain-based Discriminative Autoencoders for Speech Recognition pdf
23 CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR pdf
24 Enhancing Speech Recognition Decoding via Layer Aggregation pdf
25 Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR pdf
26 Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition pdf
27 Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR pdf
28 Similarity and Content-based Phonetic Self Attention for Speech Recognition pdf
29 Speaker recognition by means of a combination of linear and nonlinear predictive models pdf
30 STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation pdf
31 Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings pdf
32 Transformer-based Streaming ASR with Cumulative Attention pdf
33 Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models pdf
34 Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition pdf
35 Improved far-field speech recognition using Joint Variational Autoencoder pdf
36 E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR pdf
37 Self-critical Sequence Training for Automatic Speech Recognition pdf
38 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition pdf
39 A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition pdf
40 Text-To-Speech Data Augmentation for Low Resource Speech Recognition pdf
41 Multiple Confidence Gates For Joint Training Of SE And ASR pdf
42 End-to-End Multi-speaker ASR with Independent Vector Analysis pdf
43 Filter-based Discriminative Autoencoders for Children Speech Recognition pdf
44 Global Normalization for Streaming Speech Recognition in a Modular Framework pdf
45 Heterogeneous Reservoir Computing Models for Persian Speech Recognition pdf
46 PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit pdf
47 Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition pdf
48 Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator pdf
49 Unified Modeling of Multi-Domain Multi-Device ASR Systems pdf
50 Conformer with dual-mode chunked attention for joint online and offline ASR pdf
51 Context-based out-of-vocabulary word recovery for ASR systems in Indian languages pdf
52 Improving the Training Recipe for a Robust Conformer-based Hybrid Model pdf
53 Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition pdf
54 Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition pdf
55 Squeezeformer: An Efficient Transformer for Automatic Speech Recognition pdf
56 Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training pdf
57 Learning a Dual-Mode Speech Recognition Model via Self-Pruning pdf
58 Improving Mandarin Speech Recognition with Block-augmented Transformer pdf
59 Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities pdf
60 Online Continual Learning of End-to-End Speech Recognition Models pdf
61 Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder pdf
62 Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies pdf
63 Compute Cost Amortized Transformer for Streaming ASR pdf
64 Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition pdf
65 Comparison and Analysis of New Curriculum Criteria for End-to-End ASR pdf
66 Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition pdf
67 Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition pdf
68 Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition pdf
69 Attention Enhanced Citrinet for Speech Recognition pdf
70 Deep Sparse Conformer for Speech Recognition pdf

2021

1 The History of Speech Recognition to the Year 2030 pdf
2 Multilingual Speech Recognition using Knowledge Transfer across Learning Processes pdf
3 Efficient domain adaptation of language models in ASR systems using Prompt-tuning pdf
4 Word Order Does Not Matter For Speech Recognition pdf
5 Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition pdf
6 Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition pdf
7 Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets pdf
8 A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming pdf
9 AequeVox: Automated Fairness Testing of Speech Recognition Systems pdf
10 An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition pdf
11 Asynchronous Decentralized Distributed Training of Acoustic Models pdf
12 Beyond Lp clipping: Equalization-based Psychoacoustic Attacks against ASRs pdf
13 Continual learning using lattice-free MMI for speech recognition pdf
14 Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees pdf
15 Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training pdf
16 FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition pdf
17 Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Model pdf
18 Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition pdf
19 Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask pdf
20 Integrating Categorical Features in End-to-End ASR pdf
21 Multi-Modal Pre-Training for Automated Speech Recognition pdf
22 Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet pdf
23 Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding pdf
24 Parallel Composition of Weighted Finite-State Transducers pdf
25 SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition pdf
26 Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition pdf
27 Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning pdf
28 Spell my name: keyword boosted speech recognition pdf
29 Towards efficient end-to-end speech recognition with biologically-inspired neural networks pdf
30 Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR pdf
31 Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition pdf
32 Voice Conversion Can Improve ASR in Very Low-Resource Settings pdf
33 Towards Building ASR Systems for the Next Billion Users pdf
34 Scaling ASR Improves Zero and Few Shot Learning pdf
35 Romanian Speech Recognition Experiments from the ROBIN Project pdf
36 Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition pdf
37 Recent Advances in End-to-End Automatic Speech Recognition pdf
38 Privacy attacks for automatic speech recognition acoustic models in a federated learning framework pdf
39 Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature pdf
40 Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition pdf
41 Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition pdf
42 Effect of noise suppression losses on speech distortion and ASR performance pdf
43 Do We Still Need Automatic Speech Recognition for Spoken Language Understanding? pdf
44 Conformer-based Hybrid ASR System for Switchboard Dataset pdf
45 A comparison of streaming models and data augmentation methods for robust speech recognition pdf
46 Are E2E ASR models ready for an industrial usage? pdf
47 Voice Quality and Pitch Features in Transformer-Based Speech Recognition pdf
48 Investigation of Densely Connected Convolutional Networks with Domain Adversarial Learning for Noise Robust Speech Recognition pdf
49 Continual Learning for Monolingual End-to-End Automatic Speech Recognition pdf
50 Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems pdf
51 Speech frame implementation for speech analysis and recognition pdf
52 Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN pdf
53 Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech pdf
54 Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems pdf
55 Training end-to-end speech-to-text models on mobile phones pdf
56 Robust Speech Representation Learning via Flow-based Embedding Regularization pdf
57 A Mixture of Expert Based Deep Neural Network for Improved ASR pdf
58 A higher order Minkowski loss for improved prediction ability of acoustic model in ASR pdf
59 X-Vector based voice activity detection for multi-genre broadcast speech-to-text pdf

2020

1 On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition pdf
2 Conformer: Convolution-augmented Transformer for Speech Recognition pdf
3 ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context pdf
4 Improved Noisy Student Training for Automatic Speech Recognition pdf
5 CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition pdf
6 A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition pdf
7 Imputer: Sequence modelling via imputation and dynamic programming pdf
8 Automatic Speech Recognition Errors Detection and Correction: A Review pdf
9 A review of on-device fully neural end-to-end automatic speech recognition algorithms pdf

2018

1 Accelerating recurrent neural network language model based online speech recognition system pdf
2 Towards Language-Universal End-to-End Speech Recognition pdf

2017

1 Reducing Bias in Production Speech Models pdf
2 Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition pdf
RNN-T

2022

1 Improving the fusion of acoustic and text representations in RNN-T pdf
2 A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies pdf
3 A Likelihood Ratio based Domain Adaptation Method for E2E Models pdf
4 Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models pdf
5 Memory-Efficient Training of RNN-Transducer with Sampled Softmax pdf
6 Streaming parallel transducer beam search with fast-slow cascaded encoders pdf
7 Efficient Training of Neural Transducer for Speech Recognition pdf
8 An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition pdf
9 A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes pdf
10 On the Prediction Network Architecture in RNN-T for ASR pdf
11 Pruned RNN-T for fast, memory-efficient ASR training pdf
12 Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer pdf
13 Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition pdf
14 Composing RNNs and FSTs for Small Data: Recovering Missing Characters in Old Hawaiian Text pdf
15 VQ-T: RNN Transducers using Vector-Quantized Prediction Network States pdf
16 ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition pdf
17 Streaming Target-Speaker ASR with Neural Transducer pdf

2021

1 Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer pdf
2 Streaming End-to-End Multi-Talker Speech Recognition pdf
3 A Better and Faster End-to-End Model for Streaming ASR pdf
4 Tied & Reduced RNN-T Decoder pdf
5 Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices pdf
6 Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter pdf
7 On Language Model Integration for RNN Transducer based Speech Recognition pdf
8 A Unified Speaker Adaptation Approach for ASR pdf
9 Factorized Neural Transducer for Efficient Language Model Adaptation pdf
10 Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition pdf
11 Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models pdf
12 Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution pdf
13 Word-level confidence estimation for RNN transducers pdf
14 Sequence Transduction with Graph-based Supervision pdf
15 Joint AEC and Beamforming with Double-Talk Detection using RNN-Transformer pdf
16 Context-Aware Transformer Transducer for Speech Recognition pdf
17 Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding pdf
18 Multi-turn RNN-T for streaming recognition of multi-party speech pdf
19 Investigation of Training Label Error Impact on RNN-T pdf

2020

1 RNN-T For Latency Controlled ASR With Improved Beam Search pdf
2 Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss pdf
3 A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency pdf
4 Towards Fast And Accurate Streaming E2E ASR pdf
5 Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition pdf
6 Transfer Learning Approaches for Streaming End-to-End Speech Recognition System pdf
7 Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer pdf
8 Alignment Restricted Streaming Recurrent Neural Network Transducer pdf
9 Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR pdf
10 Improving RNN transducer with normalized jointer network pdf
11 Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer pdf
12 Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data pdf
13 FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization pdf
14 Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition pdf

2019

1 Self-Attention Transducers for End-to-End Speech Recognition pdf

2018

1 Streaming E2E Speech Recognition For Mobile Devices pdf
CTC

2022

1 Improved Mispronunciation detection system using a hybrid CTC-ATT based approach for L2 English speakers pdf
2 Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer pdf
3 Improving CTC-based speech recognition via knowledge transferring from pre-trained language models pdf
4 Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding pdf
5 Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition pdf
6 Better Intermediates Improve CTC Inference pdf
7 Multi-sequence Intermediate Conditioning for CTC-based ASR pdf
8 InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR pdf
9 Improving CTC-based ASR Models with Gated Interlayer Collaboration pdf
10 A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition pdf
11 Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM pdf
12 Distilling the Knowledge of BERT for CTC-based ASR pdf

2021

1 Why does CTC result in peaky behavior? pdf
2 Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input pdf
3 CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition pdf
4 Improved Mask-CTC for Non-Autoregressive End-to-End ASR pdf
5 An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR pdf
6 Back from the future: bidirectional CTC decoding using future information in speech recognition pdf
7 CTC Variations Through New WFST Topologies pdf
8 Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units pdf

2020

1 Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict pdf

2019

1 Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition pdf

2018

1 An improved hybrid CTC-Attention model for speech recognition pdf

2017

1 Residual Convolutional CTC Networks for Automatic Speech Recognition pdf
2 Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling pdf
AED

2022

1 Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR pdf
2 USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder pdf
3 Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems pdf
4 Supervised Attention in Sequence-to-Sequence Models for Speech Recognition pdf
5 LegoNN: Building Modular Encoder-Decoder Models pdf

2021

1 SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition pdf
2 K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables pdf
3 Attention based end to end Speech Recognition for Voice Search in Hindi and English pdf
4 A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation pdf
5 Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI pdf

2020

1 Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition pdf
2 High Performance Sequence-to-Sequence Model for Streaming Speech Recognition pdf
3 Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition pdf
4 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory pdf
5 CTC-synchronous Training for Monotonic Attention Model pdf
6 Low Latency End-to-End Streaming Speech Recognition with a Scout Network pdf
7 Synchronous Transformers For E2E Speech Recognition pdf
8 Transformer Online CTC/Attention E2E Speech Recognition Architecture pdf
9 Streaming Automatic Speech Recognition With The Transformer Model pdf
10 Minimum Latency Training Strategies For Streaming seq-to-seq ASR pdf
11 Enhancing Monotonic Multihead Attention for Streaming ASR pdf
12 Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition pdf
13 Insertion-Based Modeling for End-to-End Automatic Speech Recognition pdf
14 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition pdf
15 Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition pdf
16 Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer pdf

2019

1 Streaming Transformer ASR with Blockwise Synchronous Inference pdf
2 Triggered Attention for End-to-End Speech Recognition pdf
3 Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition pdf
4 Spelling Correction Model For E2E Speech Recognition pdf
5 An Empirical Study Of Efficient ASR Rescoring With Transformers pdf
6 Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model pdf

2018

1 State-of-the-art Speech Recognition With Sequence-to-Sequence Models pdf
2 Monotonic Chunkwise Attention pdf

2017

1 Multilingual Speech Recognition With A Single End-To-End Model pdf
2 Attention-Based End-to-End Speech Recognition in Mandarin pdf
3 Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping pdf

2016

1 Wav2Letter: an End-to-End ConvNet-based Speech Recognition System pdf

2015

1 Listen, attend and spell: A neural network for large vocabulary conversational speech recognition pdf
Unified & Rescoring

2022

1 Two-Pass End-to-End ASR Model Compression pdf
2 Korean Tokenization for Beam Search Rescoring in Speech Recognition pdf
3 WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit pdf
4 RescoreBERT: Discriminative Speech Recognition Rescoring with BERT pdf
5 On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode pdf
6 Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems pdf

2021

1 Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition pdf
2 One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition pdf
3 Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition pdf
4 WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit pdf
5 U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition pdf
6 An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR pdf
7 ASR Rescoring and Confidence Estimation with ELECTRA pdf
8 Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes pdf
9 Lattention: Lattice-attention in ASR rescoring pdf
10 Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model pdf
11 GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio pdf

2020

1 Transformer Transducer: One Model Unifying Streaming And Non-Streaming Speech Recognition pdf
2 Universal ASR: Unify And Improve Streaming ASR With Full-Context Modeling pdf
3 Cascaded encoders for unifying streaming and non-streaming ASR pdf
4 Dynamic latency speech recognition with asynchronous revision pdf
5 Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition pdf

2018

1 Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units pdf

2017

1 Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM pdf
Data Augmentation

2022

1 Investigation of Data Augmentation Techniques for Disordered Speech Recognition pdf
2 LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects pdf
3 Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech pdf
4 Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations pdf
5 Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition pdf
6 Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation pdf
7 Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition pdf
8 Improving Data Driven Inverse Text Normalization using Data Augmentation pdf
9 Data Augmentation for Low-Resource Quechua ASR Improvement pdf
10 Non-Parallel Voice Conversion for ASR Augmentation pdf

2021

1 MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition pdf
2 Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition pdf
3 Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition pdf
4 Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition pdf
5 Data Augmentation for Speech Recognition in Maltese: A Low-Resource Perspective pdf
6 Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition pdf
7 PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-modeling Unit Training for Robust Uyghur E2E Speech Recognition pdf

2020

1 pdf

2019

1 SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition pdf
LM

2022

1 Neural-FST Class Language Model for End-to-End Speech Recognition pdf
2 Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model pdf
3 Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR pdf
4 Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers pdf
5 A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling pdf
6 An Empirical Study of Language Model Integration for Transducer based Speech Recognition pdf
7 Improving Speech Recognition for Indic Languages using Language Model pdf
8 Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition pdf
9 Detecting Unintended Memorization in Language-Model-Fused ASR pdf
10 Improving Rare Word Recognition with LM-aware MWER Training pdf
11 Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems pdf
12 Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems pdf
13 Distilling a Pretrained Language Model to a Multilingual ASR Model pdf
14 Residual Language Model for End-to-end Speech Recognition pdf
15 ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks pdf
16 Bayesian Neural Network Language Modeling for Speech Recognition pdf
17 Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models pdf
18 SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data pdf

2021

1 Private Language Model Adaptation for Speech Recognition pdf
2 Disambiguation-BERT for N-best Rescoring in Low-Resource Conversational ASR pdf
3 Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition pdf
4 Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation pdf
5 ViraPart: A Text Refinement Framework for ASR and NLP Tasks in Persian pdf
6 Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling pdf
7 Mixed Precision of Quantization of Transformer Language Models for Speech Recognition pdf
8 Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition pdf
Unsupervised

2022

1 A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition pdf
2 Robust Self-Supervised Audio-Visual Speech Recognition pdf
3 Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction pdf
4 IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion pdf
5 Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition pdf
6 Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition pdf
7 Self-supervised Learning with Random-projection Quantizer for Speech Recognition pdf
8 The CORAL++ Algorithm for Unsupervised Domain Adaptation of Speaker Recognition pdf
9 Autoregressive Co-Training for Learning Discrete Speech Representations pdf
10 Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks pdf
11 Learning Audio Representations with MLPs pdf
12 Privacy-Preserving Speech Representation Learning using Vector Quantization pdf
13 Probing phoneme, language and speaker information in unsupervised speech representations pdf
14 TRILLsson: Distilled Universal Paralinguistic Speech Representations pdf
15 XTREME-S: Evaluating Cross-lingual Speech Representations pdf
16 A Brief Overview of Unsupervised Neural Speech Representation Learning pdf
17 Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition pdf
18 Audio Self-supervised Learning: A Survey pdf
19 Federated Domain Adaptation for ASR with Full Self-Supervision pdf
20 Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment pdf
21 Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition pdf
22 Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition pdf
23 LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT pdf
24 Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data pdf
25 Towards Representative Subset Selection for Self-Supervised Speech Recognition pdf
26 Unsupervised Word Segmentation using K Nearest Neighbors pdf
27 Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training pdf
28 Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? pdf
29 Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation pdf
30 ATST: Audio Representation Learning with Teacher-Student Transformer pdf
31 Improving Self-Supervised Speech Representations by Disentangling Speakers pdf
32 BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations pdf
33 HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition pdf
34 Can Self-Supervised Learning solve the problem of child speech recognition? pdf
35 Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction pdf
36 Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning pdf
37 Federated Self-supervised Speech Representations: Are We There Yet? pdf
38 Towards End-to-end Unsupervised Speech Recognition pdf
39 Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation pdf
40 Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective pdf
41 Unsupervised Data Selection via Discrete Speech Representation for ASR pdf
42 Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices pdf
43 Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition pdf
44 A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems pdf
45 Contrastive Siamese Network for Semi-supervised Speech Recognition pdf
46 Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR pdf
47 Deploying self-supervised learning in the wild for hybrid automatic speech recognition pdf
48 Self-Supervised Speech Representation Learning: A Review pdf
49 SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization pdf
50 Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing pdf
51 Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages pdf
52 Boosting Cross-Domain Speech Recognition with Self-Supervision pdf
53 Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training pdf
54 DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR pdf
55 FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition pdf
56 Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition pdf
57 Joint Encoder-Decoder Self-Supervised Pre-training for ASR pdf
58 Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models pdf
59 Wav2Vec-Aug: Improved self-supervised training with limited data pdf
60 Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network pdf
61 Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge pdf
62 Unsupervised data selection for Speech Recognition with contrastive loss ratios pdf
63 Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data pdf
64 Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training pdf
65 FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning pdf
66 Thai Wav2Vec2.0 with CommonVoice V8 pdf
67 Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning pdf
68 Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models pdf
69 Unsupervised domain adaptation for speech recognition with unsupervised error correction pdf
70 An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning pdf
71 Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset pdf

2021

1 Private Language Model Adaptation for Speech Recognition pdf
2 Analyzing the Robustness of Unsupervised Speech Recognition pdf
3 Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition pdf
4 Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning pdf
5 Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition pdf
6 WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing pdf
7 Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses pdf
8 Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages pdf
9 Self-Supervised Learning for speech recognition with Intermediate layer supervision pdf
Multilingual

2022

1 Reducing language context confusion for end-to-end code-switching automatic speech recognition pdf
2 Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition pdf
3 Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages pdf
4 A Survey of Multilingual Models for Automatic Speech Recognition pdf
5 Code Switched and Code Mixed Speech Recognition for Indic languages pdf
6 Frequency-Directional Attention Model for Multilingual Automatic Speech Recognition pdf
7 Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition pdf
8 Adaptive Activation Network For Low Resource Multilingual Speech Recognition pdf
9 Bilingual End-to-End ASR with Byte-Level Subwords pdf
10 LAE: Language-Aware Encoder for Monolingual and Multilingual ASR pdf
11 Language-specific Characteristic Assistance for Code-switching Speech Recognition pdf
12 Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition pdf
13 Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition pdf
14 A Language Agnostic Multilingual Streaming On-Device ASR System pdf
15 Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation pdf
16 Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification pdf
17 Learning ASR pathways: A sparse multilingual ASR model pdf
18 Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages pdf
19 ASR2K: Speech Recognition for Around 2000 Languages without Audio pdf

2021

1 GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio pdf
2 Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0 pdf
3 Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models pdf
4 Minimum word error training for non-autoregressive Transformer-based code-switching ASR pdf
5 Multilingual Speech Recognition using Knowledge Transfer across Learning Processes pdf
6 Joint Unsupervised and Supervised Training for Multilingual ASR pdf
7 Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization pdf
8 Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data pdf
9 Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching pdf
10 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition pdf
Personal

2022

1 ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users pdf
2 Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition pdf
3 Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models pdf
4 End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system pdf
5 Curriculum optimization for low-resource speech recognition pdf
6 Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass pdf
7 Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding pdf
8 Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition pdf
9 PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations pdf
10 Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition pdf
11 Speaker adaptation for Wav2vec2 based dysarthric ASR pdf
12 Contextual Adapters for Personalized Speech Recognition in Neural Transducers pdf
13 Adaptive multilingual speech recognition with pretrained models pdf
14 A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data pdf
15 Confidence Score Based Conformer Speaker Adaptation for Speech Recognition pdf

2021

1 GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio pdf
2 Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition pdf
3 Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets pdf
4 Personalizing ASR with limited data using targeted subset selection pdf
5 Prompt-tuning in ASR systems for efficient domain-adaptation pdf
Accent

2022

1 Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition pdf
2 Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition pdf
3 Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents pdf
4 Cleanformer: A microphone array configuration-invariant, streaming, multichannel neural enhancement frontend for ASR pdf
5 Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data pdf
6 A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition pdf
7 Performance Disparities Between Accents in Automatic Speech Recognition pdf

2021

1 GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio pdf
2 Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings pdf
3 Multi-Dialect Arabic Speech Recognition pdf
Dataset

2022

1 The Norwegian Parliamentary Speech Corpus pdf
2 CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition pdf
3 Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset pdf
4 Finnish Parliament ASR corpus - Analysis, benchmarks and statistics pdf
5 Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks pdf
6 Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset pdf
7 GigaST: A 10,000-hour Pseudo Speech Translation Corpus pdf
8 GWA: A Large High-Quality Acoustic Dataset for Audio Processing pdf
9 SDS-200: A Swiss German Speech to Standard German Text Corpus pdf
10 Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi pdf
11 Bengali Common Voice Speech Dataset for Automatic Speech Recognition pdf
12 TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline pdf
13 The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition pdf
14 Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition pdf
15 UserLibri: A Dataset for ASR Personalization Using Only Text pdf

2021

1 GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio pdf
2 Building a Noisy Audio Dataset to Evaluate Machine Learning Approaches for Automatic Speech Recognition Systems pdf
3 CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese pdf
4 WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition pdf
5 Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions pdf
6 The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage pdf
7 JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification pdf
Robust

2022

1 A Conformer Based Acoustic Model for Robust Automatic Speech Recognition pdf
2 Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition pdf
3 Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data pdf
4 RED-ACE: Robust Error Detection for ASR using Confidence Embeddings pdf
5 Speech-enhanced and Noise-aware Networks for Robust Speech Recognition pdf
6 Mask scalar prediction for improving robust automatic speech recognition pdf
7 Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning pdf
8 Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection pdf
9 Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition pdf
10 Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping pdf
11 ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding pdf
12 pMCT: Patched Multi-Condition Training for Robust Speech Recognition pdf
13 DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition pdf
14 Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition pdf

2021

1 Robustifying automatic speech recognition by extracting slowly varying features pdf
2 Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR pdf
3 Sequential Randomized Smoothing for Adversarially Robust Speech Recognition pdf
Speaker Diarization

2022

1 ASR-Aware End-to-end Neural Diarization pdf
2 Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge pdf
3 The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge pdf
4 EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers pdf
5 Multi-scale Speaker Diarization with Dynamic Scale Weighting pdf
6 Multi-Target Filter and Detector for Speaker Diarization pdf
7 Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios pdf
8 Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization pdf
9 Robust End-to-end Speaker Diarization with Generic Neural Clustering pdf
10 Self-supervised Speaker Diarization pdf
11 From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization pdf
12 Multimodal Clustering with Role Induced Constraints for Speaker Diarization pdf
13 Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization pdf
14 PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification pdf
15 Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization pdf
16 Online Neural Diarization of Unlimited Numbers of Speakers pdf
17 Utterance-by-utterance overlap-aware neural diarization with Graph-PIT pdf
18 Online Target Speaker Voice Activity Detection for Speaker Diarization pdf
19 Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription pdf
20 Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones pdf
21 Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization pdf
22 Chronological Self-Training for Real-Time Speaker Diarization pdf
23 Robust Acoustic Domain Identification with its Application to Speaker Diarization pdf
24 Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting pdf
MultiChannel

2022

1 Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge pdf
2 The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge pdf
3 Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge pdf
4 The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge pdf
5 Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study pdf
MultiModal

2022

1 Improved Meta Learning for Low Resource Speech Recognition pdf
2 A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection pdf
3 End-to-End Multi-Person Audio/Visual Automatic Speech Recognition pdf
4 AVATAR: Unconstrained Audiovisual Speech Recognition pdf
5 Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition pdf
6 Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment pdf
7 SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning pdf
8 Towards Generalisable Audio Representations for Audio-Visual Navigation pdf
9 Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition pdf
10 Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands pdf
11 Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR pdf
12 Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception pdf
Speech translation

2022

1 Who Are We Talking About? Handling Person Names in Speech Translation pdf
2 Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation pdf
3 Efficient yet Competitive Speech Translation: FBK@IWSLT2022 pdf
4 Cross-modal Contrastive Learning for Speech Translation pdf
5 ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks pdf
6 Non-Parametric Domain Adaptation for End-to-End Speech Translation pdf
7 On the Impact of Noises in Crowd-Sourced Data for Speech Translation pdf
8 Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation pdf
9 Revisiting End-to-End Speech-to-Text Translation From Scratch pdf
10 The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task pdf
11 M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation pdf
12 A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation pdf
13 Direct Speech Translation for Automatic Subtitling pdf
Other

2022

1 Endpoint Detection for Streaming End-to-End Multi-talker ASR pdf
2 How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR pdf
3 Comparative Study of Acoustic Echo Cancellation Algorithms for Speech Recognition System in Noisy Environment pdf
4 Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition pdf
5 Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition pdf
6 Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection pdf
7 Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition pdf
8 Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey pdf
9 VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition pdf
10 Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition pdf
11 ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition pdf
12 A two-step approach to leverage contextual data: speech recognition in air-traffic communications pdf
13 Semantic-aware Speech to Text Transmission with Redundancy Removal pdf
14 Joint Speech Recognition and Audio Captioning pdf
15 Error Correction in ASR using Sequence-to-Sequence Models pdf
16 Visualizing Automatic Speech Recognition -- Means for a Better Understanding? pdf
17 BEA-Base: A Benchmark for ASR of Spontaneous Hungarian pdf
18 Language Dependencies in Adversarial Attacks on Speech Recognition Systems pdf
19 Analysis of EEG frequency bands for Envisioned Speech Recognition pdf
20 Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems pdf
21 Automatic Speech recognition for Speech Assessment of Preschool Children pdf
22 Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis pdf
23 Computing Optimal Location of Microphone for Improved Speech Recognition pdf
24 Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition pdf
25 Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition pdf
26 How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications pdf
27 Impact of Dataset on Acoustic Models for Automatic Speech Recognition pdf
28 indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages pdf
29 Integrate Lattice-Free MMI into End-to-End Speech Recognition pdf
30 Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? pdf
31 Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training pdf
32 Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems pdf
33 Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data pdf
34 Neural Predictor for Black-Box Adversarial Attacks on Speech Recognition pdf
35 Recent improvements of ASR models in the face of adversarial attacks pdf
36 Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors pdf
37 Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture pdf
38 Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator pdf
39 Towards Privacy-Preserving Speech Representation for Client-Side Data Sharing pdf
40 Vakyansh: ASR Toolkit for Low Resource Indic languages pdf
41 Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking pdf
42 Extracting Targeted Training Data from ASR Models, and How to Mitigate It pdf
43 ASR in German: A Detailed Error Analysis pdf
44 Unified Speech-Text Pre-training for Speech Translation and Recognition pdf
45 Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data pdf
46 Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners pdf
47 Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser pdf
48 Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition pdf
49 Successes and critical failures of neural networks in capturing human-like speech recognition pdf
50 Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition pdf
51 End-to-end multi-talker audio-visual ASR using an active speaker attention module pdf
52 End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation pdf
53 Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition pdf
54 An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech pdf
55 FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech pdf
56 Content-Context Factorized Representations for Automated Speech Recognition pdf
57 Insights on Neural Representations for End-to-End Speech Recognition pdf
58 Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments pdf
59 SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation pdf
60 Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection pdf
61 Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech pdf
62 Hearing voices at the National Library -- a speech corpus and acoustic model for the Swedish language pdf
63 Challenges and Opportunities in Multi-device Speech Processing pdf
64 Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection pdf
65 Decoupled Federated Learning for ASR with Non-IID Data pdf
66 Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech pdf
67 FedNST: Federated Noisy Student Training for Automatic Speech Recognition pdf
68 Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition pdf
69 TEVR: Improving Speech Recognition by Token Entropy Variance Reduction pdf
70 The THUEE System Description for the IARPA OpenASR21 Challenge pdf
71 Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus pdf
72 Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project pdf
73 Knowledge-driven Subword Grammar Modeling for Automatic Speech Recognition in Tamil and Kannada pdf
74 Subword Dictionary Learning and Segmentation Techniques for Automatic Speech Recognition in Tamil and Kannada pdf
75 Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition pdf
76 ASR Error Detection via Audio-Transcript entailment pdf
77 Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition pdf
78 Towards Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription pdf
79 ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale pdf
80 Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation pdf
81 Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees pdf
82 Position Prediction as an Effective Pretraining Strategy pdf
83 Efficient spike encoding algorithms for neuromorphic speech recognition pdf
84 RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks pdf
85 End-to-end speech recognition modeling from de-identified data pdf
86 Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding pdf
87 Generating gender-ambiguous voices for privacy-preserving speech recognition pdf
88 Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism pdf
89 Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition pdf
90 Swiss German Speech to Text system evaluation pdf
91 Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models pdf
92 Towards Disentangled Speech Representations pdf
93 Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages pdf
94 Low-Level Physiological Implications of End-to-End Learning of Speech Recognition pdf
95 Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech pdf
96 ASR Error Correction with Constrained Decoding on Operation Prediction pdf
97 Adversarial Attacks on ASR Systems: An Overview pdf
98 DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition pdf
99 Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition pdf
100 Blind Signal Dereverberation for Machine Speech Recognition pdf
101 On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering pdf
102 Assessing ASR Model Quality on Disordered Speech using BERTScore pdf
103 ESPnet-ONNX: Bridging a Gap Between Research and Production pdf
104 A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation pdf
105 Modeling Dependent Structure for Utterances in ASR Evaluation pdf
106 VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition pdf
107 Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model pdf

2021

1 Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric pdf
2 Speech recognition for air traffic control via feature learning and end-to-end training pdf
3 A study on native American English speech recognition by Indian listeners with varying word familiarity level pdf
4 Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems pdf

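The content of this site is sourced from the internet. If any content infringes your rights, please contact us to have it removed. Contact email: yongqiangli@alumni.hust.edu.cn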

Copyright © 2015-2035 li yongqiang All Rights Reserved