Directory

Journal and conference on speech
CCF-A NeurIPS   AAAI   IJCAI   ACMMM
CCF-B ICASSP   COLING   SpeechCom   TSLP   TASLP   JSLHR   TMM   TOMCCAP   ICME
CCF-C INTERSPEECH   ICPR
other ICLR
Ongoing
2021 speech synthesis pdf
2022 speech synthesis pdf
General TTS

2022

1 DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs pdf
2 The MSXF TTS System for ICASSP 2022 ADD Challenge pdf
3 MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription pdf
4 Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance pdf
5 ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech pdf
6 Unsupervised word-level prosody tagging for controllable speech synthesis pdf
7 FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation pdf
8 Building Synthetic Speaker Profiles in Text-to-Speech Systems pdf
9 Revisiting Over-Smoothness in Text to Speech pdf
10 A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS pdf
11 A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis pdf
12 A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing pdf
13 Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis pdf
14 BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis pdf
15 Differentiable Duration Modeling for End-to-End Text-to-Speech pdf
16 DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning pdf
17 ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis pdf
18 Improve few-shot voice cloning using multi-modal learning pdf
19 JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech pdf
20 Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech pdf
21 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation pdf
22 Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition pdf
23 Variational Auto-Encoder based Mandarin Speech Cloning pdf
24 Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise pdf
25 vTTS: visual-text to speech pdf
26 WavThruVec: Latent speech representation as intermediate features for neural speech synthesis pdf
27 Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss pdf
28 SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech pdf
29 Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech pdf
30 Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis pdf
31 Simple and Effective Unsupervised Speech Synthesis pdf
32 AILTTS: Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech pdf
33 VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature pdf
34 Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis pdf
35 NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality pdf
36 Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech pdf
37 Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History pdf
38 Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech pdf
39 NatiQ: An End-to-end Text-to-Speech System for Arabic pdf
40 R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS pdf
41 TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder pdf
42 UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder pdf
43 Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models pdf
44 Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation pdf
45 Diffsound: Discrete Diffusion Model for Text-to-sound Generation pdf
46 LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech pdf
47 ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech pdf
48 DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders pdf
49 Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech pdf
50 SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate pdf
51 BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model pdf
52 Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) pdf
53 Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS) pdf
54 Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need pdf
55 Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks pdf
56 Visualising Model Training via Vowel Space for Text-To-Speech Systems pdf
57 A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis pdf
58 EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models pdf
59 A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS pdf
60 Controllable Accented Text-to-Speech Synthesis pdf
61 Deep Speech Synthesis from Articulatory Representations pdf
62 AudioGen: Textually Guided Audio Generation pdf

2021

1 Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet pdf
2 VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention pdf
3 LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search pdf
4 Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech pdf
5 ADASPEECH: ADAPTIVE TEXT TO SPEECH FOR CUSTOM VOICE pdf
6 Building Multilingual TTS using Cross-Lingual Voice Conversion pdf
7 Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis pdf
8 Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis pdf
9 Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input pdf
10 Data-Efficient Training Strategies for Neural TTS Systems pdf
11 Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners pdf
12 Text-to-speech for the hearing impaired pdf
13 Continual Speaker Adaptation for Text-to-Speech Synthesis pdf
14 Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling pdf
15 PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS pdf
16 SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model pdf
17 Fast DCTTS: Efficient Deep Convolutional Text-to-Speech pdf
18 Diff-TTS: A Denoising Diffusion Model for Text-to-Speech pdf
19 Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling pdf
20 Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features pdf
21 Speech Resynthesis from Discrete Disentangled Self-Supervised Representations pdf
22 Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech pdf
23 Review of end-to-end speech synthesis technology based on deep learning pdf
26 TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction pdf
27 Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks pdf
28 SpeechNet: A Universal Modularized Model for Speech Processing Tasks pdf   blog
29 How do Voices from Past Speech Synthesis Challenges Compare Today? pdf   blog
30 Learning Robust Latent Representations for Controllable Speech Synthesis pdf
31 MASS: Multi-task Anthropomorphic Speech Synthesis Framework pdf
32 VQCPC-GAN: Variable-length Adversarial Audio Synthesis using Vector-Quantized Contrastive Predictive Coding pdf
34 Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech pdf
35 ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation pdf
36 Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling pdf
37 A learned conditional prior for the VAE acoustic space of a TTS system pdf
38 A Survey on Neural Speech Synthesis pdf
39 An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis pdf
40 Byakto Speech: Real-time long speech synthesis with convolutional neural network: Transfer learning from English to Bangla pdf
41 Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech pdf
42 Controllable Context-aware Conversational Speech Synthesis pdf
43 Multi-Scale Spectrogram Modelling for Neural Text-to-Speech pdf
44 Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis pdf
45 FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis pdf
46 GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis pdf
47 Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech pdf
48 Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows pdf
49 Non-native English lexicon creation for bilingual speech synthesis pdf
50 Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech pdf
51 Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis pdf
52 Speech BERT Embedding For Improving Prosody in Neural TTS pdf
53 WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis pdf
54 Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance pdf
55 VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis pdf
56 Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm pdf
57 Federated Learning with Dynamic Transformer for Text to Speech pdf
58 Effective and Differentiated Use of Control Information for Multi-speaker Speech Synthesis pdf
59 End to End Bangla Speech Synthesis pdf
60 Perceptually Guided End-to-End Text-to-Speech With MOS Prediction pdf
61 One TTS Alignment To Rule Them All pdf
62 Combining speakers of multiple languages to improve quality of neural voices pdf
63 DeepEigen: Learning-based Modal Sound Synthesis with Acoustic Transfer Maps pdf
64 Neural HMMs are all you need (for high-quality attention-free TTS) pdf
65 PortaSpeech: Portable and High-Quality Generative Text-to-Speech pdf
66 Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS pdf
67 Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network pdf
68 An Audio Synthesis Framework Derived from Industrial Process Control pdf
69 On-device neural speech synthesis pdf
70 fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit pdf
71 A study on the efficacy of model pre-training in developing neural text-to-speech system pdf
72 DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 pdf
73 Discrete acoustic space for an efficient sampling in neural text-to-speech pdf
74 EdiTTS: Score-based Editing for Controllable Text-to-Speech pdf
75 Emphasis control for parallel neural TTS pdf
76 Environment Aware Text-to-Speech Synthesis pdf
77 ESPnet2-TTS: Extending the Edge of TTS Research pdf
78 FedSpeech: Federated Text-to-Speech with Continual Learning pdf
79 Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS pdf
80 Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings pdf
81 Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge pdf
82 On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis pdf
83 PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control pdf
84 Prosody-TTS: An end-to-end speech synthesis system with prosody control pdf
86 Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video pdf
87 Guided-TTS: Text-to-Speech with Untranscribed Speech pdf
88 Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control pdf
89 Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data pdf
90 More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech pdf
91 Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis pdf
92 RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses pdf
93 Speaker Generation pdf

2020

1 INTERACTIVE TEXT-TO-SPEECH VIA SEMI-SUPERVISED STYLE TRANSFER LEARNING pdf
2 SQUEEZEWAVE: EXTREMELY LIGHTWEIGHT VOCODERS FOR ON-DEVICE SPEECH SYNTHESIS pdf   demo   code
3 LOCATION RELATIVE ATTENTION MECHANISMS FOR ROBUST LONG FORM SPEECH SYNTHESIS pdf
4 End-to-End Adversarial Text-to-Speech pdf   demo
5 FastSpeech 2: Fast and High-Quality End-to-End Text to Speech pdf
6 Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends pdf
7 Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis pdf
8 JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment pdf
9 FastPitch: Parallel Text-to-speech with Pitch Prediction pdf
10 Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search pdf
11 FLOW-TTS: A NON-AUTOREGRESSIVE NETWORK FOR TEXT TO SPEECH BASED ON FLOW pdf
12 SpeedySpeech: Efficient Neural Speech Synthesis pdf
14 Controllable Neural Prosody Synthesis pdf
15 Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling pdf
16 Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) pdf
17 From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint pdf
18 Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning pdf
19 Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit pdf
20 Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages pdf
21 Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning pdf
22 Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning pdf
23 NON-ATTENTIVE TACOTRON: ROBUST AND CONTROLLABLE NEURAL TTS SYNTHESIS INCLUDING UNSUPERVISED DURATION MODELING pdf
24 PARALLEL TACOTRON: NON-AUTOREGRESSIVE AND CONTROLLABLE TTS pdf
25 TTS-BY-TTS: TTS-DRIVEN DATA AUGMENTATION FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS pdf
26 SPEECH SYNTHESIS AND CONTROL USING DIFFERENTIABLE DSP pdf
27 FEATHERTTS: ROBUST AND EFFICIENT ATTENTION BASED NEURAL TTS pdf
28 GRAPHSPEECH: SYNTAX-AWARE GRAPH ATTENTION NETWORK FOR NEURAL SPEECH SYNTHESIS pdf
29 HIERARCHICAL PROSODY MODELING FOR NON-AUTOREGRESSIVE SPEECH SYNTHESIS pdf
30 DEVICETTS: A SMALL-FOOTPRINT, FAST, STABLE NETWORK FOR ON-DEVICE TEXT-TO-SPEECH pdf
31 PRETRAINING STRATEGIES, WAVEFORM MODEL CHOICE, AND ACOUSTIC CONFIGURATIONS FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS pdf
32 Fast and lightweight on-device TTS with Tacotron2 and LPCNet pdf

2019

2019 isca 2019 speech papers
1 Deep Text-to-Speech System with Seq2Seq Model pdf
2 FastSpeech: Fast, Robust and Controllable Text to Speech pdf
3 Neural Speech Synthesis with Transformer Network pdf   ppt   demo
4 Parallel Neural Text-to-Speech pdf
5 Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS pdf
6 LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech pdf
7 Forward-Backward Decoding for Regularizing End-to-End TTS pdf
8 Self-attention Based Prosodic Boundary Prediction for Chinese Speech Synthesis pdf
9 Guide to Speech Synthesis with Deep Learning ppt
10 tts tutorial part1 part2 ppt1   ppt2
11 Maximizing Mutual Information for Tacotron pdf
12 DurIAN pdf
13 Non-Autoregressive Neural Text-to-Speech pdf
14 Tacotron-based acoustic model using phoneme alignment for practical neural text-to-speech systems pdf

2018

2018 isca 2018 speech papers
1 Deep voice 3: Scaling text-to-speech with convolutional sequence learning pdf
2 ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech pdf
3 Linear Networks Based Speaker Adaptation For Speech Synthesis pdf

2017

2017 isca 2017 speech papers
1 Tacotron: Towards End-to-End Speech Synthesis pdf   page
2 Char2Wav: End-to-End Speech Synthesis pdf
3 Deep Voice: Real-time Neural Text-to-Speech pdf
4 Deep Voice 2: Multi-Speaker Neural Text-to-Speech pdf
5 VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop pdf
6 Attention Is All You Need pdf

2016

2016 isca 2016 speech papers
1 Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices pdf
2 Merlin: An Open Source Neural Network Speech Synthesis System pdf

2015

1 Acoustic modeling in statistical parametric speech synthesis - from HMM to LSTM-RNN pdf   ppt
2 Effective Approaches to Attention-based Neural Machine Translation pdf
3 htkbook-3.5 pdf
5 A study of speaker adaptation for DNN-based speech synthesis pdf

2014

1 TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks pdf

2013

1 Statistical parametric speech synthesis using deep neural networks pdf
Vocoder

2022

1 ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation pdf
2 End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation pdf
3 Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet pdf
4 Phase Vocoder Done Right pdf
5 It's Raw! Audio Generation with State-Space Models pdf
6 InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training pdf
7 A Neural Vocoder Based Packet Loss Concealment Algorithm pdf
8 AdaVocoder: Adaptive Vocoder for Custom Voice pdf
9 Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge pdf
10 HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement pdf
11 iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform pdf
12 Neural Vocoder is All You Need for Speech Super-resolution pdf
13 SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping pdf
14 Parallel Synthesis for Autoregressive Speech Generation pdf
15 Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation pdf
16 FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis pdf
17 Streamable Neural Audio Synthesis With Non-Causal Convolutions pdf
18 A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture pdf
19 BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis pdf
20 Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation pdf
21 cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms pdf
22 Avocodo: Generative Adversarial Network for Artifact-free Vocoder pdf
23 BigVGAN: A Universal Neural Vocoder with Large-Scale Training pdf
24 GoodBye WaveNet -- A Language Model for Raw Audio with Context of 1/2 Million Samples pdf
25 WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis pdf
26 End-to-End Binaural Speech Synthesis pdf
27 Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer pdf
28 Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0 pdf
29 DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation pdf
30 Mel Spectrogram Inversion with Stable Pitch pdf
31 An Initial study on Birdsong Re-synthesis Using Neural Vocoders pdf

2021

1 GAN Vocoder: Multi-Resolution Discriminator Is All You Need pdf
2 Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss pdf
3 Universal Neural Vocoding with Parallel WaveNet pdf
4 LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation pdf
5 High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion pdf
6 Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains pdf
7 Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN pdf
8 Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN pdf
9 Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders pdf
10 High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling pdf
11 A Generative Model for Raw Audio Using Transformer Architectures pdf
12 WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution pdf
13 Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters pdf
14 Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition pdf
15 Catch-A-Waveform: Learning to Generate Audio from a Single Short Example pdf
16 Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis pdf
17 CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis pdf
18 Fre-GAN: Adversarial Frequency-consistent Audio Synthesis pdf
19 Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis pdf
20 Improving the expressiveness of neural vocoding with non-affine Normalizing Flows pdf
21 Mathematical Vocoder Algorithm: Modified Spectral Inversion for Efficient Neural Speech Synthesis pdf
22 Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder pdf
23 UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation pdf
26 Neural Waveshaping Synthesis pdf
28 DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs pdf
29 ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation pdf
31 A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate pdf
32 FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis pdf
33 MSR-NV: Neural vocoder using multiple sampling rates pdf
34 Chunked Autoregressive GAN for Conditional Waveform Synthesis pdf
35 Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations pdf
36 Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet pdf
37 Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks pdf
38 The Mirrornet: Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction pdf
39 Towards Universal Neural Vocoding with a Multi-band Excited WaveNet pdf
40 High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency pdf
41 RAVE: A variational autoencoder for fast and high-quality neural audio synthesis pdf
42 VocBench: A Neural Vocoder Benchmark for Speech Synthesis pdf
43 DiffWave: A Versatile Diffusion Model for Audio Synthesis pdf

2020

1 Multi-band MelGAN pdf
2 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction pdf
3 Parallel WaveGAN pdf
4 VocGAN pdf
5 WAVEGRAD: ESTIMATING GRADIENTS FOR WAVEFORM GENERATION pdf
6 PARALLEL WAVEGAN: A FAST WAVEFORM GENERATION MODEL BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH MULTI-RESOLUTION SPECTROGRAM pdf
8 A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems pdf
9 Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems pdf
10 Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation pdf
11 Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder pdf
12 Improving Opus Low Bit Rate Quality with Neural Speech Synthesis pdf
13 WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU pdf
14 Vocoder-Based Speech Synthesis from Silent Videos pdf
15 Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis pdf
16 Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions pdf
17 GAUSSIAN LPCNET FOR MULTISAMPLE SPEECH SYNTHESIS pdf
18 UNIVERSAL MELGAN: A ROBUST NEURAL VOCODER FOR HIGH-FIDELITY WAVEFORM GENERATION IN MULTIPLE DOMAINS pdf
19 Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network pdf
20 What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS pdf
21 Lightweight LPCNet-based Neural Vocoder with Tensor Decomposition pdf
22 HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis pdf

2019

1 High quality lightweight and adaptable TTS using LPCNet pdf
2 A Neural Vocoder with Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis pdf
3 RawNet: Fast End-to-End Neural Vocoder pdf
4 A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet pdf
5 LPCNet: Improving Neural Speech Synthesis Through Linear Prediction pdf   demo   code
6 WaveGlow pdf
7 MelGAN pdf
8 AN INVESTIGATION OF SUBBAND WAVENET VOCODER COVERING ENTIRE AUDIBLE FREQUENCY RANGE WITH LIMITED ACOUSTIC FEATURES pdf
9 A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction pdf

2018

1 Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (Tacotron 2) pdf   code
2 Efficient Neural Audio Synthesis (WaveRNN) pdf
3 Improving FFTNet vocoder with noise shaping and subband approaches pdf
4 FFTNET: A REAL-TIME SPEAKER-DEPENDENT NEURAL VOCODER pdf
5 SQUEEZEWAVE: EXTREMELY LIGHTWEIGHT VOCODERS FOR ON-DEVICE SPEECH SYNTHESIS pdf   code

2017

1 Parallel WaveNet: Fast High-Fidelity Speech Synthesis pdf

2016

1 WaveNet: A Generative Model for Raw Audio pdf   demo   code
2 Fast Wavenet Generation Algorithm pdf
Adap & Multispeaker & Multilingual

2022

1 Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training pdf
2 Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention pdf
3 nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech pdf
4 Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module pdf
5 Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features pdf
6 Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis pdf
7 Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus pdf
8 VoiceMe: Personalized voice generation in TTS pdf
9 Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech pdf
10 Data-augmented cross-lingual synthesis in a teacher-student framework pdf
11 Fine-grained Noise Control for Multispeaker Speech Synthesis pdf
12 Self supervised learning for robust voice cloning pdf
13 Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis pdf
14 AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios pdf
15 Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding pdf
16 Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations pdf
17 SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech pdf
18 CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer pdf
19 Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech pdf
20 AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation pdf
21 Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data pdf
22 TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS pdf
23 Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS pdf
24 When Is TTS Augmentation Through a Pivot Language Useful? pdf
25 A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System pdf
26 Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis pdf
27 ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS pdf
28 Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech pdf

2021

1 Building Multilingual TTS using Cross-Lingual Voice Conversion pdf
2 ADASPEECH: ADAPTIVE TEXT TO SPEECH FOR CUSTOM VOICE pdf
3 Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech pdf
4 Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning pdf
5 CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge pdf
6 Real-time Timbre Transfer and Sound Synthesis using DDSP pdf
7 The Multi-speaker Multi-style Voice Cloning Challenge 2021 pdf
8 The AS-NU System for the M2VoC Challenge pdf
9 Exploring Disentanglement with Multilingual and Monolingual VQ-VAE pdf
10 Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation pdf
11 Speaker Adaptation with Continuous Vocoder-based DNN-TTS pdf
12 GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints pdf
13 Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration pdf
14 Adapting TTS models For New Speakers using Transfer Learning pdf
15 Cloning one's voice using very limited data in the wild pdf
17 Applying Phonological Features in Multilingual Text-To-Speech pdf
18 Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech pdf
19 Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data pdf
20 Revisiting IPA-based Cross-lingual Text-to-speech pdf
21 Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis pdf
22 Cross-lingual Low Resource Speaker Adaptation Using Phonological Features pdf
23 Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech pdf
24 V2C: Visual Voice Cloning pdf

2020

1 Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario pdf
2 Efficient neural speech synthesis for low resource languages through multilingual modeling pdf
3 End-to-End Code-Switching TTS with Cross-Lingual Language Model pdf
4 Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data pdf
5 One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech pdf
6 SPEAKER ADAPTATION OF A MULTILINGUAL ACOUSTIC MODEL FOR CROSS-LANGUAGE SYNTHESIS pdf
7 Multilingual speech synthesis pdf
8 Domain-adversarial training of multi-speaker TTS pdf
9 Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis pdf
10 Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings pdf
11 Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS pdf
12 Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes pdf
13 Phonological Features for 0-shot Multilingual Speech Synthesis pdf
14 Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation pdf
15 Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion pdf
16 USING IPA-BASED TACOTRON FOR DATA EFFICIENT CROSS-LINGUAL SPEAKER ADAPTATION AND PRONUNCIATION ENHANCEMENT pdf

2019

1 Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers pdf
2 Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning pdf
3 A Study of Different Embedding Methods for Speaker Features in Personalized Speech Synthesis (个性化语音合成中说话人特征不同嵌入方式的研究) pdf
4 Cross-lingual Multispeaker Text-To-Speech Synthesis Using Neural Speaker Embedding pdf
5 Automatic Multispeaker Voice Cloning pdf   code
6 Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis pdf   demo
7 Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora pdf

2017

1 Speaker adaptation in DNN-based speech synthesis using d-vectors pdf

2016

6 Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-based Speech Synthesis pdf

2015

6 Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis pdf
Expressive TTS

2022

1 Disentangling Style and Speaker Attributes for TTS Style Transfer pdf
2 MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis pdf
3 Distribution augmentation for low-resource expressive text-to-speech pdf
4 Cross-speaker style transfer for text-to-speech using data augmentation pdf
5 Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis pdf
6 Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation pdf
7 Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis pdf
8 StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks pdf
9 StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis pdf
10 GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis pdf
11 End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue pdf
12 Expressive, Variable, and Controllable Duration Modelling in TTS pdf
13 iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre pdf
14 Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems pdf
15 Self-supervised Context-aware Style Representation for Expressive Speech Synthesis pdf
16 Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody pdf
17 Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis pdf
18 Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS pdf
19 PoeticTTS -- Controllable Poetry Reading for Literary Studies pdf
20 Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis pdf
21 Speech Synthesis with Mixed Emotions pdf
22 Towards Cross-speaker Reading Style Transfer on Audiobook Dataset pdf
23 The Role of Voice Persona in Expressive Communication: An Argument for Relevance in Speech Synthesis Design pdf

2021

1 Whispered and Lombard Neural Speech Synthesis pdf
2 Expressive Neural Voice Cloning pdf
3 Model architectures to extrapolate emotional expressions in DNN-based text-to-speech pdf
4 Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system pdf
5 STYLER: Style Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech pdf
6 Expressive Text-to-Speech using Style Tag pdf
7 Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability pdf
8 Towards Multi-Scale Style Control for Expressive Speech Synthesis pdf
9 AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data pdf
10 Exploring emotional prototypes in a high dimensional TTS latent space pdf
11 Global Rhythm Style Transfer Without Text Transcriptions pdf
12 Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS pdf
13 Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech pdf
14 Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis pdf
15 UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control pdf
16 Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis pdf
17 AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style pdf
18 Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis pdf
19 Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis pdf
20 Enhancing audio quality for expressive Neural Text-to-Speech pdf
21 Emotional Speech Synthesis for Companion Robot to Imitate Professional Caregiver Speech pdf
22 Controllable cross-speaker emotion transfer for end-to-end speech synthesis pdf
23 Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech pdf
24 GANtron: Emotional Speech Synthesis with Generative Adversarial Networks pdf
25 Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation pdf
26 StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis pdf
27 Fine-grained style control in Transformer-based Text-to-speech Synthesis pdf
28 Using multiple reference audios and style embedding constraints for speech synthesis pdf
29 Emotional Prosody Control for Speech Generation pdf
30 Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning pdf
31 Word-Level Style Control for Expressive, Non-attentive Speech Synthesis pdf
32 Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios pdf
33 Multi-speaker Emotional Text-to-speech Synthesizer pdf

2020

1 Controllable Neural Prosody Synthesis pdf
2 FULLY-HIERARCHICAL FINE-GRAINED PROSODY MODELING FOR INTERPRETABLE SPEECH SYNTHESIS pdf
3 Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis pdf
4 Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion pdf
5 Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis pdf
6 Controllable Emotion Transfer For End-to-End Speech Synthesis pdf
7 Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis pdf

2019

1 MULTI-REFERENCE NEURAL TTS STYLIZATION WITH ADVERSARIAL CYCLE CONSISTENCY pdf
2 Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis pdf
3 MELLOTRON: MULTISPEAKER EXPRESSIVE VOICE SYNTHESIS BY CONDITIONING ON RHYTHM, PITCH AND GLOBAL STYLE TOKENS pdf

2018

1 HIERARCHICAL GENERATIVE MODELING FOR CONTROLLABLE SPEECH SYNTHESIS pdf
2 Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron pdf
3 Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis pdf
4 PREDICTING EXPRESSIVE SPEAKING STYLE FROM TEXT IN END-TO-END SPEECH SYNTHESIS pdf
Voice Conversion

2022

1 Invertible Voice Conversion pdf
2 Emotion Intensity and its Control for Emotional Voice Conversion pdf
3 IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion pdf
4 Noise-robust voice conversion with domain adversarial training pdf
5 DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning pdf
6 AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning pdf
7 An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion pdf
8 Analysis of Voice Conversion and Code-Switching Synthesis Using VQ-VAE pdf
9 DGC-vector: A new speaker embedding for zero-shot voice conversion pdf
10 Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion pdf
12 Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution pdf
13 Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE pdf
14 HiFi-VC: High Quality ASR-Based Voice Conversion pdf
15 Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion pdf
16 SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion Without tuning autoencoder Bottlenecks pdf
17 Text-free non-parallel many-to-many voice conversion using normalising flows pdf
18 Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion pdf
19 Time Domain Adversarial Voice Conversion for ADD 2022 pdf
21 Towards Improved Zero-shot Voice Conversion with Conditional DSVAE pdf
22 End-to-End Zero-Shot Voice Style Transfer with Location-Variable Convolutions pdf
23 An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions pdf
24 End-to-End Voice Conversion with Information Perturbation pdf
25 Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems pdf
26 Speak Like a Dog: Human to Non-human creature Voice Conversion pdf
27 Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion pdf
28 Streaming non-autoregressive model for any-to-many voice conversion pdf
29 Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion pdf
30 A Comparative Study of Self-supervised Speech Representation Based Voice Conversion pdf
31 Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion pdf
32 GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion pdf
33 Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers pdf
34 Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion pdf
35 TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training pdf
36 ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm pdf
37 Boosting Star-GANs for Voice Conversion with Contrastive Discriminator pdf
38 DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion pdf
39 Investigation into Target Speaking Rate Adaptation for Voice Conversion pdf

2021

1 EMOCAT: LANGUAGE-AGNOSTIC EMOTIONAL VOICE CONVERSION pdf
2 Building Multilingual TTS using Cross-Lingual Voice Conversion pdf
3 High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion pdf
4 Hierarchical disentangled representation learning for singing voice conversion pdf
5 Adversarially learning disentangled speech representations for robust multi-factor voice conversion pdf
6 Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram pdf
7 Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion pdf
8 crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder pdf
9 MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames pdf
10 Axial Residual Networks for CycleGAN-based Voice Conversion pdf
11 Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning pdf
12 CycleDRUMS: Automatic Drum Arrangement For Bass Lines Using CycleGAN pdf
13 Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques pdf
14 S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations pdf
15 StarGAN-based Emotional Voice Conversion for Japanese Phrases pdf
16 NoiseVC: Towards High Quality Zero-Shot Voice Conversion pdf
17 Non-autoregressive sequence-to-sequence voice conversion pdf
18 FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion pdf
21 Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss pdf
22 Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels pdf
23 An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion pdf
24 Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction pdf
25 Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery pdf
26 DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion pdf
27 Emotional Voice Conversion: Theory, Databases and ESD pdf
28 Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance pdf
29 A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion pdf
30 Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion pdf
31 Improving robustness of one-shot voice conversion with deep discriminative speaker encoder pdf
32 NVC-Net: End-to-End Adversarial Voice Conversion pdf
33 Pathological voice adaptation with autoencoder-based voice conversion pdf
34 Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments pdf
35 VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion pdf
36 StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion pdf
37 On Prosody Modeling for ASR+TTS based Voice Conversion pdf
38 An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation pdf
39 Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder pdf
40 Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer pdf
42 StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition pdf
43 Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning pdf
44 Noisy-to-Noisy Voice Conversion Framework with Denoising Model pdf
45 Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion pdf
46 Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme pdf
47 Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks pdf
48 MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features pdf
49 S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations pdf
50 Sequence-To-Sequence Voice Conversion using F0 and Time Conditioning and Adversarial Learning pdf
51 Speech Enhancement-assisted StarGAN Voice Conversion in Noisy Environments pdf
52 Toward Degradation-Robust Voice Conversion pdf
53 Towards Identity Preserving Normal to Dysarthric Voice Conversion pdf
54 A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion pdf
55 AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion pdf
56 Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion pdf
57 CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer pdf
58 Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion pdf
59 One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation pdf
60 SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines pdf
61 Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features pdf
62 Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion pdf
63 YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone pdf

2020

1 Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data pdf
2 An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning pdf
3 Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion pdf
4 SEEN AND UNSEEN EMOTIONAL STYLE TRANSFER FOR VOICE CONVERSION WITH A NEW EMOTIONAL SPEECH DATASET pdf
5 ANY-TO-ONE SEQUENCE-TO-SEQUENCE VOICE CONVERSION USING SELF-SUPERVISED DISCRETE SPEECH REPRESENTATIONS pdf
6 GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus pdf
7 TOWARDS LOW-RESOURCE STARGAN VOICE CONVERSION USING WEIGHT ADAPTIVE INSTANCE NORMALIZATION pdf
8 CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion pdf
9 Accent and Speaker Disentanglement in Many-to-many Voice Conversion pdf

2019

1 AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss pdf
3 Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion pdf
4 Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations pdf

2017

1 An Overview of Voice Conversion Systems pdf
Singing Synthesis

2022

1 Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals pdf
2 partitura: A Python Package for Handling Symbolic Musical Data pdf
3 FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control pdf
4 Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis pdf
5 MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder pdf
6 Quantized GAN for Complex Music Generation from Dance Videos pdf
7 Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder pdf
8 Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher pdf
9 Music Generation Using an LSTM pdf
10 SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy pdf
11 U-Singer: Multi-Singer Singing Voice Synthesizer that Controls Emotional Intensity pdf
12 WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses pdf
13 Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis pdf
14 Deep Performer: Score-to-Audio Music Performance Synthesis pdf
15 Learning the Beauty in Songs: Neural Singing Voice Beautifier pdf
16 SUSing: SU-net for Singing Voice Synthesis pdf
17 Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis pdf
18 A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion pdf
19 Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis pdf
20 Multi-instrument Music Synthesis with Spectrogram Diffusion pdf
21 HouseX: A Fine-grained House Music Dataset and its Potential in the Music Industry pdf
22 WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training pdf
23 What is missing in deep music generation? A study of repetition and structure in popular music pdf
24 A New Corpus for Computational Music Research and A Novel Method for Musical Structure Analysis pdf
25 MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks pdf
26 Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer pdf
27 Musika! Fast Infinite Waveform Music Generation pdf
28 Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN pdf
29 musicaiz: A Python Library for Symbolic Music Generation, Analysis and Visualization pdf
30 Domain Adversarial Training on Conditional Variational Auto-Encoder for Controllable Music Generation pdf
31 SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias pdf

2021

1 Anyone GAN Sing pdf
2 Latent Space Explorations of Singing Voice Synthesis using DDSP pdf
3 Learning to Generate Music With Sentiment pdf
4 Hierarchical disentangled representation learning for singing voice conversion pdf
5 Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis pdf
6 DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis pdf
7 LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters pdf
9 Music Generation using Deep Learning pdf
10 MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis pdf
11 N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement pdf
12 Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System pdf
13 A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis pdf
14 An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures pdf
15 A Melody-Unsupervision Model for Singing Voice Synthesis pdf
16 A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems pdf
17 DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding pdf
18 Enhanced Memory Network: The novel network structure for Symbolic Music Generation pdf
19 KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms pdf
20 KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke pdf
21 Pitch Preservation In Singing Voice Synthesis pdf
22 SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation pdf
23 Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding pdf
26 A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody pdf
27 Learning To Generate Piano Music With Sustain Pedals pdf
28 Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control pdf
29 Symbolic Music Loop Generation with VQ-VAE pdf
30 Video Background Music Generation with Controllable Music Transformer pdf
31 Zero-shot Singing Technique Conversion pdf
32 Evaluating Deep Music Generation Methods Using Data Augmentation pdf
33 Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus pdf
34 EmotionBox: a music-element-driven emotional music generation system using Recurrent Neural Network pdf

2020

1 HIFISINGER: TOWARDS HIGH-FIDELITY NEURAL SINGING VOICE SYNTHESIS pdf
2 ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders pdf
3 DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System pdf
4 Jukebox: A Generative Model for Music pdf
5 XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System pdf
6 Speech-to-Singing Conversion based on Boundary Equilibrium GAN pdf
7 A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions pdf

2019

1 MELLOTRON: MULTISPEAKER EXPRESSIVE VOICE SYNTHESIS BY CONDITIONING ON RHYTHM, PITCH AND GLOBAL STYLE TOKENS pdf
Talking Head

2022

1 Multi-modal data fusion of Voice and EMG data for Robotic Control pdf
2 Stitch it in Time: GAN-Based Facial Editing of Real Videos pdf
3 Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels pdf
4 DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering pdf
5 Improving Cross-lingual Speech Synthesis with Triplet Training Scheme pdf
6 VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion pdf
7 CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations pdf
8 Recent Advances and Challenges in Deep Audio-Visual Correlation Learning pdf
9 Freeform Body Motion Generation from Speech pdf
10 Transformer-based Multimodal Information Fusion for Facial Expression Analysis pdf
11 Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion pdf
12 VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices pdf
13 Lip to Speech Synthesis with Visual Context Attentional GAN pdf
14 Residual-guided Personalized Speech Synthesis based on Face Image pdf
15 Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video pdf
16 Text/Speech-Driven Full-Body Animation pdf
17 Talking Face Generation with Multilingual TTS pdf
18 A Novel Speech-Driven Lip-Sync Model with CNN and LSTM pdf
19 FlexLip: A Controllable Text-to-Lip System pdf
20 Learning Speaker-specific Lip-to-Speech Generation pdf
21 VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection pdf
22 Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models pdf
23 Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adversarial Networks pdf
24 FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis pdf
25 StableFace: Analyzing and Improving Motion Stability for Talking Face Generation pdf
26 Facial Landmark Predictions with Applications to Metaverse pdf
27 AutoLV: Automatic Lecture Video Generator pdf
28 Continuously Controllable Facial Expression Editing in Talking Face Videos pdf
29 TIMIT-TTS: a Text-to-Speech Dataset for Multimodal Synthetic Media Detection pdf
30 Talking Head from Speech Audio using a Pre-trained Image Generator pdf
31 Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild pdf

2021

1 Generating coherent spontaneous speech and gesture from text pdf
2 Creating Song From Lip and Tongue Videos With a Convolutional Vocoder pdf
3 SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer pdf
4 What is Multimodality? pdf
5 MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement pdf
6 Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices pdf
7 Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary pdf
8 Recent Advances and Trends in Multimodal Deep Learning: A Review pdf
9 Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing pdf
10 Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking pdf
11 LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization pdf
12 NWT: Towards natural audio-to-video generation with representation learning pdf
13 Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text pdf
14 Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion pdf
15 A Survey on Audio Synthesis and Audio-Visual Multimodal Processing pdf
16 Integrated Speech and Gesture Synthesis pdf
17 AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person pdf
18 Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates pdf
19 Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation pdf
20 Audio-to-Image Cross-Modal Generation pdf
21 Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor pdf
22 Talking Head Generation with Audio and Speech Related Facial Action Units pdf
23 LiMuSE: Lightweight Multi-modal Speaker Extraction pdf
24 Metric-based multimodal meta-learning for human movement identification via footstep recognition pdf
25 FaceFormer: Speech-Driven 3D Facial Animation with Transformers pdf
26 Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation pdf
27 PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound pdf

2020

1 What comprises a good talking head video generation? A Survey and Benchmark pdf   code
2 A Novel Face-tracking Mouth Controller and its Application to Interacting with Bioacoustic Models pdf
3 Large-scale multilingual audio visual dubbing pdf

2019

1 (talking head) Text-based Editing of Talking-head Video pdf   video
2 Talking Face Generation by Adversarially Disentangled Audio-Visual Representation pdf   code   demo
Robust TTS

2022

1 pdf

2020

1 Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS pdf
2 Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement pdf
3 Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training pdf

2019

1 Neural Text to Speech Adaptation from Low Quality Public Recordings pdf

2018

1 Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization pdf
Front End

2022

1 Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models pdf
2 Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end pdf
3 An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer pdf
4 g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin pdf
5 Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization pdf
6 A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation pdf
7 Automatic Prosody Annotation with Pre-Trained Text-Speech Model pdf
8 SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation pdf
9 A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese pdf
10 Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0 pdf
11 Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech pdf

2021

1 Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning pdf
2 Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects pdf
3 Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems pdf
4 Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis pdf
5 A Unified Transformer-based Framework for Duplex Text Normalization pdf

2020

1 A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS pdf
2 A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN pdf
3 A Mask-based Model for Mandarin Chinese Polyphone Disambiguation pdf
4 Unified Mandarin TTS Front-end Based on Distilled BERT Model pdf

2019

1 A Mandarin Prosodic Boundary Prediction Model Based on Multi-Task Learning pdf
2 Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion pdf
3 Pre-trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis pdf

2018

1 Mandarin Prosody Prediction Based on Attention Mechanism and Multi-model Ensemble pdf

2016

1 Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach pdf

2015

1 Automatic Prosody Prediction for Chinese Speech Synthesis Using BLSTM-RNN and Embedding Features pdf
Alignment

2021

1 Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet pdf
2 Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling pdf

2020

1 Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis pdf
2 Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding pdf
3 Peking Opera Synthesis via Duration Informed Attention Network pdf
4 Understanding Self-Attention of Self-Supervised Audio Transformers pdf

2019

1 Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments pdf
2 Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS pdf

2018

1 Monotonic Chunkwise Attention pdf
2 Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis pdf

2017

1 Online and Linear-Time Attention by Enforcing Monotonic Alignments pdf
2 Attention Is All You Need pdf
Dual Learning

2021

1 Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation pdf

2020

1 LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition pdf
2 Almost Unsupervised Text to Speech and Automatic Speech Recognition pdf

2018

1 Machine Speech Chain with One-shot Speaker Adaptation pdf
2 Listening while Speaking: Speech Chain by Deep Learning pdf
EEG

2021

1 On Interfacing the Brain with Quantum Computers: An Approach to Listen to the Logic of the Mind pdf

2020

1 Advancing Speech Synthesis using EEG pdf
2 Speech Synthesis using EEG pdf
3 Predicting Different Acoustic Features from EEG and towards direct synthesis of Audio Waveform from EEG pdf
S2S

2022

1 CVSS Corpus and Massively Multilingual Speech-to-Speech Translation pdf
2 Creating Speech-to-Speech Corpus from Dubbed Series pdf
3 Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation pdf
4 Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation pdf
5 Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation pdf
6 TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation pdf

2021

1 Assessing Evaluation Metrics for Speech-to-Speech Translation pdf
2 Direct simultaneous speech to speech translation pdf
3 Incremental Speech Synthesis For Speech-To-Speech Translation pdf
4 Textless Speech-to-Speech Translation on Real Data pdf
Other

2022

1 J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis pdf
2 KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics pdf
3 Residual-Guided Non-Intrusive Speech Quality Assessment pdf
4 Robotic Speech Synthesis: Perspectives on Interactions, Scenarios, and Ethics pdf
5 STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent pdf
6 The VoiceMOS Challenge 2022 pdf
7 Improving Self-Supervised Learning-based MOS Prediction Networks pdf
8 LibriS2S: A German-English Speech-to-Speech Translation Corpus pdf
9 Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch pdf
10 Fusion of Self-supervised Learned Models for MOS Prediction pdf
11 Karaoker: Alignment-free singing voice synthesis with speech training data pdf
12 Arabic Text-To-Speech (TTS) Data Preparation pdf
13 DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores pdf
14 SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis pdf
15 A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality pdf
16 UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 pdf
17 MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment pdf
18 Into-TTS: Intonation Template based Prosody Control System pdf
19 Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts pdf
20 Macedonian Speech Synthesis for Assistive Technology Applications pdf
21 TuGeBiC: A Turkish German Bilingual Code-Switching Corpus pdf
22 Audio Similarity is Unreliable as a Proxy for Audio Quality pdf
23 Comparison of Speech Representations for the MOS Prediction System pdf
24 SAQAM: Spatial Audio Quality Assessment Metric pdf
25 Speech Quality Assessment through MOS using Non-Matching References pdf
26 The ZevoMOS entry to VoiceMOS Challenge 2022 pdf
27 Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities pdf
28 EEG2Mel: Reconstructing Sound from Brain Responses to Music pdf
29 BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus pdf
30 DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech pdf
31 Evaluating generative audio systems and their metrics pdf
32 Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks pdf
33 MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline pdf
34 ESPnet-ONNX: Bridging a Gap Between Research and Production pdf
35 Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset pdf

2021

1 MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network pdf
2 Hi-Fi Multi-Speaker English TTS Dataset pdf
3 ProsoBeast Prosody Annotation Tool pdf
4 KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset pdf
5 Deep Learning Based Assessment of Synthetic Speech Naturalness pdf
7 Speaker disentanglement in video-to-speech conversion pdf
8 Voice of Your Brain: Cognitive Representations of Imagined Speech, Overt Speech, and Speech Perception Based on EEG pdf
9 ADEPT: A Dataset for Evaluating Prosody Transfer pdf
10 EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model pdf
11 HUI-Audio-Corpus-German: A high quality TTS dataset pdf
12 Mixtures of Deep Neural Experts for Automated Speech Scoring pdf
13 RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis pdf
14 Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging pdf
15 Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging pdf
16 Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input pdf
17 An Objective Evaluation Framework for Pathological Speech Synthesis pdf
18 Digital Einstein Experience: Fast Text-to-Speech for Conversational AI pdf
19 Translatotron 2: Robust direct speech-to-speech translation pdf
20 Direct speech-to-speech translation with discrete units pdf
21 Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues pdf
22 RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform pdf
23 "Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World pdf
24 FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection pdf
25 AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics pdf
26 Generalization Ability of MOS Prediction Networks pdf
27 LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech pdf
28 Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence pdf
29 How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey pdf
30 Cross-lingual Low Resource Speaker Adaptation Using Phonological Features pdf
31 Visualising and Explaining Deep Learning Models for Speech Quality Prediction pdf

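The content of this site is sourced from the internet. If any content infringes your rights, please contact us to have it removed. Contact email: yongqiangli@alumni.hust.edu.cn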

Copyright © 2015-2035 li yongqiang All Rights Reserved