  1. Text-Driven Foley Sound Generation With Latent Diffusion Model
    Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang
    arXiv 2023. Paper  
    2023-06-17
  2. CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
    Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley
    arXiv 2023. Paper  
    2023-06-16
  3. UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
    Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu
    arXiv 2023. Paper  
    2023-06-13
  4. StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
    Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani
    arXiv 2023. Paper  
    2023-06-13
  5. Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
    Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li
    Interspeech 2023. Paper  
    2023-06-07
  6. Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
    Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
    arXiv 2023. Paper   Github  
    2023-06-06
  7. Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
    Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao
    arXiv 2023. Paper  
    2023-05-29
  8. ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
    Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang
    arXiv 2023. Paper  
    2023-05-23
  9. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
    Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao
    arXiv 2023. Paper   Project  
    2023-05-22
  10. DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
    Shentong Mo, Jing Shi, Yapeng Tian
    arXiv 2023. Paper  
    2023-05-22
  11. U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
    Xin Jing, Yi Chang, Zijiang Yang, Jiangjian Xie, Andreas Triantafyllopoulos, Bjoern W. Schuller
    arXiv 2023. Paper   Project  
    2023-05-22
  12. RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
    Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao
    ACL 2023. Paper   Project  
    2023-05-18
  13. CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
    Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo
    arXiv 2023. Paper   Github  
    2023-05-11
  14. Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
    Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
    arXiv 2023. Paper   Project   Github  
    2023-04-24
  15. DiffVoice: Text-to-Speech with Latent Diffusion
    Zhijun Liu, Yiwei Guo, Kai Yu
    ICASSP 2023. Paper  
    2023-04-23
  16. An investigation into the adaptability of a diffusion-based TTS model
    Haolin Chen, Philip N. Garner
    arXiv 2023. Paper  
    2023-03-03
  17. Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
    Jiyoung Lee, Joon Son Chung, Soo-Whan Chung
    ICASSP 2023. Paper  
    2023-02-27
  18. ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
    Pengfei Zhu, Chao Pang, Shuohuan Wang, Yekun Chai, Yu Sun, Hao Tian, Hua Wu
    arXiv 2023. Paper  
    2023-02-09
  19. Noise2Music: Text-conditioned Music Generation with Diffusion Models
    Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Wei Han
    arXiv 2023. Paper   Project  
    2023-02-08
  20. InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
    Dongchao Yang, Songxiang Liu, Rongjie Huang, Guangzhi Lei, Chao Weng, Helen Meng, Dong Yu
    arXiv 2023. Paper   Project  
    2023-01-31
  21. Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
    Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
    arXiv 2023. Paper   Project  
    2023-01-30
  22. AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
    Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley
    arXiv 2023. Paper   Project   Github  
    2023-01-29
  23. Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
    Flavio Schneider, Zhijing Jin, Bernhard Schölkopf
    arXiv 2023. Paper   Project   Github  
    2023-01-27
  24. ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
    Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic
    arXiv 2022. Paper   Project  
    2022-12-30
  25. Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
    Yusuke Yasuda, Tomoki Toda
    ICASSP 2023. Paper  
    2022-12-16
  26. EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
    Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
    ICASSP 2023. Paper   Project  
    2022-11-17
  27. Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models
    Minki Kang, Dongchan Min, Sung Ju Hwang
    ICASSP 2023. Paper   Project  
    2022-11-17
  28. NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
    Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou
    ICASSP 2023. Paper  
    2022-11-04
  29. WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
    Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani
    IEEE SLT 2022. Paper   Project  
    2022-10-03
  30. Diffsound: Discrete Diffusion Model for Text-to-sound Generation
    Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
    TASLP 2022. Paper   Project  
    2022-07-20
  31. Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
    Alon Levkovitch, Eliya Nachmani, Lior Wolf
    Interspeech 2022. Paper   Project  
    2022-06-05
  32. Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
    Sungwon Kim, Heeseung Kim, Sungroh Yoon
    arXiv 2022. Paper   Project  
    2022-05-30
  33. InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training
    Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao
    ICASSP 2022. Paper  
    2022-02-08
  34. DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
    Songxiang Liu, Dan Su, Dong Yu
    arXiv 2022. Paper   Github  
    2022-01-28
  35. Guided-TTS: Text-to-Speech with Untranscribed Speech
    Heeseung Kim, Sungwon Kim, Sungroh Yoon
    ICML 2022. Paper  
    2021-11-30
  36. EdiTTS: Score-based Editing for Controllable Text-to-Speech
    Jaesung Tae, Hyeongju Kim, Taesu Kim
    Interspeech 2022. Paper   Project   Github  
    2021-10-06
  37. WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
    Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan
    Interspeech 2021. Paper   Project   Github   Github2  
    2021-06-17
  38. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
    Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov
    ICML 2021. Paper   Project   Github  
    2021-05-13
  39. DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
    Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, Zhou Zhao
    AAAI 2022. Paper   Project   Github  
    2021-05-06
  40. Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
    Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
    Interspeech 2021. Paper  
    2021-04-03
Count: 40