1. UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
    Heeseung Kim, Sungwon Kim, Jiheum Yeom, Sungroh Yoon
    arXiv 2023. Paper  
    2023-06-28
  2. Diffusion Posterior Sampling for Informed Single-Channel Dereverberation
    Jean-Marie Lemercier, Simon Welker, Timo Gerkmann
    arXiv 2023. Paper  
    2023-06-21
  3. Text-Driven Foley Sound Generation With Latent Diffusion Model
    Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang
    arXiv 2023. Paper  
    2023-06-17
  4. CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
    Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley
    arXiv 2023. Paper  
    2023-06-16
  5. Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
    Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
    arXiv 2023. Paper  
    2023-06-15
  6. Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
    Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang
    arXiv 2023. Paper  
    2023-06-14
  7. StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
    Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani
    arXiv 2023. Paper  
    2023-06-13
  8. UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
    Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu
    arXiv 2023. Paper  
    2023-06-13
  9. HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
    Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee
    arXiv 2023. Paper  
    2023-06-12
  10. Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
    Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao
    arXiv 2023. Paper  
    2023-06-09
  11. Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
    Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li
    Interspeech 2023. Paper  
    2023-06-07
  12. Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
    Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
    arXiv 2023. Paper   Github  
    2023-06-06
  13. Zero-Shot Blind Audio Bandwidth Extension
    Eloi Moliner, Filip Elvander, Vesa Välimäki
    arXiv 2023. Paper  
    2023-06-02
  14. UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
    Anastasiia Iashchenko, Pavel Andreev, Ivan Shchekotov, Nicholas Babaev, Dmitry Vetrov
    Interspeech 2023. Paper  
    2023-06-01
  15. EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis
    Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
    Interspeech 2023. Paper  
    2023-06-01
  16. Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
    Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao
    arXiv 2023. Paper  
    2023-05-29
  17. Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
    Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng
    Interspeech 2023. Paper  
    2023-05-26
  18. DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
    Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
    arXiv 2023. Paper   Project  
    2023-05-25
  19. Efficient Neural Music Generation
    Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang
    arXiv 2023. Paper   Github  
    2023-05-25
  20. Diffusion-Based Audio Inpainting
    Eloi Moliner, Vesa Välimäki
    arXiv 2023. Paper  
    2023-05-24
  21. FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
    Ziyue Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren, Zhou Zhao
    arXiv 2023. Paper   Github  
    2023-05-23
  22. ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
    Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang
    arXiv 2023. Paper  
    2023-05-23
  23. SE-Bridge: Speech Enhancement with Consistent Brownian Bridge
    Zhibin Qiu, Mengfan Fu, Fuchun Sun, Gulila Altenbek, Hao Huang
    arXiv 2023. Paper  
    2023-05-23
  24. U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
    Xin Jing, Yi Chang, Zijiang Yang, Jiangjian Xie, Andreas Triantafyllopoulos, Bjoern W. Schuller
    arXiv 2023. Paper   Project  
    2023-05-22
  25. DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
    Shentong Mo, Jing Shi, Yapeng Tian
    arXiv 2023. Paper  
    2023-05-22
  26. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
    Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao
    arXiv 2023. Paper   Project  
    2023-05-22
  27. Duplex Diffusion Models Improve Speech-to-Speech Translation
    Xianchao Wu
    ACL 2023. Paper  
    2023-05-22
  28. A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model
    Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller
    arXiv 2023. Paper  
    2023-05-19
  29. RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
    Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao
    ACL 2023. Paper   Project  
    2023-05-18
  30. Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
    Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji
    arXiv 2023. Paper  
    2023-05-18
  31. CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
    Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo
    arXiv 2023. Paper   Github  
    2023-05-11
  32. Diffusion-based Signal Refiner for Speech Separation
    Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji
    arXiv 2023. Paper  
    2023-05-10
  33. Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
    Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
    arXiv 2023. Paper   Project   Github  
    2023-04-24
  34. DiffVoice: Text-to-Speech with Latent Diffusion
    Zhijun Liu, Yiwei Guo, Kai Yu
    ICASSP 2023. Paper  
    2023-04-23
  35. AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
    Yuancheng Wang, Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao
    arXiv 2023. Paper   Project  
    2023-04-03
  36. Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator
    Yunhao Chen, Yunjie Zhu, Zihui Yan, Jianlu Shen, Zhen Ren, Yifan Huang
    arXiv 2023. Paper   Github  
    2023-03-27
  37. Enhancing Unsupervised Speech Recognition with Diffusion GANs
    Xianchao Wu
    ICASSP 2023. Paper  
    2023-03-23
  38. Speech Signal Improvement Using Causal Generative Diffusion Models
    Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann
    ICASSP 2023. Paper  
    2023-03-15
  39. Generating symbolic music using diffusion models
    Lilac Atassi
    arXiv 2023. Paper  
    2023-03-15
  40. DiffuseRoll: Multi-track multi-category music generation based on diffusion model
    Hongfei Wang
    arXiv 2023. Paper  
    2023-03-14
  41. An investigation into the adaptability of a diffusion-based TTS model
    Haolin Chen, Philip N. Garner
    arXiv 2023. Paper  
    2023-03-03
  42. Defending against Adversarial Audio via Diffusion Model
    Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao
    ICLR 2023. Paper   Github  
    2023-03-02
  43. Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement
    Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann
    arXiv 2023. Paper  
    2023-02-28
  44. Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
    Jiyoung Lee, Joon Son Chung, Soo-Whan Chung
    ICASSP 2023. Paper  
    2023-02-27
  45. Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
    Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng
    arXiv 2023. Paper  
    2023-02-23
  46. ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
    Pengfei Zhu, Chao Pang, Shuohuan Wang, Yekun Chai, Yu Sun, Hao Tian, Hua Wu
    arXiv 2023. Paper  
    2023-02-09
  47. Noise2Music: Text-conditioned Music Generation with Diffusion Models
    Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Wei Han
    arXiv 2023. Paper   Project  
    2023-02-08
  48. Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
    Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà
    arXiv 2023. Paper   Project  
    2023-02-04
  50. InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
    Dongchao Yang, Songxiang Liu, Rongjie Huang, Guangzhi Lei, Chao Weng, Helen Meng, Dong Yu
    arXiv 2023. Paper   Project  
    2023-01-31
  51. Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
    Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
    arXiv 2023. Paper   Project  
    2023-01-30
  52. AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
    Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley
    arXiv 2023. Paper   Project   Github  
    2023-01-29
  53. Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
    Flavio Schneider, Zhijing Jin, Bernhard Schölkopf
    arXiv 2023. Paper   Project   Github  
    2023-01-27
  54. Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
    Shahar Lutati, Eliya Nachmani, Lior Wolf
    arXiv 2023. Paper  
    2023-01-25
  55. ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
    Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic
    arXiv 2022. Paper   Project  
    2022-12-30
  56. StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
    Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
    ICASSP 2023. Paper  
    2022-12-22
  57. MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
    Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo
    CVPR 2023. Paper   Github  
    2022-12-19
  58. Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
    Yusuke Yasuda, Tomoki Toda
    ICASSP 2023. Paper  
    2022-12-16
  59. Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models
    Minki Kang, Dongchan Min, Sung Ju Hwang
    ICASSP 2023. Paper   Project  
    2022-11-17
  60. EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
    Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
    ICASSP 2023. Paper   Project  
    2022-11-17
  61. Unsupervised vocal dereverberation with diffusion-based generative models
    Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui, Yuki Mitsufuji
    ICASSP 2023. Paper  
    2022-11-08
  62. DiffPhase: Generative Diffusion-based STFT Phase Retrieval
    Tal Peer, Simon Welker, Timo Gerkmann
    ICASSP 2023. Paper  
    2022-11-08
  63. NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
    Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou
    ICASSP 2023. Paper  
    2022-11-04
  64. Cold Diffusion for Speech Enhancement
    Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux
    ICASSP 2023. Paper  
    2022-11-04
  65. Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration
    Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
    ICASSP 2023. Paper   Project   Github  
    2022-11-04
  66. SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation
    Chen Zhang, Yi Ren, Kejun Zhang, Shuicheng Yan
    arXiv 2022. Paper   Project  
    2022-11-01
  67. Diffusion-based Generative Speech Source Separation
    Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi
    ICASSP 2023. Paper  
    2022-10-31
  68. SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement
    Zhibin Qiu, Mengfan Fu, Yinfeng Yu, LiLi Yin, Fuchun Sun, Hao Huang
    ICASSP 2023. Paper   Github  
    2022-10-30
  69. A Versatile Diffusion-based Generative Refiner for Speech Enhancement
    Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
    ICASSP 2023. Paper  
    2022-10-27
  70. Conditioning and Sampling in Variational Diffusion Models for Speech Super-resolution
    Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang
    ICASSP 2023. Paper   Project   Github  
    2022-10-27
  71. Solving Audio Inverse Problems with a Diffusion Model
    Eloi Moliner, Jaakko Lehtinen, Vesa Välimäki
    ICASSP 2023. Paper  
    2022-10-27
  72. Full-band General Audio Synthesis with Score-based Diffusion
    Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà
    arXiv 2022. Paper  
    2022-10-26
  73. TransFusion: Transcribing Speech with Multinomial Diffusion
    Matthew Baas, Kevin Eloff, Herman Kamper
    SACAIR 2022. Paper   Github  
    2022-10-14
  74. Hierarchical Diffusion Models for Singing Voice Neural Vocoder
    Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
    arXiv 2022. Paper  
    2022-10-14
  75. WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
    Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani
    IEEE SLT 2023. Paper   Project  
    2022-10-03
  76. Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN
    Yin-Ping Cho, Yu Tsao, Hsin-Min Wang, Yi-Wen Liu
    arXiv 2022. Paper   Project  
    2022-09-21
  77. Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model
    Sangjun Han, Hyeongrae Ihm, DaeHan Ahn, Woohyung Lim
    NeurIPS Workshop 2022. Paper  
    2022-09-05
  78. Speech Enhancement and Dereverberation with Diffusion-based Generative Models
    Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann
    arXiv 2022. Paper   Project   Github  
    2022-08-11
  79. DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation
    Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
    ISMIR 2022. Paper   Github  
    2022-08-09
  80. Diffsound: Discrete Diffusion Model for Text-to-sound Generation
    Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
    TASLP 2022. Paper   Project  
    2022-07-20
  81. ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
    Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren
    ACM Multimedia 2022. Paper   Project  
    2022-07-13
  82. NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
    Seungu Han, Junhyeok Lee
    Interspeech 2022. Paper   Project  
    2022-06-17
  83. CARD: Classification and Regression Diffusion Models
    Xizewen Han, Huangjie Zheng, Mingyuan Zhou
    NeurIPS 2022. Paper  
    2022-06-15
  84. Adversarial Audio Synthesis with Complex-valued Polynomial Networks
    Yongtao Wu, Grigorios G Chrysos, Volkan Cevher
    ICML workshop 2022. Paper  
    2022-06-14
  85. Multi-instrument Music Synthesis with Spectrogram Diffusion
    Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
    ISMIR 2022. Paper  
    2022-06-11
  86. Universal Speech Enhancement with Score-based Diffusion
    Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini
    arXiv 2022. Paper  
    2022-06-07
  87. Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
    Alon Levkovitch, Eliya Nachmani, Lior Wolf
    Interspeech 2022. Paper   Project  
    2022-06-05
  88. Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
    Sungwon Kim, Heeseung Kim, Sungroh Yoon
    arXiv 2022. Paper   Project  
    2022-05-30
  89. BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
    Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu
    NeurIPS 2022. Paper   Github  
    2022-05-30
  90. FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
    Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao
    IJCAI 2022. Paper   Project   Github  
    2022-04-21
  91. Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain
    Simon Welker, Julius Richter, Timo Gerkmann
    Interspeech 2022. Paper   Github  
    2022-03-31
  92. SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
    Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani
    Interspeech 2022. Paper  
    2022-03-31
  93. BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
    Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
    ICLR 2022. Paper   Github  
    2022-03-25
  94. Conditional Diffusion Probabilistic Model for Speech Enhancement
    Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao
    ICASSP 2022. Paper   Github  
    2022-02-10
  95. InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training
    Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao
    ICASSP 2022. Paper  
    2022-02-08
  96. ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation
    Shoule Wu, Ziqiang Shi
    CoRR 2022. Paper   Project  
    2022-01-29
  97. DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
    Songxiang Liu, Dan Su, Dong Yu
    arXiv 2022. Paper   Github  
    2022-01-28
  98. Itô-Taylor Sampling Scheme for Denoising Diffusion Probabilistic Models using Ideal Derivatives
    Hideyuki Tachibana, Mocho Go, Muneyoshi Inahara, Yotaro Katayama, Yotaro Watanabe
    arXiv 2021. Paper  
    2021-12-26
  99. Guided-TTS: Text-to-Speech with Untranscribed Speech
    Heeseung Kim, Sungwon Kim, Sungroh Yoon
    ICML 2022. Paper  
    2021-11-30
  100. Denoising Diffusion Gamma Models
    Eliya Nachmani, Robin San Roman, Lior Wolf
    arXiv 2021. Paper  
    2021-10-10
  101. EdiTTS: Score-based Editing for Controllable Text-to-Speech
    Jaesung Tae, Hyeongju Kim, Taesu Kim
    Interspeech 2022. Paper   Project   Github  
    2021-10-06
  102. Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
    Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov, Jiansheng Wei
    ICLR 2022. Paper   Project  
    2021-09-28
  103. A Study on Speech Enhancement Based on Diffusion Probabilistic Model
    Yen-Ju Lu, Yu Tsao, Shinji Watanabe
    APSIPA 2021. Paper  
    2021-07-25
  104. Variational Diffusion Models
    Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
    NeurIPS 2021. Paper   Github  
    2021-07-01
  105. WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
    Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan
    Interspeech 2021. Paper   Project   Github   Github2  
    2021-06-17
  106. CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis
    Simon Rouard, Gaëtan Hadjeres
    ISMIR 2021. Paper   Project  
    2021-06-14
  107. PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior
    Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
    ICLR 2022. Paper   Project  
    2021-06-11
  108. DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion
    Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng
    ASRU 2021. Paper   Github  
    2021-05-28
  109. ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
    Shoule Wu, Ziqiang Shi
    arXiv 2021. Paper   Project  
    2021-05-17
  110. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
    Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov
    ICML 2021. Paper   Project   Github  
    2021-05-13
  111. DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
    Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, Zhou Zhao
    AAAI 2022. Paper   Project   Github  
    2021-05-06
  113. Restoring degraded speech via a modified diffusion model
    Jianwei Zhang, Suren Jayasuriya, Visar Berisha
    Interspeech 2021. Paper  
    2021-04-22
  114. NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
    Junhyeok Lee, Seungu Han
    Interspeech 2021. Paper   Project   Github  
    2021-04-06
  115. Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
    Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
    Interspeech 2021. Paper  
    2021-04-03
  116. Symbolic Music Generation with Diffusion Models
    Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon
    ISMIR 2021. Paper   Github  
    2021-03-30
  117. DiffWave: A Versatile Diffusion Model for Audio Synthesis
    Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
    ICLR 2021. Paper   Github  
    2020-09-21
  118. WaveGrad: Estimating Gradients for Waveform Generation
    Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan
    ICLR 2021. Paper   Project   Github  
    2020-09-02