- Text-Driven Foley Sound Generation With Latent Diffusion Model. Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang. arXiv 2023. 2023-06-17
- CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models. Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley. arXiv 2023. 2023-06-16
- UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu. arXiv 2023. 2023-06-13
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani. arXiv 2023. 2023-06-13
- Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge. Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li. Interspeech 2023. 2023-06-07
- Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias. Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao. 2023-06-06
- Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation. Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao. arXiv 2023. 2023-05-29
- ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models. Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang. arXiv 2023. 2023-05-23
- ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer. Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao. 2023-05-22
- DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment. Shentong Mo, Jing Shi, Yapeng Tian. arXiv 2023. 2023-05-22
- U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech. Xin Jing, Yi Chang, Zijiang Yang, Jiangjian Xie, Andreas Triantafyllopoulos, Bjoern W. Schuller. 2023-05-22
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis. Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao. 2023-05-18
- CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo. 2023-05-11
- Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model. Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria. 2023-04-24
- DiffVoice: Text-to-Speech with Latent Diffusion. Zhijun Liu, Yiwei Guo, Kai Yu. ICASSP 2023. 2023-04-23
- An investigation into the adaptability of a diffusion-based TTS model. Haolin Chen, Philip N. Garner. arXiv 2023. 2023-03-03
- Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech. Jiyoung Lee, Joon Son Chung, Soo-Whan Chung. ICASSP 2023. 2023-02-27
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models. Pengfei Zhu, Chao Pang, Shuohuan Wang, Yekun Chai, Yu Sun, Hao Tian, Hua Wu. arXiv 2023. 2023-02-09
- Noise2Music: Text-conditioned Music Generation with Diffusion Models. Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Wei Han. 2023-02-08
- InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt. Dongchao Yang, Songxiang Liu, Rongjie Huang, Guangzhi Lei, Chao Weng, Helen Meng, Dong Yu. 2023-01-31
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao. 2023-01-30
- AudioLDM: Text-to-Audio Generation with Latent Diffusion Models. Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley. 2023-01-29
- Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion. Flavio Schneider, Zhijing Jin, Bernhard Schölkopf. 2023-01-27
- ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech. Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic. 2022-12-30
- Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. Yusuke Yasuda, Tomoki Toda. ICASSP 2023. 2022-12-16
- EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu. 2022-11-17
- Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models. Minki Kang, Dongchan Min, Sung Ju Hwang. 2022-11-17
- NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS. Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou. ICASSP 2023. 2022-11-04
- WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration. Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani. 2022-10-03
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation. Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu. 2022-07-20
- Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models. Alon Levkovitch, Eliya Nachmani, Lior Wolf. 2022-06-05
- Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data. Sungwon Kim, Heeseung Kim, Sungroh Yoon. 2022-05-30
- InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training. Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao. ICASSP 2022. 2022-02-08
- DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs. Songxiang Liu, Dan Su, Dong Yu. 2022-01-28
- Guided-TTS: Text-to-Speech with Untranscribed Speech. Heeseung Kim, Sungwon Kim, Sungroh Yoon. ICML 2022. 2021-11-30
- EdiTTS: Score-based Editing for Controllable Text-to-Speech. Jaesung Tae, Hyeongju Kim, Taesu Kim. 2021-10-06
- WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan. 2021-06-17
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech. Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov. 2021-05-13
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism. Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, Zhou Zhao. 2021-05-06
- Diff-TTS: A Denoising Diffusion Model for Text-to-Speech. Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim. Interspeech 2021. 2021-04-03
Counts - 40