A Speaker-Adaptive HMM-based Vietnamese Text-to-Speech System
Author(s): Duy Khanh Ninh
Published in: Proceedings of the 11th IEEE International Conference on Knowledge and Systems Engineering (KSE 2019)
Issue: 2019
Pages: 342-346
Year: 2019
Field: Information Technology
Type: Conference paper
Category: International
ABSTRACT
This paper describes the first attempt to develop a Vietnamese HMM-based text-to-speech system using the speaker-adaptive approach. Although speaker-dependent systems have been widely built, no speaker-adaptive system has so far been developed for Vietnamese. We collected speech data from several native Vietnamese speakers and employed state-of-the-art speech analysis, model training, and speaker adaptation techniques to develop the system. We also performed perceptual experiments to compare the quality of speaker-adapted (SA) voices built on the average voice model with that of speaker-dependent (SD) voices built on SD models, and to confirm the effects of contextual features, including word boundary (WB) and part-of-speech (POS), on the quality of synthetic speech. Evaluation results show that SA voices have significantly higher naturalness than SD voices when both use the same limited contextual feature set excluding WB and POS. In addition, SA voices trained with the limited contextual features excluding WB and POS still have better quality than SD voices trained with the full contextual features including WB and POS. These results show the robustness of the speaker-adaptive approach over the speaker-dependent approach for Vietnamese statistical parametric speech synthesis.