
 Evaluation of speaker-dependent and average-voice Vietnamese statistical speech synthesis systems
Author(s): Duy Khanh Ninh
Published in: Tạp chí khoa học và công nghệ Đại học Đà Nẵng; No.: 17(12.1); Pages: 11-16; Year: 2019
Field: Information Technology; Type: Scientific article; Category: Domestic
ABSTRACT
This paper describes the development and evaluation of a Vietnamese statistical speech synthesis system using the average voice approach. Although speaker-dependent systems have been applied extensively, no average-voice-based system has been developed for Vietnamese so far. We collected speech data from several native Vietnamese speakers and employed state-of-the-art speech analysis, model training, and speaker adaptation techniques to develop the system. We also performed perceptual experiments to compare the quality of speaker-adapted (SA) voices built on the average voice model with that of speaker-dependent (SD) voices built on SD models, and to confirm the effects of contextual features, including word boundary (WB) and part-of-speech (POS), on the quality of synthetic speech. Evaluation results show that SA voices have significantly higher naturalness than SD voices when both use the same limited contextual feature set excluding WB and POS. In addition, SA voices trained with limited contextual features excluding WB and POS still have better quality than SD voices trained with the full contextual feature set including WB and POS. These results show the robustness of the average voice method over the speaker-dependent approach for Vietnamese statistical speech synthesis.