Personal scientific curriculum vitae - University of Da Nang







Representing context in abbreviation expansion using machine learning approach

Author(s): Trieu Thi Ly Ly, Nguyen Van Quy, Ninh Khanh Duy, Huynh Huu Hung, Dang Duy Thang

Published in: Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR); Issue: 2017; Pages: 816-822; Year: 2017

Field: Information Technology; Type: Report; Category: Domestic

SUMMARY

Text normalization is an essential problem in natural language processing applications, since the input text often contains non-standard words such as abbreviations, numbers, and foreign words. This paper addresses the problem of normalizing abbreviations in Vietnamese text when an abbreviation has several possible expansions. To disambiguate among the expansions, a machine learning approach is proposed in which the contextual information of the abbreviation is represented by one of two models: Bag-of-words or Doc2vec. Experiments with a Naïve Bayes classifier on a dataset of abbreviations that we collected show that the average rates of correct expansion for Bag-of-words and Doc2vec are 86.0% and 79.7%, respectively. The experimental results also show that contextual information plays an important role in the correct expansion of an abbreviation.
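The Bag-of-words variant of this approach can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, the add-one smoothing, the class name `NaiveBayesExpander`, and the six toy training contexts for the abbreviation "ĐH" are all assumptions made for illustration, and none of them come from the paper's dataset. The Doc2vec representation is not shown.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(context, abbreviation):
    """Lower-case, split on whitespace, and drop the abbreviation itself."""
    target = abbreviation.lower()
    return [tok for tok in context.lower().split() if tok != target]


class NaiveBayesExpander:
    """Multinomial Naive Bayes over context words (hypothetical helper class)."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.class_counts = Counter()            # training contexts per expansion
        self.word_counts = defaultdict(Counter)  # word frequencies per expansion
        self.vocabulary = set()

    def fit(self, examples):
        for context, expansion in examples:
            tokens = bag_of_words(context, self.abbreviation)
            self.class_counts[expansion] += 1
            self.word_counts[expansion].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, context):
        # Words never seen in training are skipped (an assumption of this sketch).
        tokens = [t for t in bag_of_words(context, self.abbreviation)
                  if t in self.vocabulary]
        total = sum(self.class_counts.values())
        scores = {}
        for expansion, count in self.class_counts.items():
            counts = self.word_counts[expansion]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(count / total)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # add-one smoothing
            scores[expansion] = score
        return scores

    def expand(self, context):
        scores = self.log_scores(context)
        return max(scores, key=scores.get)


# Toy contexts, invented for illustration only (not the paper's data).
TOY_EXAMPLES = [
    ("sinh viên trường ĐH bách khoa", "đại học"),
    ("thi tuyển sinh vào ĐH năm nay", "đại học"),
    ("giảng viên ĐH dạy sinh viên", "đại học"),
    ("ĐH đảng bộ bầu ban chấp hành", "đại hội"),
    ("đại biểu dự ĐH cổ đông", "đại hội"),
    ("ĐH đại biểu toàn quốc khai mạc", "đại hội"),
]

model = NaiveBayesExpander("ĐH").fit(TOY_EXAMPLES)
print(model.expand("sinh viên ĐH sư phạm"))
print(model.expand("đại biểu về dự ĐH"))
```

In this sketch the only evidence used for disambiguation is the set of words surrounding the abbreviation, which reflects the paper's point that context drives correct expansion. A real system would likely need proper Vietnamese word segmentation and a much larger labeled dataset.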

