A Comparison of Algorithms used to measure the Similarity between two documents
Authors: Khuat Thanh Tung, Nguyen Duc Hung, Le Thi My Hanh
Published in: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Issue: Volume 4 Issue 4
Pages: 1117-1121
Year: 2015
Field: Undetermined
Type: Scientific article
Category: International
ABSTRACT
Measuring the similarity of documents plays an important role in text-related research and applications such as document clustering, plagiarism detection, information retrieval, machine translation, and automatic essay scoring. Many methods have been proposed to solve this problem. They can be grouped into three main approaches: string-based, corpus-based, and knowledge-based similarity. In this paper, the similarity of two documents is gauged using two kinds of string-based measures: character-based and term-based algorithms. In the character-based method, n-grams are used to build the fingerprints for the fingerprint and winnowing algorithms, and the Dice coefficient is then used to match the two fingerprints. In the term-based measurement, the cosine similarity algorithm is used. This work compares the effectiveness of these algorithms for measuring the similarity between two documents. The results show that the fingerprint and winnowing algorithms perform better than cosine similarity, and that the winnowing algorithm is more stable than the others.
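The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram length `k`, the window size `w`, the CRC32 hash, and the text normalisation are all assumptions chosen for illustration. The winnowing step is also simplified to keep only the minimum hash value of each window, without the positional bookkeeping a full implementation would use.

```python
import math
import re
import zlib
from collections import Counter


def kgram_hashes(text, k=5):
    """Hash every character k-gram of the normalised text.

    Normalisation (lowercase, keep only a-z and 0-9) and CRC32 are
    illustrative assumptions, not choices taken from the paper.
    """
    s = re.sub(r"[^a-z0-9]", "", text.lower())
    return [zlib.crc32(s[i:i + k].encode()) for i in range(len(s) - k + 1)]


def winnow(hashes, w=4):
    """Simplified winnowing: keep the minimum hash of each window of w hashes."""
    if len(hashes) < w:
        return set(hashes)
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def dice(a, b):
    """Dice coefficient between two fingerprint sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def cosine(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    ta = Counter(re.findall(r"\w+", text_a.lower()))
    tb = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc1 = "Measuring the similarity of documents is important."
    doc2 = "Measuring similarity between documents is very important."
    full1, full2 = set(kgram_hashes(doc1)), set(kgram_hashes(doc2))
    win1, win2 = winnow(kgram_hashes(doc1)), winnow(kgram_hashes(doc2))
    print("fingerprint + Dice:", dice(full1, full2))
    print("winnowing + Dice:  ", dice(win1, win2))
    print("cosine (terms):    ", cosine(doc1, doc2))
```

In this sketch the "fingerprint" variant compares the full set of n-gram hashes, while the winnowing variant compares only the hashes selected per window, which gives a smaller fingerprint for the same text.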