 Similarity Detection for Higher-Order Structure of DNA Sequences
Author(s): Nguyen Thi Ngoc Anh, Ho Phan Hieu, Tran Anh Kiet, and Vo Trung Hung
Published in: Journal of Science and Technology: Issue on Information and Communications Technology; Issue: Vol. 3, No. 2; Pages: 28-34; Year: 2017
Field: Information technology; Type: Scientific article; Category: Domestic
With advances in data collection and storage capabilities, bioinformatics applications, especially DNA sequence recognition, have recently generated large amounts of multidimensional data, known as higher-order data representations. This paper proposes a mathematical model for the multidimensional problem of DNA similarity detection that aims for high accuracy and reliability. To this end, the paper covers the central issues of multidimensional DNA gene expression data: (1) formulating multidimensional DNA data as a higher-order representation; (2) recovering missing values; and (3) decomposing high-order DNA data directly from their tensorial representation to extract useful information for classification. A novel type of third-order microarray expression data, termed gene-sample-time (GST), is explored for biological sample classification. The contributions follow two main thrusts: a latent modeling setting for imputing missing values based on the High-Order Kalman Filter, and feature extraction based on Tensor Discriminative Feature Extraction. Experiments on a real dataset of DNA sequences corroborate the advantages of the proposed approaches over matrix-based algorithms and recent tensor-based discriminant-decomposition methods in terms of missing-value completion, classification accuracy, and computation time.
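To make the gene-sample-time (GST) representation concrete, the following is a minimal sketch of a third-order tensor stored as nested lists, with missing entries marked as NaN. It is an illustration under stated assumptions, not the method of the paper: the names `impute_fiber_mean` and `unfold` and the toy values are hypothetical, the per-fiber mean is a deliberately simple stand-in for the High-Order Kalman Filter imputation, and the column ordering used in the unfolding is one of several conventions in use.

```python
import math

def impute_fiber_mean(gst):
    """Fill NaN entries with the mean of the observed time points
    of the same (gene, sample) fiber. This is a simple stand-in,
    not the High-Order Kalman Filter named in the abstract."""
    completed = []
    for gene in gst:
        new_gene = []
        for fiber in gene:
            observed = [v for v in fiber if not math.isnan(v)]
            if observed:
                mean = sum(observed) / len(observed)
                new_gene.append([mean if math.isnan(v) else v for v in fiber])
            else:
                # No observed time point: leave the fiber untouched.
                new_gene.append(list(fiber))
        completed.append(new_gene)
    return completed

def unfold(tensor, mode):
    """Mode-n unfolding of a third-order tensor indexed [gene][sample][time].
    Rows are indexed by the chosen mode; the two remaining indices are
    flattened in their original order (an assumed convention)."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    if mode == 0:
        return [[tensor[i][j][k] for j in range(J) for k in range(K)] for i in range(I)]
    if mode == 1:
        return [[tensor[i][j][k] for i in range(I) for k in range(K)] for j in range(J)]
    if mode == 2:
        return [[tensor[i][j][k] for i in range(I) for j in range(J)] for k in range(K)]
    raise ValueError("mode must be 0, 1, or 2")

# Toy data (hypothetical): 2 genes x 2 samples x 3 time points.
nan = float("nan")
gst = [
    [[1.0, nan, 3.0], [2.0, 2.0, 2.0]],
    [[0.0, 4.0, nan], [5.0, 6.0, 7.0]],
]

completed = impute_fiber_mean(gst)
gene_mode = unfold(completed, 0)    # 2 rows (genes), 6 columns
sample_mode = unfold(completed, 1)  # 2 rows (samples), 6 columns
time_mode = unfold(completed, 2)    # 3 rows (time points), 4 columns
```

Matrix-based methods would operate on a single unfolding such as `sample_mode`, whereas tensor-based approaches of the kind the abstract describes work on the three-way structure directly.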
