 An Investigation on Vietnamese Credit Scoring based on Big Data Platform and Ensemble Learning
Authors: Quang-Linh Tran, Binh Van Duong, Gia-Huy Lam, Dat Vuong, and Trong-Hop Do
Published in: The First International Conference on Intelligence of Things; Number: ISBN 978-3-031-15062-3; Pages: 289–298; Year: 2022
Field: Information Technology; Type: Scientific paper; Category: International
ABSTRACT
The credit score is a vital indicator that can affect many aspects of
people's lives. However, credit scores are evaluated manually, which costs
a large amount of money and time. This paper learns from the disadvantages
of previous research and offers insights and empirical experiments that
demonstrate the advantages of distributed solutions for future work on the
credit scoring problem. The research compares several feature engineering
techniques on a big data platform, together with ensemble learning methods,
to find the best solution for predicting the credit score. Because data
related to customers' financial activities grows enormously, a big data
platform is necessary to handle this volume of data. In this paper, Spark,
a distributed data processing framework, is used to store and process the
data. Experiments are carried out to compare the effectiveness of the
feature engineering techniques on this problem, and a comparative study of
the performance of ensemble learning models is also given. A real-world
Vietnamese credit scoring data set is used to develop and evaluate the
models. Four metrics are used to evaluate the performance of the credit
scoring models: F1-score, recall, precision, and accuracy. The results are
promising: the highest accuracy, 72.9%, is achieved by combining a
Gradient-boosted Tree with the cleaned data set from which categorical
features have been removed. This paper is a foundation for using big data
platforms to handle financial data, and much future research can be carried
out to improve on the results reported here.
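
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary good/bad credit label, which the abstract does not state; the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper or its data set. The paper itself runs on Spark; plain Python is used here only to keep the arithmetic visible.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every division so empty or one-sided inputs return 0.0 instead of failing.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four metrics together matters for credit data, where the classes are often imbalanced and accuracy alone can hide poor recall on the minority class.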