An Investigation on Vietnamese Credit Scoring based on Big Data Platform and Ensemble Learning
Author(s): Quang-Linh Tran, Binh Van Duong, Gia-Huy Lam, Dat Vuong, and Trong-Hop Do
Published in: The First International Conference on Intelligence of Things
Number: ISBN 978-3-031-15062-3
Pages: 289–298
Year: 2022
Field: Information Technology
Type: Scientific paper
Category: International
ABSTRACT
The credit score is a vital indicator that can affect many aspects of people's lives.
However, credit scores are evaluated manually, which costs a large amount of money and time.
This paper learns from the shortcomings of previous research and offers insights and
empirical experiments on the advantages of distributed solutions for future credit scoring.
The research compares several feature engineering techniques, using a big data platform and
ensemble learning methods, to find the best solution for predicting the credit score.
Because data on customers' financial activities is growing enormously, a big data platform
is necessary to handle it. In this paper, Spark, a distributed data processing framework,
is used to store and process the data. Experiments are carried out to compare the
effectiveness of feature engineering on this problem, and a comparative study of the
performance of ensemble learning models is also given. A real-world Vietnamese credit
scoring data set is used to develop and evaluate the models. Four metrics are used to
evaluate the performance of the credit scoring models: F1-score, recall, precision, and
accuracy. The results are promising: the highest accuracy, 72.9%, is obtained by combining
a Gradient-boosted Tree with the cleaned data set from which categorical features were
removed. This paper lays a foundation for using big data platforms to handle financial data,
and much future research can be carried out to improve on the performance reported here.
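The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labeling (1 for the positive class, 0 otherwise) and made-up example labels; the abstract does not say how the paper encodes its classes or averages its metrics, so the function name `binary_metrics` and the `Metrics` container are hypothetical and this is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Assumption: binary labels; ratios with a zero denominator are reported as 0.0.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Illustrative labels only; not data from the paper.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_metrics = binary_metrics(example_true, example_pred)
print(example_metrics)
```

Accuracy alone can be misleading when the classes are imbalanced, which is one reason to report precision, recall, and F1 alongside it.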