The Effect of Feature Selection on the Performance of C5.0, XGBoost, and Random Forest in Classifying Phishing Websites
Keywords:
waste, waste processing, ISO 9001, 2015, quality management system
Abstract
The growing number of internet users, and of websites in particular, gives phishing actors more opportunities to steal users' personal information. Each website exposes a large amount of information that can be used as features for classifying phishing websites. These features fall into three groups: URL features, content features, and external features. This study compares three methods, C5.0, XGBoost, and Random Forest, to find the best performer for classifying phishing websites. It also applies feature selection to remove features that have no effect, with the aim of shortening training time. In the tests, C5.0 achieved accuracy, precision, recall, and F1-score values averaging 93.5%, XGBoost 96.6%, and Random Forest 95.7%. With feature selection, using only the 15 most important features, training was on average about 3.53 times faster across the three algorithms. Accuracy, precision, recall, and F1-score decreased slightly with feature selection, but the decrease was not significant and had no major impact on classification.
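As a rough illustration of the two computations the abstract relies on, the Python sketch below keeps the k highest-ranked features and averages accuracy, precision, recall, and F1-score for a binary phishing/legitimate labelling. It is not the study's code. The function names `select_top_k`, `keep_columns`, and `binary_metrics` are hypothetical, the importance scores are assumed to come from an already trained model, and encoding phishing as 1 is an assumption.

```python
from typing import Dict, List, Sequence


def select_top_k(importances: Sequence[float], k: int = 15) -> List[int]:
    """Return the column indices of the k largest importance scores.

    The scores are assumed to be supplied by a trained model; how they
    are computed is outside this sketch.
    """
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])  # ascending order keeps the column layout stable


def keep_columns(rows: Sequence[Sequence[float]],
                 columns: Sequence[int]) -> List[List[float]]:
    """Project every sample onto the selected feature columns."""
    return [[row[c] for c in columns] for row in rows]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, F1-score, and their unweighted mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mean": (accuracy + precision + recall + f1) / 4,
    }
```

With `k=15`, `select_top_k` corresponds to keeping only the 15 most important features. The model would then be retrained on the output of `keep_columns` and its training time compared with that of the full feature set.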