Development of a Chrome Extension to Identify Phishing Websites Based on URLs Using the Random Forest Algorithm
Abstract
Ever-developing technology has made the internet one of the most important parts of daily human activity. This development has been accompanied by an increase in phishing activity, not only in quantity but also in the variety of techniques. The losses caused by phishing attacks are substantial. Many applications exist for preventing phishing attacks, but most of them are still not accurate enough. Several studies show that ensemble learning algorithms are well suited to detecting phishing websites. In this research, a Chrome extension that uses a Random Forest model to detect phishing websites has been developed. Random Forest is one of the most well-known ensemble learning algorithms. The most important hyperparameters, which are the subject of the experiments, are n_estimators, min_samples_leaf, min_samples_split, max_features, and max_depth. The features used are Lexical features, which are based on other studies, and Domain-based features, which are newly proposed and comprise Global Page Rank, Average Daily Time, Sites Linking In, Domain Age, and Registration Period. All features are obtained only from the URL.
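To illustrate what "obtained only from the URL" can mean for the Lexical group, the following is a minimal sketch that derives a few features from the URL string alone. The function name `lexical_features` and the specific features chosen here are assumptions for illustration; the abstract does not list the Lexical features actually used, and the Domain-based features (which need external lookups) are not covered.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string alone.

    The feature set is hypothetical; it is not the one used in the research.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),        # many subdomains can be suspicious
        "hyphen_count": host.count("-"),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. /account/verify has 2.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


features = lexical_features("http://secure-login.example.com/account/verify?id=123")
```

Each resulting dictionary could serve as one row of a feature table for a classifier; no page content is fetched at any point.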
This research shows that dataset quality is the most influential factor in building a good model. Hyperparameter tuning is also important, but its benefit is limited to certain scenarios. The newly proposed features can improve the model’s performance. Across several experiments, the combination of Lexical and Domain-based features achieved the best accuracy, 98.28%.
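As a rough illustration of what tuning the five hyperparameters named in the abstract involves, the sketch below enumerates a small search space. The hyperparameter names match those of scikit-learn's `RandomForestClassifier`, although the abstract does not name the library; the candidate values and the helper `iter_param_sets` are assumptions, not the settings used in the experiments.

```python
from itertools import product

# Hypothetical search space: the five names come from the abstract,
# the candidate values are illustrative assumptions only.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 20],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", "log2"],
}


def iter_param_sets(grid: dict):
    """Yield one dict per combination of candidate values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(iter_param_sets(param_grid))
```

Each dictionary in `candidates` could be unpacked into a Random Forest constructor and scored on held-out data. Because the number of combinations grows multiplicatively with each added value, an exhaustive search is only practical for small grids like this one.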