Application of Machine Learning for Detecting Fake Accounts on Instagram

Hendy Gunawan(1*), Yulia Yulia(2), Gregorius Satia Budhi(3)


(1) Informatics Engineering Study Program, Universitas Kristen Petra Surabaya
(2) Informatics Engineering Study Program, Universitas Kristen Petra Surabaya
(3) Informatics Engineering Study Program, Universitas Kristen Petra Surabaya
(*) Corresponding Author

Abstract


Instagram is the fourth most-used social media platform by number of active users. Many people try to increase their follower counts, for example to gain fame or to be seen as trustworthy, since a large following lends an account credibility. As a result, people create fake accounts, both to inflate their own follower numbers and to commit crimes such as fraud and cyberbullying. Instagram's flexibility and widespread use have made it a platform where fake accounts proliferate. In this research, a web-based application was designed to detect whether an Instagram account is fake or real. Detection is carried out with machine learning using four methods: Support Vector Machine, Naïve Bayes, Random Forest, and Adaptive Boosting (AdaBoost). The performance of these methods is compared to determine which is most suitable for detecting fake accounts on Instagram. The models are evaluated with k-fold cross-validation to guard against overfitting. In the tests carried out, AdaBoost achieved the highest accuracy for classifying Instagram accounts at 92.5%, followed by Random Forest at 91.7%, Support Vector Machine at 90.7%, and Naïve Bayes at 83.6%.
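The abstract names the methods but not their configuration. As a minimal, stdlib-only sketch of how one of them, AdaBoost, could be scored with k-fold cross-validation, the Python below assumes discrete binary AdaBoost with decision stumps as weak learners, labels of +1 (fake) and -1 (real), and 5 unstratified folds. The helper `make_synthetic_accounts`, its four features, and all generated values are hypothetical stand-ins rather than the study's dataset, so the accuracy it prints is unrelated to the 92.5% reported above.

```python
import math
import random


def make_synthetic_accounts(n=100, seed=42):
    """Hypothetical stand-in data, NOT the study's dataset.

    Each row is [has_profile_pic, n_posts, n_followers, n_follows];
    the label is +1 for a fake account and -1 for a real one.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        fake = i % 2 == 0  # balanced classes
        if fake:
            row = [int(rng.random() < 0.3), rng.randint(0, 10),
                   rng.randint(0, 200), rng.randint(300, 2000)]
        else:
            row = [int(rng.random() < 0.9), rng.randint(5, 300),
                   rng.randint(100, 3000), rng.randint(100, 1000)]
        X.append(row)
        y.append(1 if fake else -1)
    return X, y


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] >= threshold, else -polarity.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= thr else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Discrete (binary) AdaBoost with decision stumps as weak learners."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # clamp so a perfect stump does not divide by zero
        if err >= 0.5:
            break  # weak learner is no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * label * stump_predict(stump, row))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1  # ties are labeled fake (+1)


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k=5, n_rounds=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, repeat k times."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = adaboost_fit([X[i] for i in train_idx],
                             [y[i] for i in train_idx], n_rounds)
        correct = sum(adaboost_predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    X, y = make_synthetic_accounts()
    mean_acc, fold_scores = cross_val_accuracy(X, y)
    print(f"per-fold accuracy: {fold_scores}")
    print(f"mean {len(fold_scores)}-fold accuracy: {mean_acc:.3f}")
```

Each model is trained without ever seeing its held-out fold, so the averaged score estimates accuracy on unseen accounts; a large gap between training and fold accuracy is how cross-validation reveals overfitting. For comparing all four methods in practice, scikit-learn's `SVC`, `GaussianNB`, `RandomForestClassifier`, and `AdaBoostClassifier` can each be passed to `cross_val_score` with the same `KFold` splitter so that every method is scored on identical folds.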

Keywords


Machine Learning; Support Vector Machine; Naïve Bayes; Random Forest; Adaptive Boosting; Instagram Account Detection



