Classification of Indonesian-Language News Articles Using a Naive Bayes Classifier

Anthony Setiawan, Leo Willyanto Santoso, Rudy Adipranata

Abstract


Access to the latest news has become easier and far more abundant thanks to rapid technological development in recent years. However, article categories are still assigned manually by the writer, so human error can lead to mistakes such as selecting the wrong category. In some cases a writer deliberately assigns a popular but unrelated category simply to boost the article's view count. To address this, we built a web application that automatically assigns each article to the category that best fits its content.

The application uses N-gram features and the Naïve Bayes Classifier to classify news content. N-gram features group consecutive words into sequences of N words, such as unigrams (N = 1) or bigrams (N = 2). The Naïve Bayes Classifier is a probabilistic method that assigns a document to the class with the highest posterior probability, assuming that features are conditionally independent given the class.
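The following Python sketch illustrates the two ideas above: word n-gram extraction and a multinomial Naive Bayes classifier over n-gram counts. It is not the authors' implementation; the whitespace tokenizer, the add-one (Laplace) smoothing, the `MultinomialNaiveBayes` class, and the toy training sentences and labels (`olahraga` = sports, `ekonomi` = economy) are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Lowercase, split on whitespace, and return the word n-grams of length n."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one (Laplace) smoothing."""

    def __init__(self, n=1):
        self.n = n  # 1 = unigram features, 2 = bigram features

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class, used for P(c)
        self.gram_counts = defaultdict(Counter)   # class -> n-gram frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = ngrams(doc, self.n)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        grams = ngrams(doc, self.n)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / self.total_docs)  # log P(c)
            total = sum(self.gram_counts[label].values())
            for g in grams:
                # log P(g | c), smoothed so an unseen n-gram does not zero out the class
                score += math.log((self.gram_counts[label][g] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: "olahraga" = sports, "ekonomi" = economy.
train_docs = [
    "timnas indonesia menang di final piala",
    "pemain sepak bola cetak gol di final",
    "harga saham naik di bursa efek",
    "bank indonesia naikkan suku bunga",
]
train_labels = ["olahraga", "olahraga", "ekonomi", "ekonomi"]

print(ngrams("berita ekonomi hari ini", 2))  # ['berita ekonomi', 'ekonomi hari', 'hari ini']
model = MultinomialNaiveBayes(n=1).fit(train_docs, train_labels)
print(model.predict("gol di menit akhir final"))  # olahraga
```

Scores are summed as logarithms rather than multiplied as raw probabilities, which avoids floating-point underflow when a document contains many n-grams. Passing `n=2` to the same class switches the features from unigrams to bigrams.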

We tested the Naïve Bayes Classifier with four training-to-test dataset ratios. The resulting accuracies were:

  • 50 : 50 split: unigram 0.901, bigram 0.508
  • 60 : 40 split: unigram 0.904, bigram 0.498
  • 70 : 30 split: unigram 0.947, bigram 0.519
  • 80 : 20 split: unigram 0.887, bigram 0.507

The 70 : 30 training-to-test ratio therefore yields the highest accuracy for both unigrams (0.947) and bigrams (0.519).
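As a hedged sketch of the evaluation procedure, the Python code below shows one way to split a labeled dataset by a training ratio and compute accuracy, then reads the figures reported above to confirm which ratio scores highest. The random shuffle, the fixed seed, and the helper names `split_by_ratio`, `accuracy`, and `best_split` are hypothetical; the abstract does not state how the authors split their data.

```python
import random

# Accuracies reported in the abstract, keyed by training:test ratio.
REPORTED = {
    "50:50": {"unigram": 0.901, "bigram": 0.508},
    "60:40": {"unigram": 0.904, "bigram": 0.498},
    "70:30": {"unigram": 0.947, "bigram": 0.519},
    "80:20": {"unigram": 0.887, "bigram": 0.507},
}


def split_by_ratio(docs, labels, train_ratio, seed=0):
    """Shuffle (doc, label) pairs, then cut them; train_ratio=0.7 gives a 70:30 split."""
    pairs = list(zip(docs, labels))
    random.Random(seed).shuffle(pairs)
    cut = round(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def best_split(feature):
    """Return the (ratio, accuracy) pair with the highest reported accuracy for a feature."""
    ratio = max(REPORTED, key=lambda r: REPORTED[r][feature])
    return ratio, REPORTED[ratio][feature]


for feature in ("unigram", "bigram"):
    ratio, acc = best_split(feature)
    print(f"{feature}: best ratio {ratio} (accuracy {acc})")
# prints:
# unigram: best ratio 70:30 (accuracy 0.947)
# bigram: best ratio 70:30 (accuracy 0.519)
```

A fixed seed makes each split reproducible, so results for different ratios can be compared on the same shuffled order.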


Keywords


Naïve Bayes Classifier; N-Gram; Classification of Indonesian News Articles; Web Scraping



