Classification in Building an Online News Portal Using the BERT Method
(1) Informatics Study Program
(2) Informatics Study Program
(3) Electrical Engineering Study Program
(*) Corresponding Author
Abstract
The internet helps people by making information from many online news platforms accessible. However, the volume of news now available across these platforms is large, and it needs to be categorized. Some sources also lack credibility in their coverage of an event, because their publishers use false or misleading information to push their agendas. To check the credibility of a report, a reader therefore needs to compare it against other sources rather than rely on a single one. This is inefficient, because the reader has to find each additional source at a different URL.
In this research, scraping is used to retrieve the news available on a news platform. Once scraping is complete, each article is classified to determine its category. The classification method is Bidirectional Encoder Representations from Transformers (BERT).
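The scraping step can be illustrated with a minimal Python sketch. The abstract does not name the scraping tool or the page layout of the news platform, so everything here is an assumption: the class `ArticleParser`, the helper `parse_article`, the sample markup, and the choice to treat the first `<h1>` as the headline and every `<p>` as body text are all hypothetical. Only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the text of the first <h1> and of every <p> in a page.

    Hypothetical layout: real news sites differ, so the tags to watch
    would need to be adapted per platform.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None     # tag currently being captured: "h1", "p", or None
        self._buffer = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Start capturing only when we are not already inside an h1 or p.
        if tag in ("h1", "p") and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Inline tags such as </b> do not close the element being captured.
        if tag != self._tag:
            return
        text = " ".join("".join(self._buffer).split())  # normalise whitespace
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._tag = None


def parse_article(html):
    """Return the headline and body text found in an HTML string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": " ".join(parser.paragraphs)}


# Made-up sample page standing in for a downloaded news article.
SAMPLE_HTML = """
<html><body>
  <h1>Timnas Indonesia menang 2-0</h1>
  <p>Pertandingan berlangsung di <b>Jakarta</b>.</p>
  <p>Pelatih memuji para pemain.</p>
</body></html>
"""

article = parse_article(SAMPLE_HTML)
print(article["title"])
print(article["body"])
```

In practice the HTML string would come from a download step (for example `urllib.request.urlopen(url).read().decode()`), and the extracted title and body would be passed on to the classifier.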
Testing shows that the news can be retrieved and classified. With the pre-trained model indobenchmark/indobert-base-p1, classification accuracy reaches 87.548%.
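As a rough sketch of how such a test might be set up in Python, the code below loads the named checkpoint with the Hugging Face `transformers` library and scores predictions against gold labels. Several things are assumptions rather than details from the abstract: the use of `transformers` and `torch`, the label set `CATEGORIES`, the function names `classify` and `accuracy`, and the example predictions. Note also that loading the base checkpoint this way attaches an untrained classification head, so the output only becomes meaningful after fine-tuning on labelled news; the abstract does not describe that training setup.

```python
MODEL_NAME = "indobenchmark/indobert-base-p1"

# Hypothetical label set; the abstract does not list the news categories.
CATEGORIES = ["politik", "olahraga", "ekonomi", "teknologi"]


def classify(texts, categories=CATEGORIES, model_name=MODEL_NAME):
    """Return one predicted category per input text.

    Assumes `torch` and `transformers` are installed and the checkpoint can
    be downloaded. Imports are local so the rest of the module works
    without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(categories)
    )
    model.eval()

    # BERT accepts at most 512 tokens, so longer articles are truncated.
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return [categories[i] for i in logits.argmax(dim=-1).tolist()]


def accuracy(predicted, actual):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up predictions and gold labels, only to show the metric.
example_predicted = ["olahraga", "politik", "ekonomi", "politik"]
example_actual = ["olahraga", "politik", "ekonomi", "teknologi"]
example_accuracy = accuracy(example_predicted, example_actual)
print(f"{example_accuracy:.3%}")
```

An accuracy figure such as the one reported would correspond to running `classify` over a held-out test set and passing the result, together with the true categories, to `accuracy`.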