Classification for Building an Online News Portal Using the BERT Method

Jehezkiel Hardwin Tandijaya(1*), Liliana Liliana(2), Indar Sugiarto(3)


(1) Informatics Study Program
(2) Informatics Study Program
(3) Electrical Engineering Study Program
(*) Corresponding Author

Abstract


The Internet helps people by making a wide range of information accessible through many online news platforms. Today, however, a large volume of news is published across different online news platforms, and it needs to be categorized. Some sources publish news about an event that has low credibility, because their publishers use false and misleading information to push their own agendas. To check the credibility of a report about an event, readers therefore need to consult other sources rather than relying on a single one. This is inefficient, because the reader has to search for other news sources at different URLs.

In this research, web scraping is used to retrieve the news available on a news platform. After scraping, each news article is classified to determine its category. The classification method used is Bidirectional Encoder Representations from Transformers (BERT).
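The abstract does not say which tool or page structure the scraper relies on. As a minimal sketch of the scraping step only, assuming a hypothetical page layout in which every article link carries a `headline` class, Python's standard-library `html.parser` can collect each article's title and URL from already-downloaded HTML. `HeadlineParser`, `scrape_headlines`, and the sample page are illustrative names, not the authors' code; downloading pages over HTTP is omitted.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a class="headline" href="..."> links.

    The "headline" class is a hypothetical markup convention; a real news
    platform needs selectors matched to its own HTML.
    """

    def __init__(self) -> None:
        super().__init__()
        self.articles: list[tuple[str, str]] = []
        self._href: str | None = None  # set while inside a headline link
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "headline" in classes:
            self._href = attr.get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a headline link (including nested tags such as <b>).
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # normalize whitespace
            self.articles.append((title, self._href))
            self._href = None


def scrape_headlines(html: str) -> list[tuple[str, str]]:
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.articles


# Made-up Indonesian sample page; only the two "headline" links should be collected.
SAMPLE_PAGE = """
<html><body>
  <a class="headline" href="/berita/1">Harga beras naik</a>
  <a class="nav" href="/tentang">Tentang kami</a>
  <a class="headline big" href="/berita/2">Timnas menang <b>2-0</b></a>
</body></html>
"""

print(scrape_headlines(SAMPLE_PAGE))
```

On a real platform the selector logic would have to match that site's markup, and the collected titles (or full article texts fetched from the URLs) would then be passed to the classifier.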

Testing shows that the news can be retrieved and classified. Testing with the pre-trained model indobenchmark/indobert-base-p1 gives very good results, reaching an accuracy of 87.548%.
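The reported figure is an accuracy: the share of test articles whose predicted category equals the true category. A typical BERT sequence classifier outputs one score (logit) per category, and the predicted category is the one with the highest score. The sketch below illustrates both steps with made-up scores and hypothetical category names; it does not load indobenchmark/indobert-base-p1 or reproduce the paper's evaluation, and `predict`, `accuracy`, and `LABELS` are illustrative names.

```python
# Hypothetical category labels (politics, sports, economy); the paper's actual category set is not given here.
LABELS = ["politik", "olahraga", "ekonomi"]


def predict(logits: list[float]) -> str:
    """Return the label whose logit is largest (argmax over classifier scores)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of articles whose predicted category matches the gold category."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy classifier outputs for four articles (made-up numbers, not the paper's data).
logits = [[2.1, 0.3, -1.0], [0.2, 1.7, 0.1], [0.5, 0.4, 0.9], [1.2, 1.5, 0.0]]
gold = ["politik", "olahraga", "ekonomi", "politik"]

preds = [predict(row) for row in logits]
print(preds)  # the fourth article is misclassified as "olahraga"
print(f"accuracy = {accuracy(preds, gold):.3%}")  # 3 of 4 correct -> 75.000%
```

The same correct/total ratio, computed over the paper's test set, is what the 87.548% figure expresses; the size of that test set is not stated in this abstract.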


Keywords


news portal; web scraping; text classification; Bidirectional Encoder Representations from Transformers
