Automatic Extractive Summarization of Indonesian-Language News Using the BERT Method
(1, 2, 3) Informatics Engineering Study Program, Universitas Kristen Petra, Surabaya
(*) Corresponding Author
Abstract
In the modern era, information has become an essential part of everyday life, and reading remains one of the main ways to obtain it. With the ever-growing amount of information available on the internet, however, it is difficult to keep up with new developments. Online news is one of the largest sources of information on the internet, with an enormous number of articles on a wide range of topics, and reading them in full can take considerable time. Summaries of online news are therefore needed to reduce reading time while still conveying the relevant information. In this research, a news summary is produced by selecting the important sentences from the news text.
The method used in this research is Bidirectional Encoder Representations from Transformers (BERT) with an additional transformer encoder layer on top.
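The page does not spell the architecture out further; the description matches a BERTSUM-style design, in which an extra transformer encoder layer sits over the per-sentence [CLS] vectors produced by BERT and scores each sentence for inclusion in the summary. A minimal PyTorch sketch under that assumption (the class name, dimensions, and random input are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Sentence scorer stacked on top of BERT (BERTSUM-style sketch).

    Takes one BERT [CLS] vector per sentence, passes the sequence through
    an extra transformer encoder layer so sentences can attend to each
    other, then emits a keep/drop probability for every sentence.
    """
    def __init__(self, hidden=768, nhead=8):
        super().__init__()
        self.inter_sentence = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=nhead, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, sent_vecs):            # (batch, n_sents, hidden)
        h = self.inter_sentence(sent_vecs)   # contextualised sentence vectors
        return torch.sigmoid(self.classifier(h)).squeeze(-1)  # (batch, n_sents)

# Dummy stand-in for the [CLS] vectors of a 10-sentence article.
scores = ExtractiveHead()(torch.randn(1, 10, 768))
```

In the full pipeline the random tensor would be replaced by the [CLS] embeddings from the pre-trained indolem/indobert-base-uncased encoder, and the highest-scoring sentences would form the extractive summary.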
Based on the test results, the pre-trained indolem/indobert-base-uncased model achieves the best F1-scores: 57.17 (ROUGE-1), 51.27 (ROUGE-2), and 55.20 (ROUGE-L) against abstractive reference summaries, and 84.46 (ROUGE-1), 83.21 (ROUGE-2), and 83.40 (ROUGE-L) against extractive references.
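ROUGE-N F1, the metric behind these scores, measures the clipped n-gram overlap between a candidate summary and a reference. It can be sketched in a few lines of pure Python; the function name and the Indonesian example sentences are illustrative, not drawn from the paper's evaluation code:

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two pre-tokenised summaries (lists of words)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())      # clipped overlap count
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

cand = "pemerintah umumkan kebijakan baru".split()
ref = "pemerintah resmi umumkan kebijakan baru".split()
r1 = rouge_n_f1(cand, ref, n=1)  # unigram overlap: P = 1.0, R = 0.8
r2 = rouge_n_f1(cand, ref, n=2)  # bigram overlap: P = 2/3, R = 0.5
```

ROUGE-L, also reported above, instead scores the longest common subsequence of the two summaries rather than fixed-length n-grams.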