Automatic Extractive Summarization of Indonesian-Language News Using the BERT Method
(1, 2, 3) Informatics Engineering Study Program, Universitas Kristen Petra, Surabaya
(*) Corresponding Author
Abstract
In the modern era, information has become an essential part of everyday life, and reading remains one of the main ways to obtain it. With the ever-growing amount of information available on the internet, however, it is difficult to keep up with new developments. Online news is one of the largest sources of information on the internet, with an enormous number of articles on a wide range of topics, and reading them in full can take considerable time. Summaries of online news are therefore needed to reduce reading time while still conveying the relevant information. In this research, a news summary is produced by selecting the important sentences from the news text.
The method used in this research is Bidirectional Encoder Representations from Transformers (BERT) with an additional transformer encoder layer on top.
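The page does not spell the architecture out further; the description matches a BERTSUM-style design, in which an extra transformer encoder layer sits over the per-sentence [CLS] vectors produced by BERT and scores each sentence for inclusion in the summary. A minimal PyTorch sketch under that assumption (the class name, dimensions, and random input are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Sentence scorer stacked on top of BERT (BERTSUM-style sketch).

    Takes one BERT [CLS] vector per sentence, passes the sequence through
    an extra transformer encoder layer so sentences can attend to each
    other, then emits a keep/drop probability for every sentence.
    """
    def __init__(self, hidden=768, nhead=8):
        super().__init__()
        self.inter_sentence = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=nhead, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, sent_vecs):            # (batch, n_sents, hidden)
        h = self.inter_sentence(sent_vecs)   # contextualised sentence vectors
        return torch.sigmoid(self.classifier(h)).squeeze(-1)  # (batch, n_sents)

# Dummy stand-in for the [CLS] vectors of a 10-sentence article.
scores = ExtractiveHead()(torch.randn(1, 10, 768))
```

In the full pipeline the random tensor would be replaced by the [CLS] embeddings from the pre-trained indolem/indobert-base-uncased encoder, and the highest-scoring sentences would form the extractive summary.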
Based on the test results, the pre-trained indolem/indobert-base-uncased model achieves the best F1-scores: 57.17 (ROUGE-1), 51.27 (ROUGE-2), and 55.20 (ROUGE-L) against abstractive reference summaries, and 84.46 (ROUGE-1), 83.21 (ROUGE-2), and 83.40 (ROUGE-L) against extractive references.
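ROUGE-N F1, the metric behind these scores, measures the clipped n-gram overlap between a candidate summary and a reference. It can be sketched in a few lines of pure Python; the function name and the Indonesian example sentences are illustrative, not drawn from the paper's evaluation code:

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two pre-tokenised summaries (lists of words)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())      # clipped overlap count
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

cand = "pemerintah umumkan kebijakan baru".split()
ref = "pemerintah resmi umumkan kebijakan baru".split()
r1 = rouge_n_f1(cand, ref, n=1)  # unigram overlap: P = 1.0, R = 0.8
r2 = rouge_n_f1(cand, ref, n=2)  # bigram overlap: P = 2/3, R = 0.5
```

ROUGE-L, also reported above, instead scores the longest common subsequence of the two summaries rather than fixed-length n-grams.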