Automatic Extractive Summarization of Indonesian-Language News Using the BERT Method
Keywords:
Design, Promotional Media, Fashion
Abstract
In the modern era, information has become an important part of everyday life, and reading is one of the main ways to obtain it. As the amount of information on the internet grows, it becomes difficult for readers to keep up with new developments. Online news is one such source: it is published in very large volumes and covers a wide range of topics, and reading every article in full takes a long time. Summarizing online news is therefore necessary to reduce reading time while still conveying the relevant information. In this research, news summaries are produced extractively, by selecting the important sentences from the news text.
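To make the idea of selecting important sentences concrete, the following is a minimal sketch, assuming that some model has already assigned an importance score to each sentence. The function name `select_top_k`, the cut-off `k`, the example sentences, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_top_k(sentences: Sequence[str], scores: Sequence[float], k: int = 3) -> List[str]:
    """Return the k highest-scoring sentences, restored to document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    # Rank sentence indices by score, highest first; ties keep the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Keep the top k indices, then sort them so the summary follows the original order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Hypothetical news text, already split into sentences.
news = [
    "The central bank kept its benchmark rate unchanged on Thursday.",
    "Reporters gathered outside the building early in the morning.",
    "The governor said inflation remains within the target range.",
    "Analysts expect the next policy review in three months.",
]
# Hypothetical importance scores; in practice a trained model would supply them.
scores = [0.91, 0.12, 0.78, 0.40]

summary = select_top_k(news, scores, k=2)
print(summary)
```

Restoring document order after ranking is a design choice that keeps the extracted summary readable; the paper may use a different selection rule.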
The method used in this research is Bidirectional Encoder Representations from Transformers (BERT), extended with an additional transformer encoder layer.
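The abstract does not state how the news sentences are presented to BERT. One common arrangement for BERT-based extractive summarization, described by Liu and Lapata (2019), wraps every sentence in its own [CLS] and [SEP] tokens and alternates the segment id between sentences, so that each [CLS] position can later serve as a sentence representation for the added encoder layer. The sketch below illustrates only that input layout, under the assumption that this paper follows a similar scheme. The function `build_inputs` is hypothetical, and a whitespace split stands in for the model's real subword tokenizer.

```python
from typing import List, Tuple


def build_inputs(sentences: List[str]) -> Tuple[List[str], List[int], List[int]]:
    """Lay out sentences as [CLS] s1 [SEP] [CLS] s2 [SEP] ... (assumed scheme)."""
    tokens: List[str] = []
    segment_ids: List[int] = []
    cls_positions: List[int] = []
    for idx, sentence in enumerate(sentences):
        # Remember where this sentence's [CLS] token will sit in the sequence.
        cls_positions.append(len(tokens))
        # Whitespace split is a stand-in for a real subword tokenizer.
        piece = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
        tokens.extend(piece)
        # Alternate segment ids 0, 1, 0, 1, ... to mark sentence boundaries.
        segment_ids.extend([idx % 2] * len(piece))
    return tokens, segment_ids, cls_positions


# Hypothetical two-sentence Indonesian example.
tokens, segment_ids, cls_positions = build_inputs(
    ["Harga beras naik.", "Pemerintah menambah stok."]
)
print(tokens)
print(segment_ids)
print(cls_positions)
```

In a full model, the hidden states at `cls_positions` would be passed through the additional transformer encoder layer and a classifier that scores each sentence; that part is omitted here because the abstract gives no details about it.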
In the tests carried out, the pre-trained indolem/indobert-base-uncased model achieved the best F1 scores: 57.17 for ROUGE-1, 51.27 for ROUGE-2, and 55.20 for ROUGE-L when evaluated against abstractive reference summaries, and 84.46 for ROUGE-1, 83.21 for ROUGE-2, and 83.40 for ROUGE-L when evaluated against extractive reference summaries.
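For readers unfamiliar with the metrics, the sketch below shows how ROUGE-N and ROUGE-L F1 scores can be computed for a single candidate and reference summary. It is a simplified illustration, not the evaluation script used in the paper: it assumes whitespace tokenization, no stemming, and a sentence-level longest common subsequence for ROUGE-L, and it returns values between 0 and 1, whereas the scores above are reported on a 0 to 100 scale. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1_score(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    """ROUGE-N F1 based on clipped n-gram overlap."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1_score(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c, r = candidate.split(), reference.split()
    return f1_score(lcs_length(c, r), len(c), len(r))


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

Scoring against extractive references compares selected sentences with other selected sentences from the same article, so large n-gram overlaps are expected there, which is consistent with the higher scores reported for that setting.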