Aspect-Controlled Summarization in Indonesian Language Scientific Papers Using Longformer Encoder-Decoder

Authors

Abstract

The number of scientific papers published in Indonesian continues to grow, creating a need for automatic summarization systems that give readers fast access to a paper's essential information. This study applies aspect-controlled summarization to Indonesian scientific articles using the Longformer Encoder-Decoder (LED) model. The proposed system generates summaries conditioned on four predefined aspects: Purpose, Method, Findings, and Value.

The study uses an Indonesian translation of the FacetSum dataset, preprocessed and restructured to support aspect-based summarization. The LED model is fine-tuned on this dataset and evaluated at both the whole-document level and the facet level. Model performance is assessed primarily with ROUGE metrics, complemented by qualitative analysis.
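For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level ROUGE-L, which scores a generated summary by its longest common subsequence (LCS) with the reference. It is an illustration only: the abstract does not state which ROUGE implementation, tokenization, or stemming was used, so the whitespace tokenizer, the lowercasing, the F1 variant, and the function names `lcs_length` and `rouge_l_f1` are all assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 in [0, 1]; assumes lowercased whitespace tokens (hypothetical choice)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens that lie on the LCS
    recall = lcs / len(ref)      # share of reference tokens that lie on the LCS
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences, not taken from the dataset.
reference = "penelitian ini bertujuan meringkas artikel ilmiah"
candidate = "penelitian ini meringkas artikel"
score = rouge_l_f1(reference, candidate)
```

Scores such as those reported in this abstract are conventionally given on a 0 to 100 scale, which corresponds to multiplying this value by 100 and averaging over the evaluation set.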

Experimental results indicate that, although the LED model can generate aspect-based summaries, its performance remains below that of baseline models trained on English datasets. The model obtains facet-level ROUGE-L scores of 34.28 for Purpose, 19.45 for Method, 19.37 for Findings, and 21.37 for Value, whereas the BART-Facet model on the original English FacetSum dataset achieves 42.55, 28.07, 28.98, and 28.70, respectively. Further analysis suggests that this gap is primarily attributable to architectural limitations of the LED model for aspect-based summarization: the ability to process long input sequences does not by itself translate into better facet-level summary quality.
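The size of the gap can be read directly from the reported scores. The short sketch below only subtracts the numbers given above; the variable names are illustrative, and the comparison is across different languages and datasets, so the differences are indicative rather than a controlled measurement.

```python
# Facet-level ROUGE-L scores as reported in the abstract.
led_indonesian = {"Purpose": 34.28, "Method": 19.45, "Findings": 19.37, "Value": 21.37}
bart_facet_english = {"Purpose": 42.55, "Method": 28.07, "Findings": 28.98, "Value": 28.70}

# Absolute gap per facet, in ROUGE-L points (BART-Facet minus LED).
gaps = {
    facet: round(bart_facet_english[facet] - led_indonesian[facet], 2)
    for facet in led_indonesian
}

largest = max(gaps, key=gaps.get)   # facet with the widest gap
smallest = min(gaps, key=gaps.get)  # facet with the narrowest gap

for facet, gap in gaps.items():
    print(f"{facet}: {gap:.2f}")
```

Under this reading, the gap ranges from 7.33 points (Value) to 9.61 points (Findings), with Purpose at 8.27 and Method at 8.62.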

Published

2026-06-15