Weekly Song Ranking Prediction on Spotify United States Using a Multiple Charts Dataset with Various Methods

Christianto Imanuel Aryanto, Henry Novianus Palit, Andre Gunawan


In 2020, streaming accounted for the majority of the music industry's revenue, 62.1%. As a result, many parties in the music business strive to produce a hit song, particularly on the Spotify US chart. This is difficult to achieve, however, because a song's performance is now determined by how it performs on various music charts rather than by its quality. This study in the field of hit song science therefore forecasts weekly song rankings on Spotify US using data from the Spotify, Shazam, Airplay, and TikTok charts. Multiple linear regression, polynomial regression, gradient boosting tree, and random forest are used to build the models, which are compared using adjusted R-squared and mean absolute error (MAE) as evaluation metrics. Random forest produced the best model, with adjusted R-squared and MAE values of 93.133% and 11.687, respectively. Using music attributes had a negative impact on model performance. The Shazam chart, on the other hand, had a positive impact on model performance. Neither the Airplay nor the TikTok chart had a definite positive or negative impact; both showed only a very weak relation with model performance. Overall, the dataset combining the Spotify, Shazam, Airplay, and TikTok charts produced the best model in this study.
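The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MAE and adjusted R-squared; the function names and the sample rankings are hypothetical and are not taken from the study's code or data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted ranks."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r_squared(y_true, y_pred, n_features):
    """R-squared penalized for the number of predictors (n_features)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than predictors plus one")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Hypothetical weekly ranks, for illustration only (not the study's data).
actual_ranks = [1, 2, 3, 4, 5]
predicted_ranks = [1.1, 1.9, 3.2, 3.8, 5.0]

mae = mean_absolute_error(actual_ranks, predicted_ranks)
adj_r2 = adjusted_r_squared(actual_ranks, predicted_ranks, n_features=1)
print(f"MAE = {mae:.3f}, adjusted R-squared = {adj_r2:.4f}")
```

A lower MAE means the predicted ranks are closer to the actual ranks on average. Adjusted R-squared rises only when an added predictor improves the fit by more than chance would, which makes it suitable for comparing models built from different chart combinations.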


Spotify; hit song science; ranking prediction; multiple linear regression; polynomial regression; gradient boosting tree; random forest





