A Comparison of Analyses of the Determinants of PT. X's Sales Using LASSO Regression and Gradient Boosted Regression Tree

Jessica Athalia(1*), Henry Novianus Palit(2), Silvia Rostianingsih(3)


(1) Informatics Study Program
(2) Informatics Study Program
(3) Informatics Study Program
(*) Corresponding Author

Abstract


Information is a crucial asset for an organization. However, employees of PT. X have difficulty analyzing data because the data must be processed one by one. Moreover, analyzing data directly in an operational database is not recommended because it can degrade that database's performance. As a result, when the Board of Directors wants to know the reasons behind the company's sales performance, its conclusions rest on assumption alone. This research implemented a data warehouse with the help of ETL tools. The sales transactions of PT. X were then analyzed to identify the factors that affect the company's revenue. Factor models were built for brands whose sales have been unsatisfactory over the past few years. The factors examined are sales price, stock availability, on-time delivery of goods, quantity of returns, month of transaction, and cost price. The analysis was carried out with two methods, LASSO regression and Gradient Boosted Regression Tree. The models were evaluated using Root Mean Squared Error, R-squared, and Variance Inflation Factor to determine which performs better. The results show that both LASSO regression and Gradient Boosted Regression Tree succeed in performing feature selection on the sales transactions of PT. X. However, the factor model from Gradient Boosted Regression Tree gives better results than the one from LASSO regression. Finally, a program using Gradient Boosted Regression Tree was built for the company's future analysis needs.
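The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The function names and the toy numbers in the usage note are hypothetical and are not taken from the paper or from PT. X's data. The Variance Inflation Factor is shown only for the two-predictor case, where it reduces to 1 / (1 - r^2) with r the Pearson correlation between the predictors.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors (simplifying assumption).

    With two predictors, the R-squared from regressing one on the other
    equals their squared Pearson correlation, so VIF = 1 / (1 - r^2).
    """
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    var1 = sum((a - mean1) ** 2 for a in x1)
    var2 = sum((b - mean2) ** 2 for b in x2)
    r2 = cov * cov / (var1 * var2)
    return 1.0 / (1.0 - r2)
```

For instance, calling `rmse([3, 5, 7, 9], [2, 5, 8, 9])` and `r_squared([3, 5, 7, 9], [2, 5, 8, 9])` on made-up sales figures scores one model's predictions. Repeating the calls for a second model's predictions gives the kind of side-by-side comparison the abstract describes: lower RMSE and higher R-squared indicate the better fit.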

Keywords


Data warehouse; ETL; LASSO regression; Gradient Boosted Regression Tree; sales transactions



