Analysis of Retail Business Sales Prediction Using the Decision Tree and Random Forest Methods
DOI:
https://doi.org/10.61132/saturnus.v4i1.1409
Keywords:
Feature Engineering, High Spender Prediction, Hyperparameter Tuning, Random Forest, Retail Sales Prediction
Abstract
The retail industry generates a large and continuously growing volume of transactional data as digital technology advances, which calls for systematic data analysis to support evidence-based business decision-making. This study analyzes the Retail Sales Dataset obtained from the Kaggle platform, which consists of 100,000 transaction records and is broadly representative of retail transactions. The study focuses on classifying product categories and predicting customer segments, including identifying high-spending customers (high spenders), from demographic attributes such as age and gender together with transaction-related features. The methodology comprises data preprocessing, label encoding, and feature engineering that produces additional variables, including Age_Group, Is_Holiday, and Spender_Group, intended to improve the predictive capability of the models. Three machine learning algorithms, Decision Tree, Random Forest, and XGBoost, were implemented and their performance compared. The experimental results show that multiclass product category classification achieves relatively low accuracy, ranging from 27% to 34%. These findings point to the high complexity of retail data and to the need for further model optimization, class balancing, and feature refinement to improve predictive performance in future studies.
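As an illustration of the feature engineering and label encoding steps named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the field names (`age`, `gender`, `date`, `total_amount`), the sample rows, the age cut points, the holiday calendar, and the median-based rule for `Spender_Group` are all assumptions made for the example. Only the derived variable names Age_Group, Is_Holiday, and Spender_Group come from the abstract.

```python
from datetime import date
from statistics import median

# Hypothetical sample rows; field names and values are illustrative only.
transactions = [
    {"age": 23, "gender": "Female", "date": date(2023, 1, 1), "total_amount": 150.0},
    {"age": 41, "gender": "Male", "date": date(2023, 3, 14), "total_amount": 900.0},
    {"age": 67, "gender": "Female", "date": date(2023, 12, 25), "total_amount": 40.0},
    {"age": 35, "gender": "Male", "date": date(2023, 7, 9), "total_amount": 1200.0},
]

# Assumed holiday calendar, stored as (month, day) pairs.
HOLIDAYS = {(1, 1), (12, 25)}


def age_group(age):
    """Bin an age into a coarse group (the cut points are assumptions)."""
    if age < 30:
        return "Young"
    if age < 50:
        return "Adult"
    return "Senior"


def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def engineer_features(rows):
    """Return copies of the rows with Age_Group, Is_Holiday, Spender_Group added."""
    # Assumed rule: a high spender is anyone above the median transaction amount.
    threshold = median(r["total_amount"] for r in rows)
    enriched = []
    for r in rows:
        new = dict(r)
        new["Age_Group"] = age_group(r["age"])
        new["Is_Holiday"] = int((r["date"].month, r["date"].day) in HOLIDAYS)
        new["Spender_Group"] = "High" if r["total_amount"] > threshold else "Low"
        enriched.append(new)
    return enriched


features = engineer_features(transactions)
gender_codes, gender_mapping = label_encode([r["gender"] for r in features])
```

The enriched rows and integer codes could then be passed to a tree-based classifier. In practice the thresholds and the holiday list would need to be derived from the actual dataset rather than fixed by hand as they are here.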
License
Copyright (c) 2026 Saturnus: Jurnal Teknologi dan Sistem Informasi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



