Analisis Prediksi Penjualan Bisnis Retail Menggunakan Metode Decision Tree dan Random Forest (Retail Business Sales Prediction Analysis Using the Decision Tree and Random Forest Methods)

Authors

  • Agung Narayana Adhi Putra, Institut Bisnis dan Teknologi Indonesia
  • I Wayan Sudiarsa, Institut Bisnis dan Teknologi Indonesia
  • I Kadek Adi Gunawan, Institut Bisnis dan Teknologi Indonesia
  • Kadek Bagus Karunia Dwi Dharmayasa, Institut Bisnis dan Teknologi Indonesia
  • I Wayan Eka Saputra, Institut Bisnis dan Teknologi Indonesia

DOI:

https://doi.org/10.61132/saturnus.v4i1.1409

Keywords:

Feature Engineering, High Spender Prediction, Hyperparameter Tuning, Random Forest, Retail Sales Prediction

Abstract

Retail businesses generate large and continuously growing volumes of transactional data as digital technology advances, so systematic data analysis is needed to support evidence-based business decisions. This study analyzes the Retail Sales Dataset from the Kaggle platform, which contains 100,000 transaction records and is broadly representative of retail transactions. It addresses two tasks: classifying product categories and predicting customer segments, including identifying high-spending customers (high spenders), from demographic attributes such as age and gender together with transaction-related features. The methodology comprises data preprocessing, label encoding, and feature engineering, which adds derived variables including Age_Group, Is_Holiday, and Spender_Group that are intended to improve the models' predictive capability. Three machine learning algorithms, Decision Tree, Random Forest, and XGBoost, were implemented and their performance compared. Multiclass product category classification achieved relatively low accuracy, ranging from 27% to 34%. These results point to the high complexity of retail data and to the need for further model optimization, class balancing, and feature refinement in future studies.
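
The feature engineering and label encoding steps named in the abstract can be illustrated with a minimal sketch. The abstract gives only the variable names Age_Group, Is_Holiday, and Spender_Group, so everything else here is an assumption made for illustration: the column names (Date, Age, Gender, Total Amount), the age bin boundaries, the holiday calendar, and the rule that a high spender is a transaction above the median amount. It is not the authors' implementation.

```python
from datetime import date
from statistics import median

# Hypothetical holiday calendar; the source does not say how Is_Holiday was defined.
HOLIDAYS = {date(2023, 1, 1), date(2023, 12, 25)}

# Hypothetical age bins; the source names Age_Group but not its boundaries.
AGE_BINS = [(0, 24, "Young"), (25, 44, "Adult"), (45, 64, "Middle"), (65, 200, "Senior")]


def age_group(age):
    """Map an age in years to a coarse group label."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def label_encode(values):
    """Assign each distinct value an integer code, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def engineer(rows):
    """Return copies of rows with Age_Group, Is_Holiday and Spender_Group added."""
    # Assumed rule: a transaction above the median amount counts as "High".
    threshold = median(r["Total Amount"] for r in rows)
    out = []
    for r in rows:
        new = dict(r)
        new["Age_Group"] = age_group(r["Age"])
        new["Is_Holiday"] = int(r["Date"] in HOLIDAYS)
        new["Spender_Group"] = "High" if r["Total Amount"] > threshold else "Regular"
        out.append(new)
    return out


# Made-up example rows, not taken from the dataset.
rows = [
    {"Date": date(2023, 1, 1), "Age": 22, "Gender": "Male", "Total Amount": 150},
    {"Date": date(2023, 3, 14), "Age": 37, "Gender": "Female", "Total Amount": 1000},
    {"Date": date(2023, 12, 25), "Age": 58, "Gender": "Female", "Total Amount": 30},
    {"Date": date(2023, 7, 2), "Age": 70, "Gender": "Male", "Total Amount": 500},
]
features = engineer(rows)
gender_codes, gender_map = label_encode([r["Gender"] for r in features])
```

The resulting rows could then be passed to any of the classifiers the abstract lists. In a sketch like this, the choice of bins and of the high-spender threshold directly shapes the class balance of Spender_Group, which is one of the issues the abstract flags for future work.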

Published

2026-01-28

How to Cite

Agung Narayana Adhi Putra, I Wayan Sudiarsa, I Kadek Adi Gunawan, Kadek Bagus Karunia Dwi Dharmayasa, & I Wayan Eka Saputra. (2026). Analisis Prediksi Penjualan Bisnis Retail Menggunakan Metode Decision Tree dan Random Forest. Saturnus: Jurnal Teknologi Dan Sistem Informasi, 4(1), 94–102. https://doi.org/10.61132/saturnus.v4i1.1409