Comparative Performance Analysis of ConvLSTM and LRCN Models in Human Activity Recognition
DOI:
https://doi.org/10.61132/neptunus.v3i3.991
Keywords:
ConvLSTM, Deep Learning, Human Activity Recognition, LRCN
Abstract
This study compares the performance of two deep learning models, Convolutional Long Short-Term Memory (ConvLSTM) and Long-term Recurrent Convolutional Network (LRCN), on the task of recognizing human activity in videos. Human activity recognition is an important field in computer vision with many applications, such as security monitoring, human-computer interaction, and social media video analysis. ConvLSTM combines convolution operations with the long-term memory of an LSTM, so it can capture spatial and temporal information simultaneously, which suits video sequences that have both spatial and temporal dimensions. LRCN, in contrast, uses a Convolutional Neural Network (CNN) to extract spatial features and a Recurrent Neural Network (RNN), specifically an LSTM, to model the temporal sequence and learn movement patterns in videos. The study used the UCF50 dataset, which contains 50 activity classes, but restricted the experiment to five of them. The data were split 80% for training and 20% for testing, and the models were trained for up to 50 epochs with early stopping to prevent overfitting. Both models achieved high training performance: ConvLSTM reached a training accuracy of about 98% and a validation accuracy of 90%, while LRCN reached a training accuracy of 99.5% and a validation accuracy of 88%. Although ConvLSTM demonstrated good stability on the validation data, further testing on TikTok videos as real-world data showed that LRCN recognized activities with higher confidence, with most predictions achieving confidence scores above 80%. These results indicate that ConvLSTM generalizes slightly better to the held-out validation data, whereas LRCN is more robust to real-world data variations.
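The evaluation protocol described in the abstract (an 80/20 split, at most 50 epochs with early stopping, and a confidence score per prediction) can be illustrated with a minimal, framework-free sketch. It is not the authors' code: the function names, the patience value, the example logits, and the placeholder class names are all assumptions made for illustration, since the abstract does not state them.

```python
import math
import random


def split_80_20(items, train_fraction=0.8, seed=0):
    """Shuffle a collection and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def early_stopping_epoch(val_losses, patience=5, max_epochs=50):
    """Return the 1-based epoch at which training would stop.

    Training halts once the validation loss has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    The patience value is an assumption; the abstract does not give one.
    """
    best = math.inf
    stale = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, class_names):
    """Return the most probable class and its probability (confidence)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical placeholders: the abstract does not name the five classes.
CLASS_NAMES = ["class_a", "class_b", "class_c", "class_d", "class_e"]

if __name__ == "__main__":
    train, test = split_80_20(range(100))
    print(len(train), len(test))
    print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71], patience=3))
    label, confidence = predict_with_confidence(
        [0.1, 3.0, 0.2, 0.0, -1.0], CLASS_NAMES
    )
    print(label, confidence > 0.8)
```

In this sketch the confidence score is simply the largest softmax probability, so a prediction counts as "above 80%" when that probability exceeds 0.8. A high confidence score does not by itself imply a correct prediction.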
License
Copyright (c) 2025 Neptunus: Jurnal Ilmu Komputer Dan Teknologi Informasi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



