Comparative Performance Analysis of ConvLSTM and LRCN Models for Human Motion Activity Recognition

Amir Hamzah; Jamilatul Badriyah

doi:10.61132/neptunus.v3i3.991

Authors

Amir Hamzah, Universitas Madura
Jamilatul Badriyah, Universitas Madura


Keywords:

ConvLSTM, Deep Learning, Human Activity Recognition, LRCN

Abstract

This study compares the performance of two deep learning models, Convolutional Long Short-Term Memory (ConvLSTM) and the Long-term Recurrent Convolutional Network (LRCN), on the task of recognizing human activity in videos. Human activity recognition is an important area of computer vision with applications such as security monitoring, human-computer interaction, and the analysis of social media video. ConvLSTM replaces the matrix multiplications inside an LSTM cell with convolution operations, so it captures spatial and temporal information simultaneously. This makes it well suited to video sequences, which have both spatial and temporal dimensions. LRCN, by contrast, works in two stages: a Convolutional Neural Network (CNN) extracts spatial features from each frame, and a Recurrent Neural Network (RNN), specifically an LSTM, models the resulting temporal sequence to capture movement patterns. The study used the UCF50 dataset, which contains 50 activity classes, but restricted the experiment to five of them. The data were split 80% for training and 20% for testing, and each model was trained for up to 50 epochs with early stopping to prevent overfitting. Both models reached high training accuracy. ConvLSTM achieved a training accuracy of around 98% and a validation accuracy of 90%, while LRCN achieved a training accuracy of 99.5% and a validation accuracy of 88%. Although ConvLSTM was more stable on the validation data, further testing on TikTok videos as real-world data showed that LRCN recognized activities with higher confidence, with most predictions scoring above 80%. This difference suggests that while ConvLSTM generalizes slightly better to the held-out validation split, LRCN is more robust to variations in real-world data.
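The evaluation protocol described in the abstract (an 80/20 split, training capped at 50 epochs with early stopping, and counting predictions whose confidence exceeds 80%) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the `patience` value, the random seed, and the interpretation of "confidence" as the top softmax probability are all assumptions, since the abstract does not specify them.

```python
import math
import random


def split_80_20(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists.

    The seed is an assumption made only for reproducibility.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def early_stopping_epoch(val_losses, patience=3, max_epochs=50):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs, or when max_epochs is reached.
    The patience value is hypothetical; the abstract does not give one.
    """
    best_loss = math.inf
    best_epoch = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return epoch, best_epoch


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def share_of_confident_predictions(logit_rows, threshold=0.8):
    """Fraction of clips whose top softmax probability exceeds threshold."""
    confident = sum(1 for row in logit_rows if max(softmax(row)) > threshold)
    return confident / len(logit_rows)
```

In this sketch, a model's behavior on real-world clips would be summarized by calling `share_of_confident_predictions` on its per-clip class scores. Note that a high confidence score does not by itself mean a prediction is correct, so such a figure complements accuracy rather than replacing it.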
