Abstract:
Action recognition, a subtask of video classification, is computationally demanding, and its
models are mostly trained on devices with multiple GPUs. Recent work on neural networks has
introduced the Binarized Neural Network (BNN), which offers a way to reduce this cost. BNNs are
trained with binary activations and weights, which reduces the precision of each value from
32 bits to 1 bit. In theory, this allows a BNN to use up to 32x less memory and fewer hardware
resources than a conventional, full-precision neural network.
The conversion from a full-precision CNN to a BNN should therefore result in a smaller model
size and faster inference. However, BNN models have been shown to take longer to train than
their full-precision counterparts. Distributed computing platforms such as Apache Spark have
been shown to shorten training time, which in theory could speed up the training of BNN
models. In this research, a novel binarized 3D CNN model is built using the principles of BNNs
and tested against its full-precision CNN counterpart to determine whether BNNs are suitable for
performing action recognition on lower-powered devices. This research is among the first to
apply a binarized 3D CNN to video classification, and it achieves a smaller accuracy gap
relative to the full-precision model than previous research. The distributed training used in
this research also shortens the training time of the model.
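
The 32x memory figure quoted above can be illustrated with a minimal sketch. It is not the model built in this research: the kernel shape, the binarize helper, and the choice of mapping zero to +1 are all assumptions made for illustration. The sketch binarizes a float32 3D convolution kernel with the sign function and packs the result at one bit per weight.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} with the sign function (assumption: zero maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

# Hypothetical 3D convolution kernel: (out_channels, in_channels, depth, height, width).
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((64, 32, 3, 3, 3)).astype(np.float32)

# Binary weights as used in a BNN forward pass.
weights_bin = binarize(weights_fp32)

# Store each binary weight as a single bit (+1 -> 1, -1 -> 0), eight weights per byte.
packed = np.packbits(weights_bin.ravel() > 0)

ratio = weights_fp32.nbytes / packed.nbytes
print(f"float32: {weights_fp32.nbytes} bytes, packed: {packed.nbytes} bytes, ratio: {ratio:.0f}x")
```

The ratio is exactly 32 here only because the number of weights is a multiple of 8; in practice, layers that are kept in full precision and any per-layer scaling factors would make the overall saving smaller.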