Accelerated Low Power AI System for Indian Sign Language Recognition
Abstract
Deep Convolutional Neural Network (CNN) based methods have become increasingly powerful across a wide variety of applications, particularly in natural language processing and computer vision. Nevertheless, CNN-based methods are computationally expensive and resource-hungry, and are therefore difficult to deploy on battery-operated devices such as smartphones, AR/VR glasses, and autonomous robots. Moreover, with the increasing complexity of deep learning models such as ResNet-50, there is a growing demand for efficient hardware accelerators to handle the computational workload. In this paper, we present the design and implementation of a neural network accelerator tailored for ResNet-50 on the Xilinx ZCU102 platform using Field-Programmable Gate Arrays (FPGAs), which offer a customizable solution to this challenge. We systematically investigate design choices and optimization strategies for deploying, on FPGA-based accelerators, a custom-built ResNet-50 network trained for Indian Sign Language translation of 76 gestures enacted and built in our labs for a doctor-patient interface. To enhance operational speed, we employ several techniques, including parallelism and pipelining, and leverage depthwise separable convolution. Furthermore, we implement hierarchical memory allocation for different offsets using threads. Additionally, we apply weight and data quantization to improve operational speed while minimizing resource consumption, thereby achieving low power consumption with acceptable inference accuracy. We evaluated our accelerated FPGA model against a CPU in terms of several performance metrics: frames per second (fps), memory allocation, and LUT, DSP, and Block RAM usage. Our findings underscore the advantage of FPGA-based accelerators: we achieve a frame rate of 2.7 fps on the Xilinx UltraScale+ ZCU102 platform with int8 quantization and 0.8 fps with single precision, whereas the CPU achieves 0.6 fps. Notably, we observed an accuracy drop of only 1.37% with int8 quantization, and no accuracy change with single precision. Our implementation uses 16 convolution threads and 4 fully connected (FC) threads operating at 200 MHz for single precision, and 25 convolution threads and 16 FC threads operating at 250 MHz for int8.
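As a rough illustration of two of the optimizations named above, the following NumPy sketch (an illustrative example written for this discussion, not the authors' FPGA implementation; all function names, weight shapes, and feature-map sizes are assumptions) shows symmetric per-tensor int8 weight quantization and the multiply-accumulate (MAC) reduction obtained when a standard convolution layer is replaced by a depthwise separable one.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def macs_standard_conv(h, w, c_in, c_out, k):
    """MAC count for a standard k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def macs_depthwise_separable(h, w, c_in, c_out, k):
    """MAC count for a depthwise (k x k per channel) plus pointwise (1 x 1) convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

if __name__ == "__main__":
    # Hypothetical conv weights: 64 output channels, 3 input channels, 3x3 kernel.
    w = np.random.randn(64, 3, 3, 3).astype(np.float32)
    q, scale = quantize_int8(w)
    err = np.abs(w - q.astype(np.float32) * scale).max()
    print(f"int8 scale = {scale:.5f}, max dequantization error = {err:.5f}")

    # Hypothetical 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
    std = macs_standard_conv(56, 56, 64, 128, 3)
    dws = macs_depthwise_separable(56, 56, 64, 128, 3)
    print(f"standard conv MACs: {std:,}; depthwise separable MACs: {dws:,} "
          f"(~{std / dws:.1f}x fewer)")
```

For this assumed layer configuration the depthwise separable form needs roughly 8x fewer MACs, which is the kind of reduction that makes the difference between fitting and not fitting within the DSP and BRAM budget of an FPGA accelerator.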
