Tuesday, January 4, 2022

tinyML Talk: Demoing the world’s fastest inference engine for Arm Cortex-M

We recently announced Plumerai’s inference engine for 8-bit deep learning models on Arm Cortex-M microcontrollers. We showed that it is the world’s most efficient on MobileNetV2, beating TensorFlow Lite for Microcontrollers with CMSIS-NN kernels by 40% in latency and 49% in RAM usage, with no loss in accuracy. However, that result was for a single network, which might have been cherry-picked. This presentation gives a live demonstration of our new service, which you can use to test your own 8-bit deep learning models with Plumerai’s inference engine. In this talk, Cedric explains what we did to achieve these speedups and memory savings, and shows benchmarks for the most important publicly available neural network models.