Wednesday, November 9, 2022

MLPerf Tiny 1.0 confirms: Plumerai’s inference engine is again the world’s fastest

Earlier this year in April we presented our MLPerf Tiny 0.7 benchmark scores, showing that our inference engine runs your AI models faster than any other tool. Today, MLPerf released the Tiny 1.0 scores and Plumerai has done it again: we still have the world’s fastest inference engine for Arm Cortex-M architectures. Faster inferencing means you can run larger and more accurate AI models, go into sleep mode earlier to save power, and run them on smaller and lower cost MCUs. Our inference engine executes the AI model as-is and does no additional quantization, no binarization, no pruning, and no model compression. There is no accuracy loss. It simply runs faster and in a smaller memory footprint.

Here’s the overview of the MLPerf Tiny results for an Arm Cortex-M4 MCU:

Vendor	Visual Wake Words	Image Classification	Keyword Spotting	Anomaly Detection
Plumerai	208.6 ms	173.2 ms	71.7 ms	5.6 ms
STMicroelectronics	230.5 ms	226.9 ms	75.1 ms	7.6 ms
OctoML (microTVM, CMSIS-NN)	301.2 ms	389.5 ms	99.8 ms	8.6 ms
OctoML (microTVM, native codegen)	336.5 ms	389.2 ms	144.0 ms	11.7 ms

Official MLPerf Tiny 1.0 inference results for an Arm Cortex-M4 at 120MHz (STM32L4R5ZIT6U).

Compared to TensorFlow Lite for Microcontrollers with Arm’s CMSIS-NN optimized kernels, we run 1.74x faster:

Speed	Visual Wake Words	Image Classification	Keyword Spotting	Anomaly Detection	Speedup
TFLM latency	335.97 ms	376.08 ms	100.72 ms	8.45 ms
Plumerai latency	194.36 ms	170.42 ms	66.32 ms	5.59 ms
Speedup	1.73x	2.21x	1.52x	1.51x	1.74x

MLPerf Tiny inference latency for the Arm Cortex-M4 at 120MHz inside the STM32L4R9 MCU.

But not only latency is important. Since MCUs often have very limited memory on board it’s important that the inference engine uses minimal memory while executing the neural network. MLPerf Tiny does not report numbers for memory usage, but here are the memory savings we can achieve on the benchmarks compared to TensorFlow Lite for Microcontrollers. On average we use less than half the memory:

Memory	Visual Wake Words	Image Classification	Keyword Spotting	Anomaly Detection	Reduction
TFLM memory (KiB)	98.80	54.10	23.90	2.60
Plumerai memory (KiB)	36.50	37.80	17.00	1.00
Reduction	2.71x	1.43x	1.41x	2.60x	2.04x

MLPerf Tiny memory usage is lower by a factor of 2.04x.

MLPerf doesn’t report code size, so again we compare against TensorFlow Lite for Microcontrollers. The table below shows that we reduce code size on average by a factor of 2.18x. Using our inference engine means you can use MCUs that have significantly smaller flash size.

Code size	Visual Wake Words	Image Classification	Keyword Spotting	Anomaly Detection	Reduction
TFLM code (KiB)	209.60	116.40	126.20	67.20
Plumerai code (KiB)	89.20	48.30	46.10	54.20
Reduction	2.35x	2.41x	2.74x	1.24x	2.18x

MLPerf Tiny code size is lower by a factor of 2.18x.

Want to see how fast your models can run? You can submit them for free on our Plumerai Benchmark service. We email you the results in minutes.

Are you deploying AI on microcontrollers? Let’s talk.