Wednesday, November 9, 2022
MLPerf Tiny 1.0 confirms: Plumerai’s inference engine is again the world’s fastest
Earlier this year, in April, we presented our MLPerf Tiny 0.7 benchmark scores, which showed that our inference engine runs your AI models faster than any other tool. Today MLPerf released the Tiny 1.0 scores, and Plumerai has done it again: we still have the world’s fastest inference engine for Arm Cortex-M architectures. Faster inference means you can run larger and more accurate AI models, go into sleep mode earlier to save power, and run them on smaller, lower-cost MCUs. Our inference engine executes the AI model as-is: no additional quantization, binarization, pruning, or model compression. There is no accuracy loss. It simply runs faster and in a smaller memory footprint.
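The point about sleeping earlier can be made concrete with a simple duty-cycle model. If the MCU wakes up once per period, runs one inference and sleeps for the rest, the energy per period is roughly active power × latency plus sleep power × remaining time. The sketch below is illustrative only: the power draw values and the one-inference-per-second period are hypothetical numbers chosen for the example, not measurements. The latencies are the Cortex-M4 Visual Wake Words results from the MLPerf Tiny 1.0 table that follows.

```python
# Hypothetical duty-cycle model: one inference per period, then sleep.
# All power figures below are illustrative assumptions, not measurements.
ACTIVE_MW = 30.0     # hypothetical MCU power while running inference (mW)
SLEEP_MW = 0.05      # hypothetical MCU power in sleep mode (mW)
PERIOD_MS = 1000.0   # hypothetical: one inference per second


def energy_per_period_uj(latency_ms: float) -> float:
    """Energy (uJ) spent in one period: active during inference, asleep for the rest."""
    sleep_ms = PERIOD_MS - latency_ms
    # mW * ms = uJ
    return ACTIVE_MW * latency_ms + SLEEP_MW * sleep_ms


# Visual Wake Words latencies on Cortex-M4 from the MLPerf Tiny 1.0 results table.
for label, latency in [("Plumerai", 208.6), ("STMicroelectronics", 230.5)]:
    print(f"{label}: {energy_per_period_uj(latency):.0f} uJ per period")
```

The model ignores wake-up and peripheral overheads, so absolute numbers on real hardware will differ. The direction holds whenever active power exceeds sleep power, though: the saving per period is (active power − sleep power) × (latency difference).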
Here’s an overview of the MLPerf Tiny 1.0 results for an Arm Cortex-M4 MCU:
| Vendor | Visual Wake Words | Image Classification | Keyword Spotting | Anomaly Detection |
|---|---|---|---|---|
| Plumerai | 208.6 ms | 173.2 ms | 71.7 ms | 5.6 ms |
| STMicroelectronics | 230.5 ms | 226.9 ms | 75.1 ms | 7.6 ms |
| | 301.2 ms | 389.5 ms | 99.8 ms | 8.6 ms |
| | 336.5 ms | 389.2 ms | 144.0 ms | 11.7 ms |
Compared to TensorFlow Lite for Microcontrollers with Arm’s CMSIS-NN optimized kernels, we run 1.74x faster on average:
| Latency | Visual Wake Words | Image Classification | Keyword Spotting | Anomaly Detection | Average |
|---|---|---|---|---|---|
| TensorFlow Lite for Microcontrollers + CMSIS-NN | 335.97 ms | 376.08 ms | 100.72 ms | 8.45 ms | |
| Plumerai | 194.36 ms | 170.42 ms | 66.32 ms | 5.59 ms | |
| Speedup | 1.73x | 2.21x | 1.52x | 1.51x | 1.74x |
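Each per-benchmark speedup is the TensorFlow Lite for Microcontrollers latency divided by ours. The short Python sketch below recomputes them from the table. It assumes the 1.74x headline is the arithmetic mean of the four ratios, which is consistent with the numbers shown; the variable and function names are ours, for illustration only.

```python
# Latencies in ms, copied from the speed comparison table
# (TensorFlow Lite for Microcontrollers + CMSIS-NN vs. Plumerai).
BENCHMARKS = ["Visual Wake Words", "Image Classification",
              "Keyword Spotting", "Anomaly Detection"]
tflm_ms = [335.97, 376.08, 100.72, 8.45]
plumerai_ms = [194.36, 170.42, 66.32, 5.59]


def speedups(baseline, candidate):
    """Per-benchmark speedup: baseline latency divided by candidate latency."""
    return [b / c for b, c in zip(baseline, candidate)]


per_benchmark = speedups(tflm_ms, plumerai_ms)
# Assumption: the headline figure is the arithmetic mean of the per-benchmark ratios.
mean_speedup = sum(per_benchmark) / len(per_benchmark)

for name, s in zip(BENCHMARKS, per_benchmark):
    print(f"{name}: {s:.2f}x")
print(f"Mean: {mean_speedup:.2f}x")
```

A geometric mean is another common way to summarize ratios. It can never exceed the arithmetic mean, so it would give a slightly lower headline figure for the same data.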
But latency isn’t the only thing that matters. MCUs often have very limited on-board memory, so the inference engine should use as little of it as possible while executing the neural network. MLPerf Tiny does not report memory usage, but here are the memory savings we achieve on the benchmarks compared to TensorFlow Lite for Microcontrollers. On average we use less than half the memory:
| Memory | Visual Wake Words | Image Classification | Keyword Spotting | Anomaly Detection | Average |
|---|---|---|---|---|---|
| TensorFlow Lite for Microcontrollers | 98.80 | 54.10 | 23.90 | 2.60 | |
| Plumerai | 36.50 | 37.80 | 17.00 | 1.00 | |
| Reduction | 2.71x | 1.43x | 1.41x | 2.60x | 2.04x |
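The memory reductions follow from the same arithmetic. This sketch recomputes each ratio and the 2.04x figure, again assuming it is the arithmetic mean of the per-benchmark ratios, and also prints what fraction of the TensorFlow Lite for Microcontrollers memory each benchmark still needs. The table does not state a unit, so the values are kept unitless here.

```python
# Memory figures copied from the memory comparison table (unit not stated there).
tflm_mem = {"Visual Wake Words": 98.80, "Image Classification": 54.10,
            "Keyword Spotting": 23.90, "Anomaly Detection": 2.60}
plumerai_mem = {"Visual Wake Words": 36.50, "Image Classification": 37.80,
                "Keyword Spotting": 17.00, "Anomaly Detection": 1.00}

reductions = {}
for name in tflm_mem:
    # Reduction factor: how many times less memory Plumerai needs.
    reductions[name] = tflm_mem[name] / plumerai_mem[name]
    # Fraction of the TFLM memory that is still needed.
    share = plumerai_mem[name] / tflm_mem[name]
    print(f"{name}: {reductions[name]:.2f}x reduction, {share:.0%} of the TFLM memory")

# Assumption: the headline figure is the arithmetic mean of the per-benchmark ratios.
mean_reduction = sum(reductions.values()) / len(reductions)
print(f"Mean reduction: {mean_reduction:.2f}x")
```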
MLPerf doesn’t report code size either, so again we compare against TensorFlow Lite for Microcontrollers. The table below shows that we reduce code size by a factor of 2.18 on average, which means you can use MCUs with significantly smaller flash.
| Code size | Visual Wake Words | Image Classification | Keyword Spotting | Anomaly Detection | Average |
|---|---|---|---|---|---|
| TensorFlow Lite for Microcontrollers | 209.60 | 116.40 | 126.20 | 67.20 | |
| Plumerai | 89.20 | 48.30 | 46.10 | 54.20 | |
| Reduction | 2.35x | 2.41x | 2.74x | 1.24x | 2.18x |
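When budgeting flash, the absolute savings matter as much as the ratio. This sketch recomputes the per-benchmark code-size reductions and the 2.18x mean (assuming, as above, an arithmetic mean of the ratios) and prints how much code each benchmark saves, in whatever unit the table uses, since the table does not state one.

```python
# Code-size figures copied from the code size comparison table (unit not stated there).
benchmarks = ["Visual Wake Words", "Image Classification",
              "Keyword Spotting", "Anomaly Detection"]
tflm_code = [209.60, 116.40, 126.20, 67.20]
plumerai_code = [89.20, 48.30, 46.10, 54.20]

# Reduction factor per benchmark and absolute amount of code saved.
ratios = [t / p for t, p in zip(tflm_code, plumerai_code)]
saved = [t - p for t, p in zip(tflm_code, plumerai_code)]

for name, r, s in zip(benchmarks, ratios, saved):
    print(f"{name}: {r:.2f}x smaller, {s:.1f} less code")

# Assumption: the headline figure is the arithmetic mean of the per-benchmark ratios.
mean_ratio = sum(ratios) / len(ratios)
print(f"Mean reduction: {mean_ratio:.2f}x")
```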
Want to see how fast your models can run? You can submit them for free on our Plumerai Benchmark service. We email you the results in minutes.
Are you deploying AI on microcontrollers? Let’s talk.