Friday, May 13, 2022

tinyML Summit 2022: ‘Tiny models with big appetites: cultivating the perfect data diet’

Although much research effort goes into developing small model architectures for computer vision, real gains cannot be made without focusing on the data pipeline. We discussed the importance of quality data and some pitfalls of public datasets in an earlier blog post, and we have continued to improve our data tooling since then. We have incorporated several machine learning techniques that let us curate our datasets at deep learning scale, for example by identifying images that, during training, contributed strongly to a specific false prediction.
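To make the idea of tracing a false prediction back to training images a bit more concrete, here is a minimal sketch. It is not our production tooling, and this post does not specify the exact technique; the sketch assumes a simple gradient-similarity score (in the spirit of methods such as TracIn) on a toy logistic regression model rather than a deep network. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_gradients(weights, features, labels):
    """Per-example gradient of the binary cross-entropy loss for a linear model.

    Returns an array of shape (n_examples, n_features).
    """
    probs = sigmoid(features @ weights)
    return (probs - labels)[:, None] * features


def influence_scores(weights, train_x, train_y, test_x, test_y):
    """Dot product between each training gradient and the test gradient.

    Assumption: a single model checkpoint and a unit learning rate.
    A negative score means a gradient step on that training example
    increases the loss on the test example, i.e. it pushes the model
    towards the false prediction.
    """
    train_grads = loss_gradients(weights, train_x, train_y)
    test_grad = loss_gradients(weights, test_x[None, :], np.array([test_y]))[0]
    return train_grads @ test_grad


def most_harmful(scores, k):
    """Indices of the k training examples with the most negative scores."""
    return np.argsort(scores)[:k]
```

Under these assumptions, the training examples returned by `most_harmful` are natural candidates for manual review, for instance to check for label errors. A real pipeline would compute the gradients with a deep learning framework and typically aggregate scores over several checkpoints.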

In this recording of his talk at tinyML Summit 2022, Jelmer demonstrates how our fully in-house data tooling leverages these techniques to curate millions of images, and describes several other essential parts of our model and data pipelines. Together, these allow us to build accurate person detection models that run on the tiniest edge devices.