TinyML for IoT: Running Machine Learning on Microcontrollers Without the Cloud

TinyML is machine learning inference running directly on microcontrollers: no cloud, no local server, no network connection required at run time. A model trained offline, typically in the cloud, is quantised, converted, and flashed onto the device, where it occupies kilobytes of flash memory and runs on milliwatts of power. For IoT devices that need to run for years on a coin cell, it’s often the only viable architecture.

Target Hardware

Arduino Nano 33 BLE Sense

Built around the Nordic nRF52840 (64 MHz Cortex-M4F, 256 KB RAM, 1 MB flash), the Nano 33 BLE Sense packs an accelerometer, gyroscope, microphone, temperature, humidity, pressure, and colour sensors onto a thumb-sized board. It’s the most widely used hardware for TinyML prototyping and education. Inference on a simple keyword spotting model runs in under 5 ms at less than 10 mW average power.

STM32 Series (STM32H7, STM32L4)

STMicroelectronics’ STM32 family spans from ultra-low-power parts (STM32L4, drawing under 100 µA in its low-power modes) to high-performance parts (STM32H7, clocked at up to 480 MHz, with dual-core Cortex-M7/M4 variants). The STM32H7 is popular in industrial TinyML because it can run moderately complex CNNs for vibration signature classification. ST’s X-CUBE-AI tool converts Keras, TensorFlow Lite, and ONNX models directly to optimised C code for STM32 targets.

ESP32-S3

Espressif’s ESP32-S3 adds vector instructions for neural network operations to the base ESP32 architecture. At under £2.50 in volume, it’s the cost leader for TinyML in consumer IoT, smart home devices, and hobbyist projects. The built-in Wi-Fi and Bluetooth make OTA model updates straightforward. The PSRAM option (8 MB external) allows larger models that wouldn’t fit in internal SRAM alone.

Frameworks

TensorFlow Lite Micro (TFLM)

Google’s TensorFlow Lite Micro is the reference framework for microcontroller inference. Models are trained in TensorFlow or Keras, converted to the TFLite FlatBuffer format, then executed on the device by the TFLM interpreter, a C++ library that performs no dynamic memory allocation (tensors live in a fixed, caller-supplied arena) and has no OS dependency. Post-training INT8 quantisation typically reduces model size by about 4× relative to float32 weights, with less than 2% accuracy loss on most classification tasks.

TFLM supports a growing set of op kernels optimised with CMSIS-NN for Arm Cortex-M cores, yielding 5–10× speedups over generic C implementations.
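The 4× figure for INT8 quantisation follows from storing each weight in one byte instead of four. The snippet below is a minimal sketch of affine quantisation in plain Python, not the TFLite converter's actual implementation; the weight values and the helper names `quantize_int8` and `dequantize_int8` are hypothetical and chosen only for illustration.

```python
import struct


def quantize_int8(weights):
    """Affine quantisation of floats to int8: q = round(w / scale) + zero_point."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for all-zero weights
    zero_point = round(-128 - w_min / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(v - zero_point) * scale for v in q]


# Hypothetical float32 weights for a tiny layer.
weights = [-0.82, -0.10, 0.0, 0.33, 0.47, 1.25]
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)

float_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes per weight
int8_bytes = len(struct.pack(f"{len(q)}b", *q))               # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {float_bytes} B, int8: {int8_bytes} B, max error: {max_error:.4f}")
```

The rounding error per weight is bounded by roughly one quantisation step (`scale`), which is why accuracy loss tends to be small when the weight range is not dominated by outliers. A real converter also calibrates activation ranges on representative data, which this sketch leaves out.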

Edge Impulse

Edge Impulse is a widely used end-to-end TinyML platform. It wraps data collection, signal processing, model training, and deployment into a web-based workflow that produces optimised C++ libraries. Backends include TFLM and EON (its own compiler), with detailed flash/RAM/latency estimates before you commit to a target. Models export directly as Arduino libraries, Zephyr modules, or bare-metal C++ headers. Several UK IoT startups have shipped production products using this pipeline.

Other Notable Frameworks

ONNX Runtime for Microcontrollers — Microsoft’s ORT Micro targets Cortex-M with a subset of ONNX operators. Useful when your model originates in PyTorch.

NNoM — A lightweight framework written in pure C, popular for Arm Cortex-M0/M0+ targets where CMSIS-NN is unavailable.

Arm Ethos-U microNPUs — Arm’s dedicated neural processing units, integrated in chips like the Alif Ensemble series. They run TFLM models with orders-of-magnitude better energy efficiency than CPU inference. Arm is based in Cambridge, and the Ethos-U is seeing increasing adoption in UK-designed IoT products.

Use Cases

Wake Word Detection

Keyword spotting, the embedded equivalent of a voice assistant’s “Hey” trigger phrase, is the canonical TinyML use case. A model listening continuously for a wake word consumes 1–3 mW on a Cortex-M4. Edge Impulse’s MFCC + 1D-CNN pipeline achieves over 95% accuracy on a 4-class wake word task in under 50 KB of flash.

Anomaly Detection on Vibration Data

Motors and pumps generate characteristic vibration signatures. A small autoencoder trained on normal operating data can detect anomalies in real time on an STM32 without sending any raw sensor data off-device. This is the foundation of self-contained predictive maintenance sensors that run for years on a lithium cell — a genuinely compelling use case for UK manufacturers looking to monitor legacy equipment.
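The decision step of such a sensor can be sketched in a few lines. The example below assumes the autoencoder already exists and only shows how its reconstruction error might be turned into an anomaly flag; the mean-plus-three-standard-deviations threshold is one common heuristic rather than anything prescribed here, and all numbers and function names are hypothetical.

```python
from statistics import mean, stdev


def reconstruction_error(window, reconstruction):
    """Mean squared error between an input window and the autoencoder's output."""
    return sum((x - y) ** 2 for x, y in zip(window, reconstruction)) / len(window)


def calibrate_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * standard deviation of errors seen on healthy data."""
    return mean(normal_errors) + k * stdev(normal_errors)


def is_anomaly(error, threshold):
    """A window is anomalous when the model reconstructs it unusually badly."""
    return error > threshold


# Hypothetical errors logged while the machine was known to be healthy.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012, 0.011]
threshold = calibrate_threshold(normal_errors)

# Hypothetical (input, reconstruction) pairs standing in for real model output.
healthy = reconstruction_error([0.1, 0.2, 0.1, 0.0], [0.15, 0.25, 0.05, 0.1])
faulty = reconstruction_error([0.1, 0.9, -0.7, 0.8], [0.1, 0.2, 0.1, 0.1])

print(is_anomaly(healthy, threshold), is_anomaly(faulty, threshold))
```

Only the threshold and a single comparison need to run on the device, so the calibration can happen offline and ship as a constant alongside the model.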

Gesture Recognition

IMU-based gesture recognition on the Arduino Nano 33 or ESP32-S3 can classify dozens of distinct gestures with high accuracy using a compact CNN or LSTM. Applications include industrial worker safety monitoring, smart tools, and accessibility devices.

Power Budgets

A typical always-on TinyML node breaks down as follows:

Component	Current draw
MCU active (inference)	8–15 mA
MCU deep sleep (between inferences)	5–20 µA
MEMS microphone (active)	0.6 mA
Accelerometer (ODR 100 Hz)	0.15 mA
BLE advertising	0.5 mA average

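Consider a CR2032 coin cell (225 mAh) with an inference duty cycle of 1% (one 50 ms inference every 5 seconds). On the figures above, the MCU alone averages roughly 0.085 to 0.17 mA, which gives about two to four months of operation before any sensor or radio current is counted. Reaching 18 months on the same cell requires an average total draw of about 17 µA, so a node meant to run that long without wired power or a battery swap needs a much lower duty cycle, with the sensors and radio duty-cycled as well rather than left on.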
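The budget arithmetic is easy to check with a short script. The sketch below assumes mid-range values from the table (10 mA active, 10 µA sleep) and an idealised cell with no self-discharge or pulse-load derating; the function names are hypothetical.

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle, always_on_ma=0.0):
    """Time-weighted average current for a duty-cycled MCU plus any always-on loads."""
    return active_ma * duty_cycle + sleep_ma * (1.0 - duty_cycle) + always_on_ma


def battery_life_hours(capacity_mah, avg_ma):
    """Ideal runtime; ignores self-discharge, pulse-load and temperature derating."""
    return capacity_mah / avg_ma


CAPACITY_MAH = 225.0   # CR2032, from the text
DUTY = 0.05 / 5.0      # one 50 ms inference every 5 s = 1%

# Assumed mid-range table values: 10 mA active, 0.010 mA (10 uA) deep sleep.
mcu_only = average_current_ma(active_ma=10.0, sleep_ma=0.010, duty_cycle=DUTY)
# Same MCU with the accelerometer (0.15 mA) left running continuously.
with_accel = average_current_ma(10.0, 0.010, DUTY, always_on_ma=0.15)

hours_mcu_only = battery_life_hours(CAPACITY_MAH, mcu_only)
hours_with_accel = battery_life_hours(CAPACITY_MAH, with_accel)

print(f"MCU only: {mcu_only * 1000:.0f} uA average, {hours_mcu_only / 24:.0f} days")
print(f"MCU + accelerometer: {with_accel * 1000:.0f} uA average, {hours_with_accel / 24:.0f} days")
```

With these assumed inputs the MCU alone averages about 110 µA, or roughly 85 days on the cell, and an always-on accelerometer cuts that to about 36 days. Lifetimes measured in years therefore depend on pushing the average draw down into the tens of microamps.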

The Training Pipeline

Data collection comes first — capture labelled sensor data via Edge Impulse’s data forwarder, a USB serial logger, or cloud sync from deployed devices. Then feature engineering: extract MFCC spectrograms (audio), FFT features (vibration), or raw windows (IMU) depending on modality.
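For the vibration case, FFT features often reduce to band energies over a fixed window. The following is a minimal sketch using a direct DFT from the standard library so that it stays self-contained; a real pipeline would use an optimised FFT, and the window length, test tone, and band count are hypothetical choices.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum via a direct DFT (O(n^2); fine for a short illustrative window)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def band_energies(magnitudes, n_bands):
    """Sum squared magnitudes into equal-width bands to get a compact feature vector."""
    width = len(magnitudes) // n_bands
    return [
        sum(m * m for m in magnitudes[b * width:(b + 1) * width])
        for b in range(n_bands)
    ]


SAMPLE_RATE_HZ = 100   # assumed accelerometer output data rate
WINDOW = 64            # hypothetical window length (bin width = 100 / 64 Hz)
TONE_HZ = 12.5         # hypothetical vibration component; falls exactly on bin 8

signal = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE_HZ) for i in range(WINDOW)]
mags = dft_magnitudes(signal)

# Use bins 0..31 (0 Hz up to just under 50 Hz) as 4 bands of 8 bins each.
features = band_energies(mags[:32], n_bands=4)
peak_bin = max(range(len(mags)), key=mags.__getitem__)

print(peak_bin, [round(f, 1) for f in features])
```

The four-element `features` list is what a small classifier or autoencoder would consume, in place of the 64 raw samples.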

Model design and training follow: start with Edge Impulse’s AutoML, then refine with custom Keras layers if needed. Target INT8 quantisation from the start. Use Edge Impulse’s EON Tuner or ST’s X-CUBE-AI to estimate flash, RAM, and latency for the target before flashing.

Deploy and iterate. Push model updates OTA (ESP32 HTTPS OTA or Mender for more complex fleets). Log confidence scores from production devices to detect distribution shift.
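One simple way to use logged confidence scores is to compare a rolling mean against the level seen at validation time. The sketch below assumes that approach; the class name, window size, and drop threshold are hypothetical, and a sustained fall in confidence is only a hint of distribution shift, not proof of it.

```python
from collections import deque
from statistics import mean


class ConfidenceDriftMonitor:
    """Flags possible distribution shift when rolling mean confidence falls below a baseline."""

    def __init__(self, baseline_mean, window=50, max_drop=0.10):
        self.baseline_mean = baseline_mean   # mean confidence measured at validation time
        self.max_drop = max_drop             # tolerated drop before raising a flag
        self.scores = deque(maxlen=window)   # keeps only the most recent scores

    def add(self, confidence):
        self.scores.append(confidence)

    def drifted(self):
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        return self.baseline_mean - mean(self.scores) > self.max_drop


# Hypothetical scores reported by a deployed device.
monitor = ConfidenceDriftMonitor(baseline_mean=0.92, window=5)
for score in [0.93, 0.91, 0.90, 0.94, 0.92]:
    monitor.add(score)
ok_before = monitor.drifted()

for score in [0.70, 0.65, 0.72, 0.68, 0.66]:
    monitor.add(score)
drift_after = monitor.drifted()

print(ok_before, drift_after)
```

A flag from a monitor like this is a reasonable trigger for collecting fresh labelled data from the affected devices before retraining and pushing an update.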

The Bottom Line

TinyML has crossed the threshold from research curiosity to production-ready technology. The combination of capable, affordable hardware (Arduino Nano 33, STM32H7, ESP32-S3), mature frameworks (TFLM, Edge Impulse), and a growing library of reference models means a competent embedded engineer can go from idea to a deployed, battery-powered ML node in days rather than months. For IoT applications where latency, privacy, connectivity, and cost all push against cloud inference, TinyML is no longer an exotic option — it’s the right architecture.