v2.0 · Multi-Architecture Ternary ML Framework

The $0.50 neural
network.

TernML is the first multi-architecture framework for ternary neural networks with {-1, 0, +1} weights: 5 architectures (Graph, CNN, Transformer, RNN, ViT), 84% accuracy on CIFAR-10 with a deep CNN, and C codegen for Cortex-M0+. No DSP. No FPU. Just pure add-and-shift inference on the cheapest microcontrollers.

5
Architectures
84%
CIFAR-10 (CNN)
1.58
Bits / Parameter
0
DSP / FPU / GPU

5 architectures.
One framework.

TernML unifies Graph, CNN, Transformer, RNN, and ViT under a single ternary QAT pipeline with Cortex-M0+ codegen. On the benchmarks reported so far (GraphKAN and CNN), ternary accuracy beats the float baseline.

5
Supported Architectures
Graph, CNN, Transformer, RNN/LSTM, ViT — unified TernMLayer
84.03%
CIFAR-10 (Deep CNN)
STE-based QAT — exceeds float baseline
1.58
Bits / Parameter
≈5× fewer bits per weight than 8-bit quantized (8 / 1.58)
C Codegen
Cortex-M0+ output, p/m bit-sliced format
0
DSP / FPU / GPU
Pure add-shift — runs on any MCU

5 architectures.
One pipeline.

Unified QAT pipeline that works across GraphKAN, CNN, Transformer, RNN, and ViT — producing ultra-efficient ternary networks with C codegen for any MCU.

01

Choose Your Architecture

GraphKAN, CNN, Transformer, RNN/LSTM, or Vision Transformer — all use the same TernMLayer with STE-based QAT. Pick any architecture, get ternary weights.

5 architectures • Unified API
02

Train with Regularization

4-phase QAT pipeline: float clamp → STE ternarization → hard clamp → finetune. Ternary quantization acts as a regularizer, improving accuracy over the float baseline.

Regularization-by-quantization
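
The STE ternarization phase can be sketched in C as follows. This is a minimal illustration under stated assumptions, not TernML's implementation. The function names ternarize and ste_backward are hypothetical. The threshold rule (0.7 times the mean absolute weight) is a common heuristic from the ternary-weight literature, not something this page specifies. The gradient is assumed to pass straight through wherever the latent float weight lies inside the [-1, 1] clamp range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward pass: map latent float weights to {-1, 0, +1}.
 * ASSUMPTION: threshold delta = 0.7 * mean(|w|). This is a widely used
 * heuristic, not TernML's documented rule. */
static void ternarize(const float *w, int8_t *t, size_t n)
{
    assert(n > 0);

    float sum_abs = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum_abs += (w[i] < 0.0f) ? -w[i] : w[i];
    const float delta = 0.7f * sum_abs / (float)n;

    for (size_t i = 0; i < n; ++i) {
        if (w[i] > delta)       t[i] = 1;
        else if (w[i] < -delta) t[i] = -1;
        else                    t[i] = 0;
    }
}

/* Backward pass (straight-through estimator): the rounding step is treated
 * as the identity, so the gradient w.r.t. the ternary weight is copied to
 * the latent weight. ASSUMPTION: it is zeroed outside the clamp range. */
static void ste_backward(const float *w, const float *grad_t,
                         float *grad_w, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        grad_w[i] = (w[i] >= -1.0f && w[i] <= 1.0f) ? grad_t[i] : 0.0f;
}
```

The optimizer keeps updating the float weights; only the forward pass sees the ternary values, which is what lets training continue through a non-differentiable quantizer.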
03

Generate C Code

Export ternary weights in p/m bit-sliced format (32 trits/8 bytes). Generate pure C for Cortex-M0+ with no DSP, no FPU, no OS — bare-metal inference in milliseconds.

From $0.50 MCU • Codegen
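
A sketch of how a p/m bit-sliced block might be packed and used, assuming "p/m" means one 32-bit plus mask and one 32-bit minus mask per block (which matches 32 trits in 8 bytes). The type trit_block_t and the functions pack_trits and dot_block are hypothetical names for illustration, not TernML's generated code, and int8 activations are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: 32 trits stored as two bit planes, 8 bytes total. */
typedef struct {
    uint32_t p;  /* bit i set: weight i is +1 */
    uint32_t m;  /* bit i set: weight i is -1 */
} trit_block_t;  /* neither bit set: weight i is 0 */

/* Pack 32 ternary weights (-1, 0, +1) into one block. */
static trit_block_t pack_trits(const int8_t w[32])
{
    trit_block_t b = {0u, 0u};
    for (unsigned i = 0; i < 32; ++i) {
        assert(w[i] >= -1 && w[i] <= 1);
        if (w[i] > 0)      b.p |= (uint32_t)1 << i;
        else if (w[i] < 0) b.m |= (uint32_t)1 << i;
    }
    return b;
}

/* Dot product of one block with 32 int8 activations.
 * Uses only shifts, bit tests, adds, and subtracts: no multiply. */
static int32_t dot_block(trit_block_t b, const int8_t x[32])
{
    int32_t acc = 0;
    uint32_t p = b.p, m = b.m;
    /* Stop early once no nonzero weights remain in the block. */
    for (unsigned i = 0; i < 32 && (p | m) != 0u; ++i) {
        if (p & 1u) acc += x[i];
        if (m & 1u) acc -= x[i];
        p >>= 1;
        m >>= 1;
    }
    return acc;
}
```

Zero weights set neither mask bit, so they cost no arithmetic at all; that is one way weight sparsity could translate into fewer operations on a core without a DSP or FPU.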

5 architectures.
One ternary framework.

GraphKAN, CNN, Transformer, RNN/LSTM, Vision Transformer — all with the same 4-phase QAT pipeline and regularization-by-quantization effect.

Architecture              Dataset        Float acc.  Ternary acc.  Model size
GraphKAN 256→100→10       MNIST          94.77%      96.15%        15.4 KB
GraphKAN 256→100→10       Fashion-MNIST  83.03%      84.67%        15.4 KB
CNN conv32→64→128→10      CIFAR-10       82.50%      84.03%        ~100 KB
CNN conv32→64→FC128→10    Fashion-MNIST  91.57%      92.02%        102.8 KB
Transformer 2L/4H/128d    CIFAR-10       TBD         TBD           TBD
LSTM 128d                 Sequence       TBD         TBD           TBD
ViT patch8/128d           CIFAR-10       TBD         TBD           TBD

* All architectures use the same 4-phase QAT pipeline with STE. C codegen is available for the Cortex-M0+ target. ELM mode trains only the output layer: a frozen random ternary hidden layer followed by a closed-form least-squares fit.

Comparable accuracy.
10× smaller.

Head-to-head against industry-standard TinyML frameworks. TernML delivers competitive accuracy at a fraction of the memory footprint.

Metric                  TernML    TFLite Micro (8-bit)   Edge Impulse
MNIST                   96.15%    96.80%                 96.20%
CIFAR-10 (deep CNN)     84.03%    47.00%                 45.50%
Fashion-MNIST (CNN)     92.02%    91.50%                 91.00%
Model Size (GraphKAN)   15 KB     128 KB                 256+ KB
Peak RAM                4 KB      32 KB                  64 KB
Bits / Parameter        1.58      8.00                   8.00
Architectures           5         CNN only               CNN only
C Codegen               Yes       No                     No
Natural Sparsity        49%       0%                     0%
DSP Required            No        Yes                    Yes
FPU Required            No        Optional               Yes
Min. MCU Cost           $0.50     $1.50+                 $3.00+

* TFLite Micro and Edge Impulse benchmarks are based on default 8-bit quantized models. TernML supports 5 architectures (Graph, CNN, Transformer, RNN, ViT) with C codegen for Cortex-M0+. On the benchmarks reported here, TernML accuracy exceeds its own float baseline, which the project attributes to the regularization-by-quantization effect.
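
For reference, the 1.58 bits/parameter figure is the information content of a three-valued weight, and the bit-width ratios against 8-bit and 32-bit formats follow from it. The last line is an observation about the p/m bit-sliced export format (32 trits in 8 bytes): as stored, it uses 2 bits per trit, so reaching 1.58 bits on disk would need a denser packing that this page does not describe.

```latex
\begin{align*}
\log_2 3 &\approx 1.585 \ \text{bits per ternary weight} \\
\frac{8}{1.585} &\approx 5.0, \qquad \frac{32}{1.585} \approx 20.2 \\
\frac{8 \ \text{bytes} \times 8 \ \text{bits/byte}}{32 \ \text{trits}} &= 2 \ \text{bits per stored trit}
\end{align*}
```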

Runs on anything with a C compiler.

No hardware accelerator, no DSP, no FPU. TernML runs on the cheapest MCUs on the market.

Cortex-M0+
ARM Cortex-M0+
  • 16 KB SRAM — fits entirely
  • $0.50 MCU cost
  • <100 ms inference
ESP32-S3
Xtensa LX7
  • 512 KB SRAM
  • WiFi + BLE on-chip
  • Headroom for sensors
GD32V
RISC-V RV32IMAC
  • 32 KB SRAM
  • Full model fits
  • Open ISA
MIK32 Amur
RISC-V RV32IMC
  • 8 KB SRAM
  • Flash-optimized
  • Ultra-low power
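
Integer-only inference of the kind these targets call for can be sketched as a single dense layer. This is an illustration under assumptions, not TernML's generated code: the function name ternary_dense_relu is hypothetical, weights are stored one per int8 for readability rather than bit-sliced, activations are int8, and the output scale is assumed to be a power of two so that rescaling is a right shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ternary dense layer with ReLU, using only add, subtract,
 * compare, and shift. No multiply, no divide, no floating point. */
static void ternary_dense_relu(const int8_t *w,  /* [n_out * n_in], row-major, values in {-1,0,+1} */
                               const int8_t *x,  /* [n_in] input activations */
                               int8_t *y,        /* [n_out] output activations */
                               size_t n_in, size_t n_out,
                               unsigned shift)   /* ASSUMED power-of-two output scale */
{
    assert(shift < 31u);

    for (size_t o = 0; o < n_out; ++o) {
        const int8_t *row = w + o * n_in;
        int32_t acc = 0;

        for (size_t i = 0; i < n_in; ++i) {
            if (row[i] > 0)      acc += x[i];  /* weight +1: add */
            else if (row[i] < 0) acc -= x[i];  /* weight -1: subtract */
            /* weight 0: nothing to do */
        }

        if (acc < 0) acc = 0;      /* ReLU first, so the shift only sees non-negative values */
        acc >>= shift;             /* rescale by 2^-shift */
        if (acc > 127) acc = 127;  /* saturate to the int8 range */
        y[o] = (int8_t)acc;
    }
}
```

Working memory here is one 32-bit accumulator plus the input and output activation buffers, which is why layers of this shape can fit in a few kilobytes of SRAM.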

Built for the edge.

TernML is an independent research project focused on bringing neural network inference to the cheapest microcontrollers.

TernML is a multi-architecture framework for ternary neural networks. It supports GraphKAN, CNN, Transformer, RNN/LSTM, and Vision Transformer under one unified QAT pipeline with C codegen for Cortex-M0+. The core discovery: ternary quantization acts as a regularizer during training, improving generalization over the float baseline on every benchmark reported so far (GraphKAN and CNN; Transformer, LSTM, and ViT results are still TBD).

“Discrete ternary weights act as a regularizer during training, naturally pruning noise while preserving signal.” — Regularization-by-Quantization Effect

The result: models that don't just compress well, they generalize better. That is TernML's core insight, backed by 5 architectures, 95 passing tests, and Cortex-M0+ codegen.

5 architectures
Regularization-by-quantization
C codegen / Cortex-M0+
95 passing tests

Get in touch.

Interested in licensing, collaboration, or early access? Reach out directly.