v2.0 · Multi-Architecture Ternary ML Framework

The $0.50 neural network.

TernML is the first multi-architecture framework for ternary neural networks with {-1,0,+1} weights: 5 architectures (Graph, CNN, Transformer, RNN, ViT), 84% on CIFAR-10 with a CNN, and C codegen for Cortex-M0+. No DSP. No FPU. Just pure add-and-shift inference on the cheapest microcontrollers.


5

Architectures

84%

CIFAR-10 (CNN)

1.58

Bits / Parameter

No

DSP / FPU / GPU

Key Metrics

5 architectures. One framework.

TernML unifies Graph, CNN, Transformer, RNN, and ViT under a single ternary QAT pipeline with Cortex-M0+ codegen. On the benchmarks reported so far (GraphKAN and CNN), ternary accuracy beats the float baseline.

5

Supported Architectures

Graph, CNN, Transformer, RNN/LSTM, ViT — unified TernMLayer

84.03%

CIFAR-10 (Deep CNN)

STE-based QAT — exceeds float baseline

1.58

Bits / Parameter

About 5× compression vs 8-bit quantized (8 / 1.58 ≈ 5.1); 4× when stored at 2 bits per trit in the p/m format

✓

C Codegen

Cortex-M0+ output, p/m bit-sliced format

No

DSP / FPU / GPU

Pure add-shift — runs on any MCU
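To illustrate what "pure add-shift" can mean in practice, here is a minimal Python sketch of a ternary dot product. It is not TernML's generated code. The function names and the use of a right shift for rescaling the accumulator are assumptions made for this example.

```python
def ternary_dot(weights, activations):
    """Dot product with weights in {-1, 0, +1}: no multiplications needed."""
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x      # +1 weight: add the activation
        elif w == -1:
            acc -= x      # -1 weight: subtract the activation
        # 0 weight: nothing to do, which is where sparsity saves work
    return acc


def requantize(acc, shift):
    """Rescale an integer accumulator with an arithmetic right shift
    (assumed here as the 'shift' half of add-shift inference)."""
    return acc >> shift


weights = [1, 0, -1, 1, 0, 0, -1, 1]
activations = [12, 7, 5, -3, 9, 4, 2, 6]
acc = ternary_dot(weights, activations)
print(acc, requantize(acc, 2))  # 8 2
```

Every operation here maps to an integer add, subtract, or shift, which is why no DSP or FPU is needed for this kind of kernel.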

How It Works

5 architectures. One pipeline.

Unified QAT pipeline that works across GraphKAN, CNN, Transformer, RNN, and ViT — producing ultra-efficient ternary networks with C codegen for any MCU.

Choose Your Architecture

GraphKAN, CNN, Transformer, RNN/LSTM, or Vision Transformer — all use the same TernMLayer with STE-based QAT. Pick any architecture, get ternary weights.

5 architectures • Unified API

Train with Regularization

4-phase QAT pipeline: float clamp → STE ternarization → hard clamp → finetune. Ternary quantization acts as a regularizer, improving accuracy over the float baseline.

Regularization-by-quantization
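The STE ternarization and hard clamp phases can be sketched in plain Python as follows. This is an illustrative sketch, not TernML's implementation. The threshold rule (0.7 times the mean absolute weight), the clip range of 1.0, and all function names are assumptions chosen for the example.

```python
def ternarize(weights, delta_scale=0.7):
    """Forward pass: map latent float weights to {-1, 0, +1}.

    The threshold delta = delta_scale * mean(|w|) is an assumed heuristic,
    not necessarily the rule TernML uses.
    """
    delta = delta_scale * sum(abs(w) for w in weights) / len(weights)
    return [1 if w > delta else -1 if w < -delta else 0 for w in weights]


def ste_backward(weights, grad_out, clip=1.0):
    """Backward pass: straight-through estimator (STE).

    The non-differentiable rounding is treated as identity, so the gradient
    computed for the ternary weights is handed to the latent float weights
    unchanged, except where a latent weight lies outside [-clip, clip].
    """
    return [g if abs(w) <= clip else 0.0 for w, g in zip(weights, grad_out)]


def sgd_step(weights, grad_out, lr=0.1, clip=1.0):
    """One update of the latent weights, followed by a hard clamp."""
    grads = ste_backward(weights, grad_out, clip)
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return [max(-clip, min(clip, w)) for w in updated]


latent = [0.9, -0.05, 0.3, -0.8, 1.4]
print(ternarize(latent))  # [1, 0, 0, -1, 1]
```

The forward pass always sees ternary weights, while the optimizer keeps updating the float copy. That is the usual shape of STE-based quantization-aware training.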

Generate C Code

Export ternary weights in p/m bit-sliced format (32 trits per 8 bytes, i.e. 2 bits per trit). Generate pure C for Cortex-M0+ with no DSP, no FPU, and no OS, for bare-metal inference in milliseconds.

From $0.50 MCU • Codegen
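One plausible reading of the p/m bit-sliced format is a pair of 32-bit masks: a "plus" word with a bit set for every +1 weight and a "minus" word with a bit set for every -1 weight. The sketch below packs 32 trits into 8 bytes under that assumption. The bit order, byte order, and function names are hypothetical and may differ from TernML's actual layout.

```python
def pack_pm(trits):
    """Pack up to 32 trits into a (plus, minus) pair of 32-bit masks."""
    if len(trits) > 32:
        raise ValueError("at most 32 trits per word pair")
    plus = minus = 0
    for i, t in enumerate(trits):
        if t == 1:
            plus |= 1 << i       # bit i of the plus mask marks a +1 weight
        elif t == -1:
            minus |= 1 << i      # bit i of the minus mask marks a -1 weight
        elif t != 0:
            raise ValueError(f"not a trit: {t!r}")
    return plus, minus


def unpack_pm(plus, minus, n=32):
    """Recover the trits: +1 where plus is set, -1 where minus is set."""
    return [((plus >> i) & 1) - ((minus >> i) & 1) for i in range(n)]


def to_bytes(plus, minus):
    """Serialize as two little-endian uint32 words: 8 bytes for 32 trits."""
    return plus.to_bytes(4, "little") + minus.to_bytes(4, "little")


trits = [1, 0, -1, 1] + [0] * 28
plus, minus = pack_pm(trits)
blob = to_bytes(plus, minus)
print(hex(plus), hex(minus), len(blob))  # 0x9 0x4 8
```

With this layout a kernel can walk both masks bit by bit, adding activations where the plus bit is set and subtracting where the minus bit is set.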

Architectures

5 architectures. One ternary framework.

GraphKAN, CNN, Transformer, RNN/LSTM, Vision Transformer — all with the same 4-phase QAT pipeline and regularization-by-quantization effect.

Architecture	Dataset	Float accuracy	Ternary accuracy	Model size
GraphKAN 256→100→10	MNIST	94.77%	96.15%	15.4 KB
GraphKAN 256→100→10	Fashion-MNIST	83.03%	84.67%	15.4 KB
CNN conv32→64→128→10	CIFAR-10	82.50%	84.03%	~100 KB
CNN conv32→64→FC128→10	Fashion-MNIST	91.57%	92.02%	102.8 KB
Transformer 2L/4H/128d	CIFAR-10	TBD	TBD	TBD
LSTM 128d	Sequence	TBD	TBD	TBD
ViT patch8/128d	CIFAR-10	TBD	TBD	TBD

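* All architectures use the same 4-phase QAT pipeline with STE. C codegen is available for the Cortex-M0+ target. TBD entries have no reported results yet. ELM (extreme learning machine) mode: a frozen random ternary hidden layer with an output layer fitted by closed-form least squares.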
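The ELM mode mentioned in the footnote can be sketched in a few dozen lines of plain Python. This is an illustration of the general technique, not TernML's code. The ReLU activation, the small ridge term, the float output weights, and every function name are assumptions made for this example.

```python
import random


def hidden_features(x, w_hidden):
    """Frozen ternary hidden layer followed by ReLU (adds/subtracts only)."""
    feats = []
    for row in w_hidden:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        feats.append(acc if acc > 0 else 0.0)
    return feats


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_elm(xs, ys, n_hidden=8, ridge=1e-3, seed=0):
    """Draw random ternary hidden weights once, then fit only the output."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    w_hidden = [[rng.choice((-1, 0, 1)) for _ in range(n_in)]
                for _ in range(n_hidden)]
    h = [hidden_features(x, w_hidden) for x in xs]
    # Normal equations: (H^T H + ridge * I) beta = H^T y
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    return w_hidden, solve(gram, rhs)


def predict(x, w_hidden, beta):
    return sum(b * f for b, f in zip(beta, hidden_features(x, w_hidden)))


xs = [[0.0, 1.0, 2.0], [1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [-1.0, 2.0, 1.0]]
ys = [1.0, -1.0, 0.5, 2.0]
w_hidden, beta = fit_elm(xs, ys)
print([round(predict(x, w_hidden, beta), 3) for x in xs])
```

No gradient descent is involved: the hidden layer is never trained, and the output weights come from a single linear solve.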

Benchmarks

Comparable accuracy. 10× smaller.

Head-to-head against industry-standard TinyML frameworks. TernML delivers competitive accuracy at a fraction of the memory footprint.

Metric	TernML	TFLite Micro (8-bit)	Edge Impulse
MNIST	96.15%	96.80%	96.20%
CIFAR-10 (deep CNN)	84.03%	47.00%	45.50%
Fashion-MNIST (CNN)	92.02%	91.50%	91.00%
Model Size (GraphKAN)	15 KB	128 KB	256+ KB
Peak RAM	4 KB	32 KB	64 KB
Bits / Parameter	1.58	8.00	8.00
Architectures	5	CNN only	CNN only
C Codegen	Yes	No	No
Natural Sparsity	49%	0%	0%
DSP Required	No	Yes	Yes
FPU Required	No	Optional	Yes
Min. MCU Cost	$0.50	$1.50+	$3.00+

* TFLite Micro and Edge Impulse figures are based on default 8-bit quantized models. TernML supports 5 architectures (Graph, CNN, Transformer, RNN, ViT) with C codegen for Cortex-M0+. TernML accuracy exceeds its own float baseline due to the regularization-by-quantization effect.
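The 1.58 bits per parameter figure in the table is the information content of one three-valued weight, log2(3). The short calculation below shows how it compares with 8-bit weights and with a 2-bit-per-trit storage layout such as 32 trits in 8 bytes. The helper name `packed_size_bytes` is made up for this example.

```python
import math

# Information content of one ternary weight
bits_per_trit = math.log2(3)
print(round(bits_per_trit, 2))        # 1.58

# Compression relative to 8-bit weights at that density
print(round(8 / bits_per_trit, 2))    # 5.05

# A layout that stores 32 trits in 8 bytes spends 2 bits per trit
bits_per_trit_stored = 8 * 8 / 32
print(bits_per_trit_stored)           # 2.0


def packed_size_bytes(n_weights, bits_per_weight):
    """Bytes needed for n_weights at a given density, rounded up."""
    return math.ceil(n_weights * bits_per_weight / 8)


print(packed_size_bytes(1000, 8))              # 1000
print(packed_size_bytes(1000, 2))              # 250
print(packed_size_bytes(1000, bits_per_trit))  # 199
```

The gap between 1.58 and 2 bits is the price of a layout that is trivial to decode with bit operations.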

Hardware Support

Runs on anything with a C compiler.

No hardware accelerator, no DSP, no FPU. TernML runs on the cheapest MCUs on the market.

Cortex-M0+

ARM Cortex-M0+

16 KB SRAM — fits entirely
$0.50 MCU cost
<100 ms inference

ESP32-S3

Xtensa LX7

512 KB SRAM
WiFi + BLE on-chip
Headroom for sensors

GD32V

RISC-V RV32IMAC

32 KB SRAM
Full model fits
Open ISA

MIK32 Amur

RISC-V RV32IMC

8 KB SRAM
Flash-optimized
Ultra-low power

About

Built for the edge.

TernML is an independent research project focused on bringing neural network inference to the cheapest microcontrollers.

TernML is a multi-architecture framework for ternary neural networks. It supports GraphKAN, CNN, Transformer, RNN/LSTM, and Vision Transformer under one unified QAT pipeline with C codegen for Cortex-M0+. The core discovery: ternary quantization acts as a regularizer during training, improving generalization over the float baseline on every benchmark reported so far (GraphKAN and CNN; Transformer, LSTM, and ViT results are pending).

          “Discrete ternary weights act as a regularizer during training, naturally pruning noise while preserving signal.”
          — Regularization-by-Quantization Effect
        

The result: models that don't just compress well but also generalize better. This is TernML’s core insight. The framework ships with 5 architectures, 95 passing tests, and Cortex-M0+ codegen.

GitHub Habr #1 Habr #2 Zenodo DOI

5 architectures

Regularization-by-quantization

C codegen / Cortex-M0+

95 passing tests
