June 2026 · Proprietary Technology

GraphKAN

Kolmogorov-Arnold Network with Discrete Ternary Weights {−1, 0, +1}
First KAN quantized below 4-bit — 1.58 bits per parameter. Regularization by quantization. TinyML ready.
1.58 bits/param · 95.35% MNIST accuracy · 19.95 KB model size · 0 DSP inference · 49% natural sparsity

Abstract

We present GraphKAN, the first Kolmogorov-Arnold Network with discrete ternary control points {−1, 0, +1}, at 1.58 bits per parameter. Our 4-phase quantization-aware training (QAT) pipeline reaches 95.35% on MNIST in just 19.95 KB. The ternary model outperforms its own float baseline (93.84%), an effect we call regularization by quantization.

Our method generalizes across five domains: MNIST (95.35%, 19.95 KB), Fashion-MNIST (85.04%, 12.77 KB), HAR (92.60%, 13.61 KB), FSDD audio (85.67%, 24.96 KB), and CIFAR-10 (47.83%, 38.78 KB). All models fit in microcontroller SRAM without requiring floating-point hardware.

GraphKAN Architecture

Figure: a graph of 256 input, 100 hidden, and 10 output neurons, updated over N cycles. Each edge is a piecewise-linear function with 3 ternary control points (CP) in {−1, 0, +1}. Neurons apply a tanh activation with 1/fan_in scaling. The ternary weights (−1, 0, +1) are stored in a compact encoding.
GraphKAN: 366 neurons, 26,600 edges, N synchronous update cycles
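
The edge function and the synchronous update described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the knot positions (−1, 0, +1), the input clamp, the placement of the 1/fan_in scale inside the tanh, and the names `KNOTS`, `edge`, and `cycle` are all assumptions made for the example.

```python
import math

# Assumed knot positions for the 3 control points of every edge (hypothetical).
KNOTS = (-1.0, 0.0, 1.0)

def edge(x, cp):
    """Piecewise-linear edge: interpolate the 3 ternary control points cp over KNOTS."""
    x = max(KNOTS[0], min(KNOTS[2], x))            # clamp the input to the knot range
    if x <= KNOTS[1]:
        t = (x - KNOTS[0]) / (KNOTS[1] - KNOTS[0])
        return cp[0] + t * (cp[1] - cp[0])
    t = (x - KNOTS[1]) / (KNOTS[2] - KNOTS[1])
    return cp[1] + t * (cp[2] - cp[1])

def cycle(state, edges, input_ids):
    """One synchronous update: every non-input neuron reads the previous state."""
    incoming = {}
    for src, dst, cp in edges:
        incoming.setdefault(dst, []).append(edge(state[src], cp))
    new_state = list(state)
    for dst, terms in incoming.items():
        if dst not in input_ids:
            # Assumed form of the 1/fan_in scale: average the edge outputs, then tanh.
            new_state[dst] = math.tanh(sum(terms) / len(terms))
    return new_state

# Toy graph: neurons 0 and 1 are inputs, 2 is hidden, 3 is the output.
edges = [(0, 2, (-1, 0, 1)), (1, 2, (1, 0, -1)), (2, 3, (0, 1, 0))]
state = [0.5, -0.5, 0.0, 0.0]
s1 = cycle(state, edges, {0, 1})   # hidden neuron sees the inputs
s2 = cycle(s1, edges, {0, 1})      # output neuron now sees the updated hidden value
```

Because each cycle reads only the previous state, information from the inputs needs two cycles to reach the output of this toy graph, which matches the "N cycles = N layers" reading of depth.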

4-Phase QAT Pipeline

Phase 1: Float warm-up (clamp to range). Accuracy: 93.84%
Phase 2: Ternary QAT (gradual ternarization). Accuracy: 94.74% (+0.90 percentage points)
Phase 3: Hard clamp (strict {−1, 0, +1}). Accuracy: 95.32% (+0.58 percentage points)
Phase 4: Finetune (scale + bias only). Accuracy: 95.35% (+0.03 percentage points)
Accuracy increases stepwise during quantization — a novel regularization effect
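
To make the four phases concrete, the sketch below shows one way the weight seen by the forward pass could change from phase to phase. The source does not specify the ternarization rule or schedule, so the 0.5 threshold, the linear blend controlled by `progress`, and the helper names `ternarize`, `effective_weight`, and `trainable` are hypothetical. Freezing the control points only in Phase 4 is likewise an assumption.

```python
def ternarize(w, threshold=0.5):
    """Map a float to {-1, 0, +1}; values with |w| <= threshold become 0 (assumed rule)."""
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0

def effective_weight(w, phase, progress=0.0):
    """Weight used in the forward pass for a given phase (1 to 4).

    progress runs from 0.0 to 1.0 within Phase 2 (assumed linear schedule).
    """
    w = max(-1.0, min(1.0, w))                    # Phase 1: clamp to range
    if phase == 1:
        return w
    if phase == 2:                                # Phase 2: blend toward the ternary value
        return (1.0 - progress) * w + progress * ternarize(w)
    return float(ternarize(w))                    # Phases 3 and 4: strict {-1, 0, +1}

def trainable(phase):
    """Which parameter groups are updated in each phase (assumption for Phases 1 to 3)."""
    return {"control_points": phase != 4, "scale_bias": True}

for phase, progress in [(1, 0.0), (2, 0.5), (3, 0.0), (4, 0.0)]:
    print(phase, effective_weight(0.8, phase, progress), trainable(phase))
```

In a real training loop the rounding step would need a gradient surrogate such as a straight-through estimator. That part is omitted here because the source does not describe it.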

MNIST Results

Model | Weights | Bits/Param | Size | MNIST | Sparsity | Hardware
MLP 256-100-10 | float32 | 32 | 243 KB | 96.5% | 0% | FPU+DSP
KAN 3-10-10 | float32 | 32 | 93.5 KB | 94.77% | 0% | FPU+DSP
QuantKAN | int4 | 4 | 23.4 KB | 94.2% | 0% | DSP
BiKA (binary) | binary | 1 | 9.6 KB | 92.7% | 0% | DSP
GraphKAN (ours) | ternary | 1.58 | 19.95 KB | 95.35% | 49% | 0 DSP

95.35% > 93.84% (float): quantization improves accuracy.
16× smaller than its own float32 baseline; about 4.7× smaller than the float KAN (93.5 KB) and about 12× smaller than the MLP (243 KB).

Additional results
CIFAR-10: 47.83% (8×8 input)
Fashion-MNIST (ELM): 99.3% of BP accuracy
Inference: ~60k img/sec (STM32F4)
GraphKAN outperforms the float KAN (95.35% vs. 94.77%) at roughly one fifth of the memory (19.95 KB vs. 93.5 KB), with zero DSP required
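
The reported model size can be cross-checked from the architecture figures. The short calculation below assumes that each ternary control point is stored in 2 bits (four per byte) and that 1 KB means 1000 bytes. The source does not state the storage format, but this assumption reproduces the 19.95 KB figure exactly. Scale and bias parameters are not counted.

```python
import math

edges = 256 * 100 + 100 * 10          # input->hidden plus hidden->output edges
params = edges * 3                    # 3 ternary control points per edge

entropy_bits = math.log2(3)           # information content of one ternary value
packed_bytes = params * 2 // 8        # assumed storage: 2 bits per value
float32_bytes = params * 4            # the same parameters stored as float32

print(edges)                          # 26600
print(params)                         # 79800
print(round(entropy_bits, 2))         # 1.58
print(packed_bytes)                   # 19950 bytes, i.e. 19.95 KB
print(float32_bytes // packed_bytes)  # 16
```

Under these assumptions, 1.58 bits is the information-theoretic cost of a ternary value, while the stored model uses 2 bits per value. The 16× ratio to float32 follows directly from 32 bits versus 2 bits.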

Regularization by Quantization

Key insight: constraining weights to {−1, 0, +1} acts as a regularizer.
The model cannot memorize noise and is forced to learn only robust features.
The ternary constraint eliminates 33%+ of parameter degrees of freedom.
Natural sparsity: 49% of weights become 0, which gives built-in feature selection.
Accuracy improvement: 93.84% (float) → 95.35% (ternary), +1.51 percentage points.
This mirrors the information bottleneck principle: limited capacity leads to generalizable representations.
Ternary quantization improves accuracy — opposite of what conventional quantization theory predicts

Training & Scaling

Training requirements: 15 epochs total; single consumer GPU; fast convergence.
Scaling properties: ELM mode with O(n) hidden units and no backpropagation (BP); arbitrary graph topology supported; depth scales with cycles (N cycles = N layers).
MNIST 95.35% at 19.95 KB: 2 min on a consumer GPU.
Extremely lightweight training — accessible to anyone with a consumer GPU

Universal QAT Pipeline — Beyond KAN

The same 4-phase QAT pipeline works on any architecture.

Architecture | Building blocks | Dataset | Float baseline | Ternary QAT | Size
GraphKAN (graph) | Piecewise-linear edges, 3 ternary control points, graph message passing | MNIST | 93.84% | 95.35% | 19.95 KB
CNN (conv) | Conv2d + ReLU + MaxPool, ternary conv + fc layers, standard CNN topology | Fashion-MNIST | 83.12% | 85.04% | 12.77 KB

The ternary models are 16× smaller than the float baseline. Both architectures show regularization by quantization: ternary > float.
The same QAT pipeline works on CNN — ternary quantization improves accuracy regardless of architecture

ELM Mode: Random Features + Least Squares

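Fashion-MNIST: random GraphKAN + least squares (LSQ) vs. full BP

Hidden size | Accuracy | Model size | % of full-BP accuracy
H=50 | 74.5% | 3.7 KB | 94.1%
H=100 | 77.0% | 6.8 KB | 97.3%
H=200 | 77.7% | 14 KB | 98.1%
H=500 | 78.7% | 31.6 KB | 99.3%
Full BP | 79.2% | float | -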
99.3% of backprop accuracy with zero hidden layer training — at 31.6 KB
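
The ELM idea (a fixed random hidden layer, with only the readout solved by least squares) can be illustrated on a toy regression problem. This sketch is a simplification and not the author's code: it uses a random ternary linear projection followed by tanh instead of GraphKAN's piecewise-linear edges, adds a small ridge term for numerical stability, and solves the normal equations with plain Gaussian elimination. All function names and the toy data are hypothetical.

```python
import math
import random

def random_ternary(rows, cols, rng):
    """Random hidden weights in {-1, 0, +1}; these are never trained."""
    return [[rng.choice((-1, 0, 1)) for _ in range(cols)] for _ in range(rows)]

def hidden(x, w):
    """Fixed random features: tanh of the fan-in-scaled ternary projection."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) / len(x)) for row in w]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_readout(feats, targets, ridge=1e-3):
    """Least-squares readout: solve (H^T H + ridge * I) beta = H^T y."""
    h = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(h)] for i in range(h)]
    rhs = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(h)]
    return solve(gram, rhs)

rng = random.Random(0)
w = random_ternary(8, 2, rng)                        # H = 8 hidden units
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
ys = [x0 - x1 for x0, x1 in xs]                      # toy regression target
feats = [hidden(x, w) for x in xs]
beta = fit_readout(feats, ys)                        # the only fitted parameters
preds = [sum(b * f for b, f in zip(beta, ft)) for ft in feats]
sse = sum((p - y) ** 2 for p, y in zip(preds, ys))
print(len(beta), sse)
```

Only `beta` is fitted. The hidden weights stay at their random initial values, which is why no backpropagation through the hidden layer is needed.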

Comparison with Prior Art

Method | KAN? | Ternary? | Bits | Year
KAN | yes | no | 32 | 2024
QuantKAN | yes | no | 4 | 2025
KANtize | yes | no | 2–3* | 2026
BiKA | yes | no | ~ | 2026
BitNet | no | yes | 1.58 | 2024
GraphKAN (ours) | yes | yes | 1.58 | 2026

* KANtize quantizes B-spline tables, not weights
First KAN with ternary weights — matching BitNet at 1.58 bits/param

Hardware Efficiency

Ternary multiply = multiplexer (0 DSP): +1 → pass, −1 → negate, 0 → zero.
Deployment targets: Cortex-M0+ ($0.50), Cortex-M4 (L1 cache), ESP32-S3, RISC-V GD32V, Arduino RP2040, smartwatch DSP.
At 19.95 KB, the model fits in the L1 cache of any modern microcontroller.
Zero FPU. Zero DSP. Only add/shift operations.
Ternary weights eliminate DSP slices — deployable on $0.50 microcontrollers
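
The multiplexer view of a ternary multiply can be shown with an integer-only accumulation, together with one possible compact encoding. The 2-bit code (0b00 for 0, 0b01 for +1, 0b10 for −1) and the function names are assumptions made for this sketch; the source does not specify the encoding. Python is used for readability, and on a microcontroller the same logic would normally be written in C.

```python
def ternary_dot(weights, activations):
    """Integer-only accumulate: +1 passes, -1 negates, 0 skips the activation."""
    acc = 0
    for w, a in zip(weights, activations):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        # w == 0 contributes nothing, so no work is done
    return acc

# Assumed 2-bit code per weight (hypothetical): 0b00 -> 0, 0b01 -> +1, 0b10 -> -1.
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {code: w for w, code in ENCODE.items()}

def pack(weights):
    """Pack four ternary weights per byte, lowest bits first."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(out)

def unpack(data, count):
    """Recover the first `count` ternary weights from packed bytes."""
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]

ws = [1, -1, 0, 1, -1]
acts = [3, 5, 7, -2, 4]
packed = pack(ws)
print(ternary_dot(unpack(packed, len(ws)), acts))   # -8
```

No multiplication appears anywhere in `ternary_dot`, which is the property that removes the need for DSP multipliers. Zero weights are skipped entirely, so the reported 49% sparsity would also cut the number of additions roughly in half.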

Author

Yuri Venediktov (Fakeonomics)

Independent researcher, 17 years old. Invented ternary KAN and VSA multi-hop reasoning.

github.com/Fakeonomics

June 2026. Proprietary technology — All Rights Reserved.