Evidence for every claim.

ARK's benchmark suites, architecture papers, and reproducibility notes. Every number on this site is independently verifiable — on our hardware and on yours, during a POC.

1 suite published · 1 suite in progress · Updated per release
Point-in-time runs.
Each suite pins a specific question to specific hardware and a specific model at a specific date. Click through for methodology, charts, and raw data.
Nov 2025 · Live

The stateful advantage, measured.

A 12-turn multi-topic conversation run with and without ARK's stateful mode. Same model, same hardware — only the KV-cache toggle changes. Result: flat latency, collapsed token volume, orders-of-magnitude less GPU prefill work.

18×
faster TTFT at turn 12
275×
fewer tokens per turn
131×
less cumulative GPU work
Read full report
Apr 2026 · In progress

Frontier-scale, better per watt.

Extending the study to 100B-class models and above, on a broader hardware matrix, with watts per generated token measured as a first-class metric alongside throughput — expect higher tokens per second (TPS) and better energy efficiency on the same ARK runtime.
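For clarity on what "watts per generated token" means as a metric: sample GPU power over the run, integrate to energy, and divide by output tokens. A minimal sketch, assuming a fixed sampling interval (the sampling scheme and numbers are illustrative, not ARK's instrumentation):

```python
# Sketch: energy-per-token from sampled GPU power.
# Sampling interval and power figures are assumptions for illustration.

def joules_per_token(power_samples_w, sample_interval_s, tokens_generated):
    """Integrate sampled power (W) into energy (J), divide by output tokens."""
    energy_j = sum(power_samples_w) * sample_interval_s  # simple Riemann sum
    return energy_j / tokens_generated

# e.g. 0.5 s samples over a 10 s run averaging 300 W, 500 tokens generated:
samples = [300.0] * 20
print(joules_per_token(samples, 0.5, 500))  # 6.0 J per token
```

Dividing average power by average TPS gives the same quantity, which is why the metric slots naturally alongside throughput in the results tables.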

Coming soon
The architecture, written down.
Technical notes and reference material that sit alongside the benchmark data — the how and the why behind the numbers.
ARCHITECTURE PAPER

The ARK runtime: stateful, isolated, fault-tolerant.

Supervisor, Compute Nodes, Model Storage, API Gateway — how each layer works and why the architecture is a structural advantage, not a feature.

Coming soon
TECHNICAL NOTE

Fault tolerance: keeping 99% of sessions alive.

What happens when a node drops mid-token, how the re-sharding protocol recovers, and why “the whole group crashes” isn't an acceptable failure mode.
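The core idea of such a recovery step can be sketched in a few lines: when a node drops, its shards are reassigned across the survivors, and sessions are lost only if no survivor has headroom. This is a purely illustrative sketch — the names and placement policy are assumptions, not ARK's re-sharding protocol:

```python
# Sketch: re-sharding after a node failure (illustrative policy, not ARK's).

def reshard(placement, failed_node, capacity):
    """Move shards off a failed node onto the least-loaded survivors.

    placement: {node: [shard, ...]} (mutated in place); capacity: max
    shards per node. Returns the new placement, or None if the group
    has no headroom and cannot recover.
    """
    orphans = placement.pop(failed_node, [])
    for shard in orphans:
        target = min(placement, key=lambda n: len(placement[n]))
        if len(placement[target]) >= capacity:
            return None                    # no spare capacity: sessions lost
        placement[target].append(shard)
    return placement

before = {"n1": ["s0", "s1"], "n2": ["s2"], "n3": ["s3"]}
print(reshard(before, "n1", capacity=3))
```

The contrast with "the whole group crashes" is that failure here is scoped: recovery succeeds or fails per shard placement, not for the entire serving group at once.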

Coming soon
REPRODUCIBILITY

Run ARK benchmarks on your hardware.

The prompt sets, the scripts, and the hardware requirements — so you can re-run every number on this site independently during a POC.

Coming soon
Not incremental. A structural advantage.
Competitors would need a ground-up rewrite to match these capabilities. This is an architecture gap, not a feature gap.
| Capability | ARK | vLLM | TensorRT-LLM | Ollama |
| --- | --- | --- | --- | --- |
| Heterogeneous GPU Support | Any mix | Homogeneous | Homogeneous | Single GPU |
| Elastic Hot-Scaling | Runtime | Restart | Restart | Not supported |
| Fault Tolerance | 99% survival | Group crash | Group crash | No HA |
| Multi-Model Tenancy | Shard-level | Per-model | Static engines | One per GPU |
| Network Requirement | ~5 Mbit/s | NVLink/IB | NVLink/IB | N/A |
| Session Isolation | KV + attention | Shared batching | Shared batching | Per-process |
Run this on your own infrastructure.
Three ways to deploy the same ARK runtime you just saw measured. Pick the control model that matches your operating posture.