ARK's benchmark suites, architecture papers, and reproducibility notes. Every number on this site is independently verifiable — on our hardware and on yours, during a POC.
A 12-turn multi-topic conversation run with and without ARK's stateful mode. Same model, same hardware — only the KV-cache toggle changes. Result: flat latency, collapsed token volume, orders-of-magnitude less GPU prefill work.
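To make that re-runnable, here is a minimal sketch of the methodology, assuming an OpenAI-compatible endpoint. The base URL, model name, and the `ark_stateful` request field are placeholders for illustration, not ARK's documented API:

```python
"""Replay the same multi-turn conversation twice, stateful vs. stateless,
and record per-turn latency and prompt-token volume.
Endpoint, model, and the stateful toggle below are placeholders."""
import time
from openai import OpenAI

client = OpenAI(base_url="https://ark.example.com/v1", api_key="YOUR_KEY")

TURNS = [
    "Summarize how quicksort works.",
    "Now compare it with mergesort.",
    # ... extend to the full 12-turn, multi-topic script
]

def run(stateful: bool) -> list[tuple[float, int]]:
    messages, stats = [], []
    for user_turn in TURNS:
        messages.append({"role": "user", "content": user_turn})
        t0 = time.perf_counter()
        resp = client.chat.completions.create(
            model="your-model",
            messages=messages,
            # Hypothetical toggle: the real switch may be a header,
            # a session parameter, or server-side configuration.
            extra_body={"ark_stateful": stateful},
        )
        stats.append((time.perf_counter() - t0, resp.usage.prompt_tokens))
        messages.append(
            {"role": "assistant", "content": resp.choices[0].message.content}
        )
    return stats

for mode in (False, True):
    for turn, (latency, prompt_tokens) in enumerate(run(mode), start=1):
        print(f"stateful={mode} turn={turn:2d} "
              f"latency={latency:.2f}s prompt_tokens={prompt_tokens}")
```

With the toggle off, prompt tokens grow with every turn as the full history is re-sent and re-prefetched; with it on, the cached context is reused, which is where the flat latency and collapsed token volume come from.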
Extending the study to 100B-class models and above. Broader hardware matrix, with watts-per-generated-token measured as a first-class metric alongside throughput — expect stronger TPS and better energy efficiency on the same ARK runtime.
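Watts-per-generated-token is average board power divided by sustained decode throughput (watts over tokens-per-second yields joules per token). A minimal measurement sketch on NVIDIA hardware, assuming a single GPU and a `run_generation()` stub standing in for the actual benchmark:

```python
"""Sample GPU power draw while a generation runs, then divide average
watts by tokens/second. Single-GPU sketch; run_generation() is a stub."""
import subprocess
import threading
import time

samples: list[float] = []
stop = threading.Event()

def poll_power(interval: float = 0.5) -> None:
    # nvidia-smi reports instantaneous board power in watts.
    while not stop.is_set():
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        samples.extend(float(w) for w in out.splitlines())
        time.sleep(interval)

def run_generation() -> int:
    # Placeholder: replace with the real decode benchmark and
    # return the number of tokens it generated.
    time.sleep(5)
    return 1_000

poller = threading.Thread(target=poll_power, daemon=True)
poller.start()

t0 = time.perf_counter()
generated = run_generation()
elapsed = time.perf_counter() - t0

stop.set()
poller.join()

avg_watts = sum(samples) / len(samples)
tps = generated / elapsed
# avg_watts / tps has units W / (tok/s) = joules per generated token.
print(f"{tps:.1f} tok/s at {avg_watts:.0f} W "
      f"-> {avg_watts / tps:.3f} J/token")
```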
Coming soon: Supervisor, Compute Nodes, Model Storage, API Gateway — how each layer works and why the architecture is a structural advantage, not a feature.
What happens when a node drops mid-token, how the re-sharding protocol recovers, and why “the whole group crashes” isn't an acceptable failure mode.
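The protocol itself is the subject of the paper; purely as a toy illustration of the shape of the idea (none of these names are ARK's), recovery means detecting the dead node and handing its shard range to a survivor so decoding can resume:

```python
"""Toy illustration of shard reassignment after a node failure.
This is NOT ARK's protocol, just the shape of the idea: the layer
range owned by a dead node is handed to a surviving node so the
group keeps serving instead of crashing as a whole."""
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    layers: range        # contiguous slice of the model this node serves
    alive: bool = True

def reshard(nodes: list[Node]) -> list[Node]:
    dead = [n for n in nodes if not n.alive]
    live = [n for n in nodes if n.alive]
    for d in dead:
        # Hand the dead node's layers to the least-loaded survivor.
        heir = min(live, key=lambda n: len(n.layers))
        heir.layers = range(
            min(d.layers.start, heir.layers.start),
            max(d.layers.stop, heir.layers.stop),
        )
    return live

nodes = [Node("a", range(0, 16)), Node("b", range(16, 32)), Node("c", range(32, 48))]
nodes[1].alive = False               # node "b" drops mid-token
print([(n.name, n.layers) for n in reshard(nodes)])
```

The interesting parts are exactly what this toy omits: detecting the drop mid-token, replaying from the last verified token, and rebalancing by real capacity rather than layer count.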
The prompt sets, the scripts, and the hardware requirements — so you can re-run every number on this site independently during a POC.
| Capability | ARK | vLLM | TensorRT-LLM | Ollama |
|---|---|---|---|---|
| Heterogeneous GPU Support | ✓ Any mix | ✗ Homogeneous | ✗ Homogeneous | ✗ Single GPU |
| Elastic Hot-Scaling | ✓ Runtime | ✗ Restart | ✗ Restart | ✗ Not supported |
| Fault Tolerance | ✓ 99% survival | ✗ Group crash | ✗ Group crash | ✗ No HA |
| Multi-Model Tenancy | ✓ Shard-level | ✗ Per-model | ✗ Static engines | ✗ One per GPU |
| Network Requirement | ✓ ~5 Mbit/s | ✗ NVLink/IB | ✗ NVLink/IB | N/A |
| Session Isolation | ✓ KV + attention | ✗ Shared batching | ✗ Shared batching | Per-process |
EU-hosted, zero setup, free credits. Point your OpenAI-compatible client at our endpoint and start inferencing; a quick-start sketch follows the plans below.
Start free →
ARK inside your compliance border, run by us. On-prem or sovereign cloud, sold through your integrator.
See Tailored →
Self-hosted on infrastructure you already run. Annual license with a Software Support SLA on ARK components.
See Core →
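For the cloud tier, a minimal quick-start sketch using the standard OpenAI Python client; the base URL and model name are placeholders to be replaced with the values from your ARK account:

```python
"""Point a standard OpenAI client at an ARK endpoint.
Base URL and model name below are placeholders."""
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ark.example/v1",  # placeholder ARK endpoint
    api_key="YOUR_ARK_KEY",
)

resp = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Hello, ARK!"}],
)
print(resp.choices[0].message.content)
```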