KRAQX · Inference Silicon & OS

K-CORE

265 tokens/second at 30 watts.

128 Tensor-Linear Engines on TSMC A16 with SoIC-X 3D bonding to 512GB HBM4. No training paths, no graphics legacy. Just the inference dataflow.

In our measured open-stack benchmark configuration: 5.25× to 5.62× the throughput of H100 and 3.71× to 3.98× the throughput of B200, at less than 1/20 the power. Low enough to deploy anywhere. No liquid cooling, no special power circuits, no facility upgrades.
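The headline figures can be turned into efficiency numbers with simple arithmetic. The sketch below is illustrative only: it assumes the quoted throughput ratios apply to the same 265 tokens/second figure, which the text does not state explicitly, and the function names are hypothetical.

```python
# Figures quoted in the text above.
TOKENS_PER_SECOND = 265.0
POWER_WATTS = 30.0
H100_SPEEDUP = (5.25, 5.62)  # quoted throughput ratio range vs. H100
B200_SPEEDUP = (3.71, 3.98)  # quoted throughput ratio range vs. B200


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: tokens produced per joule (tokens/s per watt)."""
    return tokens_per_second / watts


def joules_per_token(tokens_per_second: float, watts: float) -> float:
    """Energy cost of a single token, in joules."""
    return watts / tokens_per_second


def implied_baseline(tokens_per_second: float, speedup: tuple) -> tuple:
    """Baseline throughput range implied by a speedup range.

    ASSUMPTION: the speedup ratios refer to the same tokens/second figure.
    Returns (low, high) in tokens/second.
    """
    low_ratio, high_ratio = speedup
    return tokens_per_second / high_ratio, tokens_per_second / low_ratio


if __name__ == "__main__":
    print(f"{tokens_per_joule(TOKENS_PER_SECOND, POWER_WATTS):.2f} tokens/J")
    print(f"{joules_per_token(TOKENS_PER_SECOND, POWER_WATTS):.3f} J/token")
    print("implied H100 tok/s:", implied_baseline(TOKENS_PER_SECOND, H100_SPEEDUP))
    print("implied B200 tok/s:", implied_baseline(TOKENS_PER_SECOND, B200_SPEEDUP))
```

Under that assumption, 265 tokens/second at 30 watts works out to roughly 8.8 tokens per joule, or about 0.11 joules per token.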

K-OS

5,300 lines. Zero unnecessary code.

Three execution contexts. Fewer than 8 interrupts/sec under load. The kernel paths Linux disables for latency? We never wrote them. The schedulers it papers over? Not present.

0.5s cold boot to first token. Deterministic memory layout. The stack you'd build if you started from the workload instead of POSIX.

VALIDATION · May 2026

Validated on AWS FPGA hardware.

First Transformer Layer Engine stage synthesized and validated on AWS f2.6xlarge (AMD VU47P FPGA). End-to-end Verilator-to-hardware match, checked against the same reference vectors used to validate the simulation.

3 / 3 strict tests PASS · Cycle-exact at the count claimed · Bit-correct within design tolerance
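A check of this kind can be sketched as follows. This is a minimal illustration, not the actual test harness: the `StageResult` type, the `matches` function, the integer output encoding, and the tolerance expressed in least-significant bits are all assumptions, since the text does not describe how the tests are structured or what the design tolerance is.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StageResult:
    """Hypothetical record of one run: output words plus the cycle count."""
    outputs: Sequence[int]  # assumed fixed-point output words
    cycles: int


def matches(reference: Sequence[int], observed: StageResult,
            expected_cycles: int, tolerance_lsb: int = 0) -> bool:
    """True if the run is cycle-exact and every output is within tolerance.

    tolerance_lsb=0 demands a bit-exact match; a positive value allows each
    word to differ from the reference by at most that many LSBs (assumed
    meaning of "design tolerance").
    """
    if observed.cycles != expected_cycles:
        return False  # not cycle-exact
    if len(observed.outputs) != len(reference):
        return False  # missing or extra output words
    return all(abs(o - r) <= tolerance_lsb
               for o, r in zip(observed.outputs, reference))


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    reference = [100, -42, 7]
    simulated = StageResult(outputs=[100, -42, 7], cycles=64)
    hardware = StageResult(outputs=[100, -41, 7], cycles=64)
    # Both runs are checked against the same reference vectors.
    print(matches(reference, simulated, expected_cycles=64))
    print(matches(reference, hardware, expected_cycles=64, tolerance_lsb=1))
```

Checking the simulator run and the hardware run against one shared set of reference vectors, rather than against each other, means a bug common to both cannot hide as agreement.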

Built For One Thing.
