Shard

Pipeline-parallel LLM inference across GPUs on separate machines.

github.com

Built with

GLM-5.2NEW

Strong

The creator provides explicit performance benchmarks and verifiable receipts for GLM-5.2 runs.

The X post and repository README explicitly mention using GLM-5.2 for 744B parameter inference.

Build evidence

Strong

This is a functional code repository with detailed architecture documentation, research logs, and verifiable performance receipts.

Creator

leyten @leyten

Shipped

2h ago · model from Jun 16, 2026

Shard is an inference engine that enables distributed LLM serving by splitting model layers across heterogeneous GPUs on different networks. It overcomes WAN latency bottlenecks through techniques like pipelined speculative decoding and CUDA-graphed draft models, allowing large models like GLM-5.2 to run effectively over the open internet.

#inference #distributed #networking #llm

Source post