Browse
Shard

Shard

Pipeline-parallel LLM inference across GPUs on separate machines.

github.com
Built with
GLM-5.2NEW
Strong

The creator provides explicit performance benchmarks and verifiable receipts for GLM-5.2 runs.

The X post and repository README explicitly mention using GLM-5.2 for 744B parameter inference.
Build evidence
Strong

This is a functional code repository with detailed architecture documentation, research logs, and verifiable performance receipts.

Creator
leyten @leyten
Shipped
2h ago · model from Jun 16, 2026

Shard is an inference engine that enables distributed LLM serving by splitting model layers across heterogeneous GPUs on different networks. It overcomes WAN latency bottlenecks through techniques like pipelined speculative decoding and CUDA-graphed draft models, allowing large models like GLM-5.2 to run effectively over the open internet.

Source post
Timeline
Teaser
Video
Playable
Product

Loading…

Media & coverage
sourced from 1 post