First Edition · 2026 · Production v1.0
The Principal Performance Architect's Guide

AI/ML Infrastructure
from Silicon to Scale

GPU Physics → Transformer Kernels → Distributed Training → Production Operations

A practical, layer-by-layer guide for engineers moving from systems performance into AI/ML infrastructure architecture — from HBM bandwidth and roofline analysis to LLM serving, distributed training, observability, and principal-level interviews.

18
Chapters
6
Parts
40
Diagrams
580+
Target Pages
Venkat Vinjam
AI/ML Performance Architect · AMD · Ex-Intel
Focus Silicon · Kernels · Runtime · Clusters · TCO
Audience Senior / Staff / Principal Engineers
How to Use This Book
Choose the path that matches your goal
🧱
Silicon-Up Reader
Start with Ch01–Ch05 to build first-principles performance intuition from hardware to TCO.
Inference Fast-Track
Read Ch01, Ch06, Ch08, Ch09, Ch11, Ch13, and Ch17 for LLM serving readiness.
🌐
Training Specialist
Prioritize Ch01, Ch04, Ch10, Ch12, Ch14, Ch15, and Ch17 for distributed training systems.
🎯
Interview Prep
Use Ch01, Ch02, Ch06, Ch10, Ch11, Ch14, and Ch18 for principal-level system design.
989.4 TFLOPS
H100 Dense BF16
3.35 TB/s
H100 HBM Bandwidth
295 FLOPs/Byte
Dense Ridge Point
16 B
Bytes/param, Mixed-Precision AdamW
TP ≤ 8
NVLink Domain Rule
82%
KV Cache Alert Threshold
6N
Training FLOPs/Token
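The ridge-point, optimizer-state, and FLOPs-per-token figures above follow from simple arithmetic. A minimal sketch, assuming the H100 SXM datasheet numbers the cards already cite (the 82% KV-cache alert and TP≤8 rule are operational guidelines from the book, not derivable from specs):

```python
# Sanity-check the headline metrics with back-of-envelope arithmetic.

H100_DENSE_BF16_FLOPS = 989.4e12   # H100 SXM dense BF16 Tensor Core, FLOP/s
H100_HBM_BW = 3.35e12              # H100 SXM HBM3 bandwidth, bytes/s

# Roofline ridge point: the arithmetic intensity (FLOPs/byte) at which
# a kernel crosses from bandwidth-bound to compute-bound.
ridge = H100_DENSE_BF16_FLOPS / H100_HBM_BW
print(f"ridge point: {ridge:.0f} FLOPs/byte")     # ~295

# Mixed-precision AdamW memory per parameter:
# 2 B BF16 weight + 2 B BF16 gradient + 4 B FP32 master weight
# + 4 B FP32 first moment (m) + 4 B FP32 second moment (v)
adamw_bytes = 2 + 2 + 4 + 4 + 4
print(f"AdamW state: {adamw_bytes} bytes/param")  # 16

# Training-cost rule of thumb: ~6N FLOPs per token for a dense
# N-parameter transformer (~2N forward + ~4N backward).
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Hypothetical sizing example: a 7B model trained on 1T tokens.
print(f"7B / 1T tokens: {training_flops(7e9, 1e12):.2e} FLOPs")  # ~4.2e22
```

Plugging any accelerator's peak FLOP/s and memory bandwidth into the same ratio gives its ridge point, which is why the book leads with roofline analysis before kernels.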
00–18
Chapters
HTML where available · PDF always available
Fig
Architecture Diagrams
Open in browser · print to PDF if needed
App
Appendices
Reference tables, tools, and glossary