NVIDIA Vera Rubin NVL72 Deep Dive: Inside the 3.6 EFLOPS AI Factory

The Vera Rubin NVL72 is the rack that succeeds the Blackwell GB300 NVL72 as NVIDIA’s flagship AI factory unit. In this deep dive we walk through the system as a hardware engineer would: GPU, CPU, memory hierarchy, fabric, networking, power, cooling, and the operational implications of each.

The Building Block: Vera Rubin Superchip

At the heart of the rack is the Vera Rubin superchip, one Vera CPU and two Rubin GPUs on a single board, connected by chip-to-chip NVLink. This is the same compositional pattern as Grace Hopper and Grace Blackwell, scaled to the new generation.

Each Rubin GPU brings up to 288 GB of HBM4 (1.6x more than B200’s HBM3e) and 22 TB/s of memory bandwidth. The Vera CPU adds 1.5 TB of LPDDR5x SOCAMM memory at 1.2 TB/s, giving the superchip a unified, coherent memory pool that spans GPU and CPU domains.

Vera CPU: Custom Arm “Olympus”

Vera is NVIDIA’s first fully custom CPU core, branded Olympus. The published spec is 88 cores, 176 threads via NVIDIA Spatial Multi-Threading (SMT-2 with hardware-managed thread affinity for cache locality), 227 billion transistors, and support for SOCAMM-based LPDDR5x. Vera is purpose-built for the host workload pattern of AI factories: high memory bandwidth, low latency to GPUs, and predictable per-core performance for orchestration code.

The Rack: 72 GPUs as One

Component	Quantity / Spec
Rubin GPUs	72
Vera CPUs	36
Total HBM4	20.7 TB
HBM Bandwidth	1.6 PB/s aggregate
Total LPDDR5x	54 TB
NVFP4 Inference	3.6 EFLOPS
NVFP4 Training	2.5 EFLOPS
Scale-Up Bandwidth	260 TB/s (NVLink 6)
Form Factor	MGX gen-3, liquid-cooled

NVLink 6 Switch Fabric

NVLink 6 is the on-rack fabric that turns 72 GPUs into a single accelerator. Each GPU gets 3.6 TB/s of bidirectional NVLink bandwidth, a step up from NVLink 5’s 1.8 TB/s. At rack level the aggregate is 260 TB/s, which matters most for collective operations during training and for tensor-parallel inference of very large MoE models.

Networking: Scale Out Without Bottlenecks

Rubin pairs with NVIDIA’s latest networking generation. ConnectX-9 SuperNICs handle east-west GPU-to-GPU traffic across racks, while BlueField-4 DPUs manage infrastructure services. For scale-out the customer picks Quantum-X800 InfiniBand (for tightly-coupled HPC and largest-scale training) or Spectrum-X Ethernet (for AI clouds standardized on Ethernet).

Power and Cooling Reality

NVL72 is a liquid-cooled platform end to end. Practical implications:

Rack power density exceeds 120 kW; legacy data center pods need targeted retrofits.
Direct-to-chip liquid cooling is required; air cooling is not an option for full-rack Rubin.
Coolant distribution units (CDUs) become first-class infrastructure, not afterthoughts.
Plan for service access, leak detection, and water quality management.

Software Stack

Rubin runs the same NVIDIA AI software stack you already know, CUDA, cuDNN, NCCL, TensorRT-LLM, NeMo, NIM, Triton, with new code paths that exploit NVFP4 and NVLink 6 collectives. NVIDIA AI Enterprise covers the production support tier with 9-year API stability.

Comparing to GB300 NVL72

The Blackwell GB300 NVL72 remains a remarkable platform, but on apples-to-apples MoE training NVIDIA cites Rubin as 3.5x faster and Rubin inference as 5x higher throughput. For workloads dominated by long-context prefill, adding Rubin CPX nodes amplifies the advantage further. The migration calculus comes down to your workload mix, the age of your existing footprint, and your facility’s readiness.

Who Should Buy Now?

Rubin NVL72 is the right platform if you are:

Standing up a new AI factory with capacity coming online in late 2026 or 2027
Training trillion-parameter foundation models or large MoE systems
Operating an inference fleet bottlenecked on long-context prefill
Building sovereign AI infrastructure on a multi-year horizon

If you’re already mid-deployment on Blackwell, finish that buildout, Rubin will be there when expansion comes.

Want help architecting a Rubin NVL72 rollout? Browse our Vera Rubin NVL72 product page or contact our team for a custom configuration proposal.