NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer

At CES 2026, NVIDIA pulled back the curtain on Vera Rubin, the next-generation platform that succeeds Blackwell as the foundation of the AI factory. Rubin is not a single chip; it is a coordinated system of six new pieces of silicon engineered to work as one. In this overview we map out what each chip does, how the six fit together, and why that co-design matters for trillion-parameter training and agentic inference.

Why Vera Rubin Matters

Blackwell shipped in volume across 2024 and 2025, scaling AI factories into the gigawatt era. Rubin pushes that envelope further: compared with Blackwell, NVIDIA claims a 4x reduction in the GPUs required to train the same mixture-of-experts (MoE) model and a 5x lift in NVFP4 inference throughput at the rack level. The headline numbers for the flagship Vera Rubin NVL72 rack are 3.6 exaflops of NVFP4 inference and 2.5 exaflops of training, with 20.7 TB of HBM4 capacity and 1.6 PB/s of HBM bandwidth.
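To make the 4x claim concrete, here is a minimal back-of-the-envelope sketch in Python. The 64-rack Blackwell baseline is a made-up example, not a figure from NVIDIA, and the function names are ours; the sketch simply assumes the claimed reduction holds exactly and that both generations pack 72 GPUs per NVL72 rack.

```python
import math

GPUS_PER_NVL72_RACK = 72     # 72 GPUs per rack, per the NVL72 name
CLAIMED_GPU_REDUCTION = 4    # the 4x vendor claim quoted above, taken at face value


def rubin_gpus_needed(blackwell_gpus: int, reduction: float = CLAIMED_GPU_REDUCTION) -> int:
    """GPUs needed on Rubin if the claimed reduction held exactly (rounded up)."""
    return math.ceil(blackwell_gpus / reduction)


def racks_needed(gpus: int, gpus_per_rack: int = GPUS_PER_NVL72_RACK) -> int:
    """Whole racks required to host the given number of GPUs."""
    return math.ceil(gpus / gpus_per_rack)


# Hypothetical baseline: an MoE training job spread over 64 Blackwell NVL72 racks.
baseline_gpus = 64 * GPUS_PER_NVL72_RACK
rubin_gpus = rubin_gpus_needed(baseline_gpus)

print(f"Blackwell: {baseline_gpus} GPUs in {racks_needed(baseline_gpus)} racks")
print(f"Rubin:     {rubin_gpus} GPUs in {racks_needed(rubin_gpus)} racks")
```

Under these assumptions the hypothetical job drops from 4,608 GPUs in 64 racks to 1,152 GPUs in 16 racks. Real-world savings will depend on the model, parallelism strategy, and how closely a given workload matches the one behind NVIDIA's claim.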

But GPU silicon is only useful if the supporting fabric, CPUs, DPUs, and SuperNICs scale alongside it. Rubin’s defining design choice is co-engineering all six chips at once.

The Six Chips

1. Rubin GPU

The headline accelerator. Rubin moves to HBM4 with up to 288 GB per GPU and 22 TB/s of memory bandwidth. It introduces native support for NVFP4, NVIDIA’s 4-bit floating point format, and connects to the rest of the rack via NVLink 6 at 3.6 TB/s of bidirectional bandwidth per GPU.

2. Vera CPU

NVIDIA’s first data center CPU built on its own custom Arm core, the new “Olympus” design. Vera packs 88 cores and 176 threads via NVIDIA Spatial Multi-Threading, supports up to 1.5 TB of LPDDR5x SOCAMM memory at 1.2 TB/s, and is built from 227 billion transistors. Vera is the host CPU paired with two Rubin GPUs in the Vera Rubin superchip.

3. Rubin CPX

A specialized Rubin variant tailored for the context-prefill phase of long-context inference. Rubin CPX disaggregates context processing from token generation, so a Vera Rubin rack can dedicate CPX nodes to ingesting million-token prompts while standard Rubin GPUs focus on decode. This is a structural answer to the rise of long-context agentic workloads.
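The split between context prefill and token generation can be pictured as a two-stage pipeline. The Python sketch below is a toy illustration only: the class and method names are hypothetical, the queues merely stand in for a CPX pool and a standard Rubin GPU pool, and the KV-cache handoff is represented by a placeholder string rather than any real NVIDIA or inference-server API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    request_id: str
    prompt_tokens: int     # size of the context to ingest (prefill work)
    max_new_tokens: int    # tokens to generate afterwards (decode work)


@dataclass
class DisaggregatedScheduler:
    """Toy two-stage scheduler: prefill on one pool, decode on another."""
    prefill_queue: deque = field(default_factory=deque)  # stands in for CPX nodes
    decode_queue: deque = field(default_factory=deque)   # stands in for standard Rubin GPUs

    def submit(self, req: Request) -> None:
        # Every request starts in the prefill stage.
        self.prefill_queue.append(req)

    def run_prefill_step(self) -> None:
        # Ingest one prompt, then hand its KV cache to the decode pool.
        if not self.prefill_queue:
            return
        req = self.prefill_queue.popleft()
        kv_cache_handle = f"kv:{req.request_id}"  # placeholder for a real cache transfer
        self.decode_queue.append((req, kv_cache_handle))

    def run_decode_step(self) -> Optional[str]:
        # Pick up one prefilled request and describe the decode work to be done.
        if not self.decode_queue:
            return None
        req, kv_cache_handle = self.decode_queue.popleft()
        return f"{req.request_id}: decode up to {req.max_new_tokens} tokens using {kv_cache_handle}"


scheduler = DisaggregatedScheduler()
scheduler.submit(Request("agent-trace-1", prompt_tokens=1_000_000, max_new_tokens=512))
scheduler.run_prefill_step()
print(scheduler.run_decode_step())
```

The point of the separation is that each pool can be sized independently: a workload dominated by very long prompts needs more prefill capacity, while a chat-style workload with short prompts leans on decode.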

4. ConnectX-9 SuperNIC

The endpoint NIC for Rubin. ConnectX-9 doubles the throughput of ConnectX-8 and is required to feed Rubin’s east-west traffic appetite. It supports both InfiniBand and Ethernet from a single ASIC.

5. BlueField-4 DPU

The next-generation infrastructure DPU. BlueField-4 increases bandwidth over its predecessor and adds Arm cores for offloading networking, security, and storage services from the host CPU. In Rubin racks BlueField-4 is the gatekeeper between the AI factory and everything outside it.

6. NVLink 6 Switch

The fabric inside the rack. NVLink 6 delivers 260 TB/s of scale-up bandwidth for the NVL72 configuration (72 GPUs at 3.6 TB/s each), making 72 GPUs behave as one. Without that scale-up fabric, GPU-to-GPU traffic would bottleneck the rest of the rack.

Putting It Together: Vera Rubin NVL72

The reference platform for the Rubin era is the Vera Rubin NVL72 rack: 72 Rubin GPUs, 36 Vera CPUs, ConnectX-9 SuperNICs, BlueField-4 DPUs, NVLink 6 switches, and a choice of Quantum-X800 InfiniBand or Spectrum-X Ethernet for scale-out. NVIDIA’s flagship configuration sits inside a third-generation MGX modular reference design optimized for liquid cooling and serviceability.
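The rack-level figures quoted in this post follow from the per-GPU specs multiplied by 72. The short Python check below uses only numbers stated in this article; the per-GPU throughput values it derives are simple divisions of the rack totals, not official per-GPU specifications.

```python
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

# Per-GPU figures quoted in the Rubin GPU section.
HBM4_PER_GPU_GB = 288
HBM_BW_PER_GPU_TBS = 22
NVLINK_BW_PER_GPU_TBS = 3.6

# Rack totals derived from the per-GPU figures (decimal units: 1 TB = 1000 GB).
hbm_capacity_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000
hbm_bandwidth_pbs = GPUS_PER_RACK * HBM_BW_PER_GPU_TBS / 1000
nvlink_bandwidth_tbs = GPUS_PER_RACK * NVLINK_BW_PER_GPU_TBS
gpus_per_cpu = GPUS_PER_RACK / CPUS_PER_RACK

# Going the other way: per-GPU throughput implied by the rack-level exaflops.
inference_pflops_per_gpu = 3.6 * 1000 / GPUS_PER_RACK
training_pflops_per_gpu = 2.5 * 1000 / GPUS_PER_RACK

print(f"HBM4 capacity:    {hbm_capacity_tb:.1f} TB")
print(f"HBM bandwidth:    {hbm_bandwidth_pbs:.2f} PB/s")
print(f"NVLink bandwidth: {nvlink_bandwidth_tbs:.1f} TB/s")
print(f"GPUs per Vera:    {gpus_per_cpu:.0f}")
print(f"Implied NVFP4 inference per GPU: {inference_pflops_per_gpu:.1f} PFLOPS")
print(f"Implied training per GPU:        {training_pflops_per_gpu:.1f} PFLOPS")
```

The computed totals come out at roughly 20.7 TB, 1.58 PB/s, and 259 TB/s, which match the rounded 20.7 TB, 1.6 PB/s, and 260 TB/s headline figures, and the 72-to-36 ratio matches the two-GPUs-per-Vera superchip layout.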

For organizations standing up new AI factories in late 2026 and beyond, Rubin NVL72 is the platform to design around. Existing Blackwell GB300 NVL72 deployments are not going anywhere, but the migration path is now visible.

Availability

NVIDIA confirmed at CES 2026 that Rubin is in full production, ahead of the originally communicated H2 2026 timeline. Partner systems from Supermicro (Vera Rubin NVL144 and NVL144 CPX), Dell, HPE, Lenovo, and others are scheduled for second-half 2026 delivery, and cloud providers AWS, Google Cloud, Microsoft, and OCI have confirmed Rubin instances will be among the first available.

What This Means for You

If you operate or plan to operate AI training or inference at scale, the questions to start asking now:

  • Is your facility sized for the power and cooling envelope of NVL72-class racks?
  • Is your scale-out fabric (Quantum-X800 or Spectrum-X) provisioned to feed Rubin’s bandwidth?
  • Does your inference workload mix benefit from disaggregated CPX architecture?
  • What’s the migration path from your current Hopper or Blackwell footprint?

We help organizations answer those questions and architect the right Rubin deployment. Contact us for a Rubin readiness consultation or browse our Vera Rubin NVL72 product page for deeper specs.