NVIDIA Rubin CPX

A new class of NVIDIA GPU purpose-built for massive-context inference. Rubin CPX accelerates million-token reasoning, video understanding, and long-context agents with a disaggregated inference design.

Overview

NVIDIA Rubin CPX is a specialized member of the Rubin family designed exclusively for the context-processing phase of large language model inference. By disaggregating the prefill (context) stage from the decode stage, Rubin CPX delivers dramatically higher throughput for million-token prompts, long-context agents, and video understanding.

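Rubin CPX pairs with standard Rubin GPUs in the same rack: CPX GPUs run prefill over huge contexts and build the key-value (KV) cache; standard Rubin GPUs then handle decode, generating output tokens from that cache. Because each GPU type runs the phase it is built for, the result is a more efficient, more cost-effective inference factory for modern reasoning workloads.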
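To make the prefill/decode split concrete, here is a minimal Python sketch of a disaggregated serving flow. It is a toy model only, not NVIDIA software or any real serving API: the names `Request`, `KVCacheHandle`, `PrefillPool`, `DecodePool`, and `serve` are hypothetical, and the "GPU work" is reduced to token counting so the hand-off between the two stages is easy to follow.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    prompt_tokens: int    # length of the context to prefill
    max_new_tokens: int   # number of tokens to generate in decode


@dataclass
class KVCacheHandle:
    """Hypothetical reference to the KV cache produced by prefill."""
    request_id: str
    cached_tokens: int


class PrefillPool:
    """Stands in for the context-phase GPUs (the CPX role)."""

    def run(self, req: Request) -> KVCacheHandle:
        # A real system would run attention over the whole prompt here.
        return KVCacheHandle(req.request_id, req.prompt_tokens)


class DecodePool:
    """Stands in for the generation-phase GPUs (the standard Rubin role)."""

    def run(self, handle: KVCacheHandle, max_new_tokens: int) -> int:
        # A real system would generate tokens one at a time from the cache.
        # Here we only return the final sequence length.
        return handle.cached_tokens + max_new_tokens


def serve(requests: list[Request]) -> dict[str, int]:
    """Run prefill for every request, then hand each KV cache to decode."""
    prefill, decode = PrefillPool(), DecodePool()
    handoff: deque[tuple[KVCacheHandle, int]] = deque()

    for req in requests:
        handoff.append((prefill.run(req), req.max_new_tokens))

    results: dict[str, int] = {}
    while handoff:
        handle, new_tokens = handoff.popleft()
        results[handle.request_id] = decode.run(handle, new_tokens)
    return results
```

In a production stack the hand-off queue would be a KV-cache transfer between GPU pools, and both stages would run concurrently rather than one after the other.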

Key Features

  • Massive Context Optimization: Architectural focus on attention prefill at million-token scale.
  • Disaggregated Inference: Separates context and generation stages for higher utilization.
  • GDDR7 Memory: 128 GB of cost-efficient GDDR7 per GPU, matched to the compute-bound prefill phase.
  • NVLink 6 Compatible: Drops into Vera Rubin racks alongside standard Rubin GPUs.
  • NVFP4 Precision: Native support for NVIDIA’s next-gen 4-bit floating point.

Technical Specifications

Specification        Details
Architecture         NVIDIA Rubin (CPX variant)
Memory               GDDR7 (128 GB per GPU)
Precision Support    NVFP4, FP8, BF16
Interconnect         NVLink 6, PCIe Gen 6
Target Workload      Context prefill / massive-context inference
Form Factor          SXM / rack-integrated

Ideal Use Cases

  • Million-token coding agents and long-document reasoning
  • Video and multimodal understanding at production scale
  • Retrieval-augmented generation with very large context windows
  • Disaggregated inference factories optimizing TCO per token

Why Choose Rubin CPX?

If your inference workloads are dominated by long-context prefill, letting CPX handle context while standard Rubin GPUs are dedicated to decode can substantially improve tokens per dollar, because neither GPU type sits idle on a phase it is poorly matched to. We help you size CPX-to-Rubin ratios for your traffic mix and integrate with your serving stack.
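As a rough illustration of how a CPX-to-Rubin ratio might be estimated, the Python sketch below divides the token demand of each phase by an assumed per-GPU throughput. The function name `size_pools` and every throughput figure are hypothetical placeholders, not measured or published Rubin numbers; real sizing would use benchmarks from your own models and traffic.

```python
import math


def size_pools(
    requests_per_s: float,
    avg_prompt_tokens: int,
    avg_output_tokens: int,
    prefill_tok_per_s_per_gpu: float,  # assumed, not a published figure
    decode_tok_per_s_per_gpu: float,   # assumed, not a published figure
) -> tuple[int, int]:
    """Return (prefill_gpus, decode_gpus) needed to sustain the traffic."""
    prefill_demand = requests_per_s * avg_prompt_tokens  # tokens/s to prefill
    decode_demand = requests_per_s * avg_output_tokens   # tokens/s to generate

    prefill_gpus = math.ceil(prefill_demand / prefill_tok_per_s_per_gpu)
    decode_gpus = math.ceil(decode_demand / decode_tok_per_s_per_gpu)
    return prefill_gpus, decode_gpus


# Example with placeholder numbers: 2 requests/s, 200,000-token prompts,
# 1,000-token answers.
prefill_gpus, decode_gpus = size_pools(2, 200_000, 1_000, 100_000, 500)
print(f"prefill GPUs: {prefill_gpus}, decode GPUs: {decode_gpus}")
```

This simple model ignores batching effects, KV-cache transfer time, and latency targets, all of which shift the ratio in practice.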

Interested? Contact us for personalized pricing and configuration options.
