NVIDIA Rubin CPX
A new class of NVIDIA GPU purpose-built for massive-context inference. Rubin CPX accelerates million-token reasoning, video understanding, and long-context agents with disaggregated inference design.
🚀 Express Shipping Available Across Europe & MENA
- Full Insurance on All Shipments
- Tracked Delivery & Real-Time Updates
Overview
NVIDIA Rubin CPX is a specialized member of the Rubin family designed exclusively for the context-processing phase of large language model inference. By disaggregating the prefill (context) stage from the decode stage, Rubin CPX delivers dramatically higher throughput for million-token prompts, long-context agents, and video understanding.
Rubin CPX pairs with standard Rubin GPUs in the same rack: CPX nodes ingest and tokenize huge contexts; standard Rubin GPUs handle decode. The result is a more efficient, more profitable inference factory for modern reasoning workloads.
Key Features
- Massive Context Optimization: Architectural focus on attention prefill at million-token scale.
- Disaggregated Inference: Separates context and generation stages for higher utilization.
- HBM4 Memory: High-capacity, high-bandwidth memory tier for long-sequence states.
- NVLink 6 Compatible: Drops into Vera Rubin racks alongside standard Rubin GPUs.
- NVFP4 Precision: Native support for NVIDIA’s next-gen 4-bit floating point.
Technical Specifications
| Specification | Details |
|---|---|
| Architecture | NVIDIA Rubin (CPX variant) |
| Memory | HBM4 (capacity per partner configuration) |
| Precision Support | NVFP4, FP8, BF16 |
| Interconnect | NVLink 6, PCIe Gen 6 |
| Target Workload | Context prefill / massive-context inference |
| Form Factor | SXM / rack-integrated |
Ideal Use Cases
- Million-token coding agents and long-document reasoning
- Video and multimodal understanding at production scale
- Retrieval-augmented generation with very large context windows
- Disaggregated inference factories optimizing TCO per token
Why Choose Rubin CPX?
If your inference workloads are dominated by long-context prefill, dedicating standard Rubin GPUs to decode while CPX handles context can deliver step-function improvements in tokens-per-dollar. We help you size CPX-to-Rubin ratios for your traffic mix and integrate with your serving stack.
Interested? Contact us for personalized pricing and configuration options.






Reviews
There are no reviews yet.