
InfiniBand NDR/XDR for AI and HPC Data Centers

September 26, 2025

InfiniBand technology continues to evolve as a cornerstone for high-performance computing (HPC), artificial intelligence (AI), and data center networking. With the advent of NDR (Next Data Rate) and XDR (eXtreme Data Rate) specifications, InfiniBand delivers unprecedented bandwidth, low latency, and scalability to meet the demands of modern workloads.

 

What is InfiniBand NDR/XDR Technology?

InfiniBand NDR operates at 400 Gb/s per port, utilizing PAM4 modulation and advanced SerDes technology to achieve high-speed data transfer. It supports configurations like 64 x 400 Gb/s ports or 128 x 200 Gb/s ports in switches, making it ideal for large-scale clusters. XDR advances this further, providing 800 Gb/s per port and enabling up to 1.6 Tb/s throughput in aggregated setups, with features like enhanced in-network computing for AI acceleration.
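As a rough illustration of how these port speeds are built from PAM4 lanes, the sketch below reproduces the roadmap arithmetic (nominal figures only, not a vendor specification):

```python
# Illustrative only: per-port bandwidth as lanes x per-lane PAM4 rate.
# NDR uses 4 lanes of ~100 Gb/s; XDR doubles the per-lane rate to ~200 Gb/s.

GENERATIONS = {
    "HDR": {"lanes": 4, "gbps_per_lane": 50},
    "NDR": {"lanes": 4, "gbps_per_lane": 100},
    "XDR": {"lanes": 4, "gbps_per_lane": 200},
}

def port_bandwidth_gbps(gen: str) -> int:
    """Nominal bandwidth of a 4x port for a given InfiniBand generation."""
    g = GENERATIONS[gen]
    return g["lanes"] * g["gbps_per_lane"]

if __name__ == "__main__":
    for gen in GENERATIONS:
        print(f"{gen}: {port_bandwidth_gbps(gen)} Gb/s per 4x port")
    # A twin-port OSFP cage aggregates two such ports, e.g. 2 x 800G = 1.6 Tb/s for XDR:
    print("Twin-port XDR OSFP:", 2 * port_bandwidth_gbps("XDR"), "Gb/s")
```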

NDR builds on the previous HDR generation (200 Gb/s), doubling speeds and improving efficiency, while XDR represents the next leap, targeting exascale computing and the massive AI models emerging around 2025.

 

Key Benefits for AI and HPC Applications

NDR and XDR InfiniBand solutions offer ultra-low latency (sub-microsecond), high bandwidth for parallel processing, and in-network computing via technologies like NVIDIA SHARP, which accelerates collective operations by up to 32 times. These benefits are crucial for AI training, where data movement between GPUs must be seamless, reducing bottlenecks in large-scale models. In HPC, they enable simulations with over a million nodes, supporting applications in climate modeling, drug discovery, and scientific research. Additionally, features like self-healing networks and adaptive routing ensure reliability, while power efficiency improvements make them suitable for sustainable data centers.
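To see why in-network aggregation matters for collective operations, the sketch below compares the per-node traffic of a conventional ring allreduce with an idealized switch-side reduction of the kind SHARP performs; this is a simplified first-order model for illustration, not NVIDIA's published performance data:

```python
# Simplified traffic model for an allreduce of `size_bytes` across `n` nodes.
# Ring allreduce: each node transmits ~2*(n-1)/n of the buffer.
# Idealized in-network aggregation (SHARP-style): each node sends the buffer
# up the switch tree once and receives the reduced result once.

def ring_allreduce_tx_bytes(size_bytes: int, n: int) -> float:
    return 2 * (n - 1) / n * size_bytes

def in_network_allreduce_tx_bytes(size_bytes: int, n: int) -> float:
    return float(size_bytes)  # one send; the reduction happens in the switches

if __name__ == "__main__":
    gradient = 10 * 2**30          # 10 GiB of gradients per step (illustrative)
    nodes = 1024
    ring = ring_allreduce_tx_bytes(gradient, nodes)
    sharp = in_network_allreduce_tx_bytes(gradient, nodes)
    print(f"ring allreduce : {ring / 2**30:.2f} GiB sent per node")
    print(f"in-network     : {sharp / 2**30:.2f} GiB sent per node "
          f"(~{ring / sharp:.2f}x less traffic)")
```

Beyond the raw traffic reduction, offloading the reduction into the switch also removes the host from the critical path, which is where most of the latency benefit comes from.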

 

InfiniBand Switches

InfiniBand switches form the backbone of these networks, providing scalable interconnects for high-density environments.

NVIDIA Quantum™-2 InfiniBand Switch (NDR)

The NVIDIA Quantum-2 switch family, including models like MQM9790-NS2F and MQM9700-NS2F, delivers 64 x 400 Gb/s ports or 128 x 200 Gb/s ports in a compact 1U form factor using 32 OSFP connectors. It incorporates third-generation SHARP for in-network aggregation, MPI acceleration, and advanced congestion control. Compared to predecessors, it triples port density and quintuples system capacity, supporting Dragonfly+ topologies for networks exceeding one million nodes.
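Those port counts follow directly from the 32 OSFP cages; a quick arithmetic sketch based on the figures quoted above:

```python
# How 32 twin-port OSFP cages yield the quoted Quantum-2 port counts.
OSFP_CAGES = 32
PORTS_PER_CAGE = 2             # each OSFP cage carries two 400 Gb/s NDR ports

ndr_ports = OSFP_CAGES * PORTS_PER_CAGE        # 64 x 400 Gb/s
ndr200_ports = ndr_ports * 2                   # each 400G port split into 2 x 200G
one_way_tbps = ndr_ports * 400 / 1000

print(f"{ndr_ports} x 400 Gb/s ports, or {ndr200_ports} x 200 Gb/s ports")
print(f"One-way aggregate bandwidth: {one_way_tbps:.1f} Tb/s")
```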

 

NVIDIA Quantum-X800 InfiniBand Switch (XDR)

The Quantum-X800 series, including the Q3200-RA and Q3400-RA, is built on the Quantum-3 ASIC and delivers 800 Gb/s ports over OSFP cages, with the Q3400-RA scaling to 144 x 800 Gb/s ports. It features fourth-generation SHARP (SHARPv4), ultra-low latency, self-healing networking, and silicon photonics options for enhanced performance. The Q3200-RA's dual-switch enclosure provides 72 x 800 Gb/s ports, ideal for AI clusters requiring 1.6 Tb/s aggregated bandwidth.

 

Optical Modules and Transceivers

Optical modules and transceivers are core components enabling high-speed, low-loss data transmission in InfiniBand networks. With the advancement of NDR and XDR technologies, these modules adopt cutting-edge modulation schemes (such as PAM4) and packaging formats (such as OSFP and OSFP-XD), supporting bandwidths from 400 Gb/s up to 1.6 Tb/s. They are well-suited for AI data centers and high-performance computing (HPC) environments. In recent years, innovations like silicon photonics and Linear-Drive Optics (LPO) have further reduced power consumption and costs while enhancing performance and integration.

 

OSFP-SR8-800G InfiniBand Optical Module Technology Overview (NDR/XDR)

The OSFP-SR8-800G is a dual-port 2×400 Gb/s (total 800 Gb/s) multimode module, utilizing 100G-PAM4 modulation across 8 channels. It supports transmission distances up to 50 meters over MPO-12/APC fibers and integrates dual optical engines for high-density switching. Leveraging silicon photonics technology, the module ensures low insertion loss and excellent signal-to-noise ratio, making it ideal for short-reach data center interconnects. In practical deployment, it is seamlessly compatible with NVIDIA Quantum-2 or Quantum-X800 switches, while supporting hot-pluggable operation and Digital Diagnostic Monitoring (DDM) functions for real-time monitoring of temperature, power, and signal integrity.
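As an illustration of how DDM readings might be consumed by a monitoring script, here is a minimal sketch; the field names and threshold values are hypothetical placeholders rather than the module's actual alarm thresholds, so production checks should follow the vendor's management documentation:

```python
# Hypothetical DDM health check for an 800G OSFP module.
# Field names and thresholds below are illustrative placeholders only.

DDM_LIMITS = {
    "temperature_c": (0.0, 70.0),     # typical commercial operating range
    "tx_power_dbm":  (-6.0, 4.0),     # placeholder per-lane TX power window
    "rx_power_dbm":  (-8.0, 4.0),     # placeholder per-lane RX power window
    "vcc_v":         (3.14, 3.46),    # 3.3 V supply +/- ~5%
}

def check_ddm(sample: dict) -> list[str]:
    """Return a list of out-of-range warnings for one DDM sample."""
    warnings = []
    for field, (lo, hi) in DDM_LIMITS.items():
        value = sample.get(field)
        if value is None:
            continue
        if not lo <= value <= hi:
            warnings.append(f"{field}={value} outside [{lo}, {hi}]")
    return warnings

if __name__ == "__main__":
    sample = {"temperature_c": 61.5, "tx_power_dbm": 1.2,
              "rx_power_dbm": -9.1, "vcc_v": 3.31}
    for line in check_ddm(sample) or ["all monitored values in range"]:
        print(line)
```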


 

1.6T OSFP Optical Transceivers (XDR-Specific)

XDR-dedicated 1.6T OSFP transceivers, such as 2×DR4/DR8 variants, operate over 500m single-mode fiber at 1310nm, powered by Broadcom’s 5nm DSP to achieve ≤30W power consumption. They enable dual-port 800 Gb/s configurations, which are critical for 1.6 Tb/s links in AI data centers. These transceivers also support extended-reach variants such as FR4 (up to 2km) and LR4 (up to 10km), using CWDM or DWDM wavelength-division multiplexing to accommodate different data center scales. In addition, integrated LPO (Linear-Drive Optics) technology can further reduce power consumption to below 20W while maintaining low-latency performance, making them well-suited for large-scale AI training clusters.
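A back-of-the-envelope sketch of what those wattages mean at switch scale; the per-module figures are the ones quoted above, while the populated-cage count is an assumed example:

```python
# Back-of-the-envelope optics power at switch scale.
# Wattages are the figures quoted above; the module count is an assumed example.

MODULES = 64                   # assumed number of populated 1.6T OSFP cages
DSP_MODULE_W = 30              # ~30 W for a DSP-based 1.6T transceiver
LPO_MODULE_W = 20              # <20 W target for an LPO variant

dsp_total_w = MODULES * DSP_MODULE_W
lpo_total_w = MODULES * LPO_MODULE_W
print(f"DSP-based optics: {dsp_total_w / 1000:.2f} kW")
print(f"LPO optics:       {lpo_total_w / 1000:.2f} kW "
      f"({dsp_total_w - lpo_total_w} W saved per switch)")
```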


 

InfiniBand OSFP-400G-SR4 Optical Module (NDR)

The OSFP-400G-SR4 is a single-port 400 Gb/s multimode transceiver for NDR, supporting up to 50m transmission over MPO-12 fiber. Designed for switch-to-HCA connections, it ensures a low bit error rate (BER < 10^-15) and full compatibility with the Quantum-2 ecosystem. In deployment, this module is often used for short-reach server-to-switch links and supports breakout cable configurations, such as 400G to 4×100G, for backward compatibility with EDR/HDR systems. Certified versions are available through NVIDIA's LinkX series, ensuring validated compatibility and long-term stability.
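To put the BER figure in context, here is a small arithmetic sketch of the expected number of bit errors on a fully loaded 400 Gb/s link (a simplified estimate that ignores FEC and encoding overhead):

```python
# Expected bit errors at a given post-FEC BER and line rate.
# Simplified: ignores FEC/encoding overhead and assumes a fully loaded link.

LINE_RATE_BPS = 400e9      # 400 Gb/s NDR port
BER = 1e-15                # post-FEC bit error ratio quoted for the module

def expected_errors(seconds: float) -> float:
    return LINE_RATE_BPS * seconds * BER

if __name__ == "__main__":
    for label, secs in [("per hour", 3600), ("per day", 86400), ("per year", 86400 * 365)]:
        print(f"{label}: ~{expected_errors(secs):.1f} expected bit errors")
```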


 

 

Other Common InfiniBand Optical Module Types

OSFP-DR4-800G and FR4 Variants

The OSFP-DR4-800G module is designed for single-mode fiber, supporting 500m transmission at a 1310nm wavelength with PAM4 modulation, making it suitable for medium-reach data center interconnects. The FR4 variant extends the reach up to 2km, supports CWDM multiplexing, and keeps power consumption under 25W. These modules are particularly useful in XDR environments to handle data traffic across large-scale GPU clusters.

Silicon Photonics Integrated Modules

Silicon photonics technology was a highlight at the 2025 OFC Conference, enabling significant reductions in module size and cost by integrating lasers, modulators, and detectors onto a silicon chip. For example, NVIDIA’s latest silicon photonics transceivers achieve optimized power consumption at 800G transmission and support Co-Packaged Optics (CPO) architectures, further shortening signal paths and reducing latency. Such modules are ideal for AI accelerators like DGX systems, offering higher integration density.

LPO and CPO Innovations

Linear-Drive Optics (LPO) transceivers eliminate traditional DSPs and rely on host-side signal processing, further reducing power consumption to below 15W and accelerating adoption in NDR/XDR deployments. Co-Packaged Optics (CPO) integrates optical engines directly alongside ASICs and is expected to become mainstream after 2025, targeting exascale computing. These innovations address the thermal management and power challenges of AI workloads.

 

Selection Considerations for Optical Modules

When choosing InfiniBand optical modules, several factors need to be taken into account, including transmission distance, power budget, compatibility, and cost. Short-reach SR modules are ideal for in-rack connections, while DR/FR modules are better suited for inter-rack or inter-floor connectivity. It is also essential to ensure that modules comply with NVIDIA’s interconnect requirements, including cable types and BER testing.
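The distance-based guidance above can be summarized as a simple lookup; the reach bands below mirror the figures used throughout this article, and the helper is an illustrative sketch rather than a product-selection tool:

```python
# Illustrative reach-based selector; bands mirror the figures in this article.

REACH_OPTIONS = [
    (3,      "DAC (passive copper, in-rack)"),
    (50,     "SR multimode optics (e.g., SR4/SR8) or AOC"),
    (100,    "AOC (active optical cable, cross-rack)"),
    (500,    "DR single-mode optics (500 m class)"),
    (2000,   "FR single-mode optics (2 km class)"),
    (10000,  "LR single-mode optics (10 km class)"),
]

def suggest_link(distance_m: float) -> str:
    """Return a first-cut connectivity suggestion for a given reach."""
    for max_reach, option in REACH_OPTIONS:
        if distance_m <= max_reach:
            return option
    return "distance exceeds typical InfiniBand optical module reach"

if __name__ == "__main__":
    for d in (1.5, 30, 400, 1500, 8000):
        print(f"{d:>7} m -> {suggest_link(d)}")
```

Power budget, NVIDIA compatibility certification, and cost then narrow the shortlist within each reach band.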

Furthermore, trends in 2025 indicate that growing AI demand is driving supply chain optimization for optical modules, with production capacity expected to double in order to meet the expansion of global data centers.

 

Typical Application Scenarios of NDR/XDR Optical Connectivity

(1) In-Rack GPU Node Interconnection

Solution: DAC (Direct Attach Copper)

Features: Ultra-low latency, 0–3m reach, and low power consumption; ideal for direct connections between GPUs and switches within the same rack.

 

(2) Cross-Rack GPU Cluster Interconnection

Solution: AOC (Active Optical Cable)

Features: Supports 10–100m reach, flexible cabling; suitable for GPU cluster deployments across adjacent or multiple racks.

 

(3) Intra-Data Center TOR/Leaf-Spine Networking

Solution: Multimode/Single-mode Optical Patch Cords

Features: Supports 100–500m transmission, with typical breakout configurations such as 800G to 4×200G or 400G to 4×100G; widely used for switch-to-switch interconnects or room-to-room connections.

 

(4) AI Supercomputing Backbone Interconnect

Solution: Single-mode Fiber with Long-Reach Optical Modules (e.g., DR4, FR4, LR4)

Features: Covers 500m–10km distances, ensuring high-speed connectivity for large-scale GPU clusters and multi-floor/campus-wide AI factories.

 

Typical Connectivity Solution Examples

800G Switch Port to 4×200G HCA Connections: One 800G port split into 4×200G channels via breakout cables, enabling a single switch port to serve HCAs in up to four GPU servers (see the lane-mapping sketch after these examples).

Switch Spine–Leaf Architecture Interconnect: Using OSFP-800G DR4/FR4 modules with single-mode fibers to achieve 500m–2km Spine–Leaf connections, ensuring low latency and high bandwidth for large-scale clusters.

High-Density In-Rack Direct Attach: 1.5m DAC cables for GPU-to-GPU or GPU-to-switch links within the same rack, delivering high bandwidth with minimal latency.
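For the 800G-to-4×200G breakout in the first example, the sketch below shows one way the eight 100G PAM4 lanes of a twin-port 800G module map onto four 200G (NDR200) HCA ports; the port names are illustrative placeholders, not a cabling standard:

```python
# Illustrative lane grouping for an 800G twin-port module broken out to 4 x 200G.
# Eight 100G PAM4 lanes, two lanes per 200G (NDR200) HCA port; names are examples.

LANES = list(range(8))              # electrical/optical lanes 0..7
LANES_PER_200G_PORT = 2

breakout = {
    f"hca{i}": LANES[i * LANES_PER_200G_PORT:(i + 1) * LANES_PER_200G_PORT]
    for i in range(len(LANES) // LANES_PER_200G_PORT)
}

for hca, lanes in breakout.items():
    print(f"{hca}: lanes {lanes} -> {len(lanes) * 100} Gb/s")
# Total: 4 ports x 200 Gb/s = 800 Gb/s from one switch-side module.
```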

 

Future Outlook

As AI factories and exascale supercomputing continue to expand, the demand for ultra-high bandwidth and energy-efficient interconnects will intensify. Silicon photonics, Co-Packaged Optics (CPO), and Linear-Drive Optics (LPO) are expected to play a pivotal role in addressing power and thermal challenges while enabling higher integration density.

Looking ahead, InfiniBand will remain at the core of next-generation data centers, evolving from NDR to XDR and eventually to GDR (1.6 Tb/s). This progression will not only support massive AI training clusters but also drive innovations in in-network computing, real-time data analytics, and HPC workloads at unprecedented scales.

 

 
