Date: 03/10/25

InfiniBand vs Ethernet

 

Which is the best standard for AI?

Dr. Carlos Berto Director of Network Engineering

Brian Chang Technical Writer/Editor

 

Network connectivity continues to be a major bottleneck for AI data centers: up to 33% of elapsed time in AI/ML tasks can be wasted waiting on network availability¹, leaving costly GPU resources idle.

With AI network requirements on an upward trajectory, choosing between the InfiniBand and Ethernet standards for your data center architecture is one of the first steps toward optimizing network performance and helping AI reach its full potential.

 

What is Ethernet?

As a near-ubiquitous network communications standard, Ethernet should be familiar to network operators. It facilitates fast, reliable, and secure data transfer between millions of wired and wireless devices in Local Area Networks (LANs) and Wide Area Networks (WANs) worldwide.

Even with its dominance in the networking industry, the Ethernet standard has continued to evolve rather than stagnate. New breakthroughs have greatly improved its ability to support next-generation networks.

 

What is InfiniBand?

At the opposite end of the spectrum is InfiniBand (IB), a network communications standard designed as a spiritual successor to Ethernet, addressing many of the latter's perceived deficiencies.

InfiniBand is a lossless fabric with a unique topology that gives it inherent advantages over Ethernet. The newer network standard offers extremely high throughput and extremely low latency when used as an interconnect between network devices, servers and storage.

Although InfiniBand adoption has not been nearly as prevalent as that of Ethernet, its growing popularity in the data center coincides with the rise of AI.

 

Fundamental differences

The key differences between the two protocols are as follows:

Dedicated vs Shared switch fabric

InfiniBand: Dedicated switch fabric

Ethernet: Shared switch fabric

InfiniBand uses a switched fabric in which multiple switches connect network nodes and transfer data in parallel. This balances network traffic and improves the flow of data through the network. In contrast, the shared fabric of standard Ethernet moves data between devices over a single communication channel, which can limit efficiency, performance, and scalability.
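To make the parallelism concrete, here is a minimal sketch (with hypothetical switch counts and traffic figures, not taken from this article) of why a multi-switch fabric outperforms a single shared channel: every core switch adds an equal-cost path, and balanced traffic divides across all of them.

```python
# Hypothetical sketch: in a two-tier switched fabric, every pair of edge
# switches is connected through every core switch, so traffic between two
# hosts can be balanced across `num_cores` parallel paths instead of one
# shared channel.

def parallel_paths(num_cores: int) -> int:
    """Each core switch provides one equal-cost path between edge switches."""
    return num_cores

def per_path_load_gbps(total_gbps: float, num_cores: int) -> float:
    """Ideal per-link load when traffic is spread evenly across all paths."""
    return total_gbps / parallel_paths(num_cores)

# 400 Gb/s of traffic across 8 parallel paths vs. a single shared channel:
print(per_path_load_gbps(400, 8))  # 50.0
print(per_path_load_gbps(400, 1))  # 400.0
```

The same aggregate load that saturates one channel leaves each of eight parallel links mostly idle, which is why fabric topology matters as much as raw link speed.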

 

Lossless vs Best effort

InfiniBand: Lossless fabric

Ethernet: Best effort network

One of the advantages InfiniBand has over traditional Ethernet is how it handles network congestion. InfiniBand is designed to be a lossless fabric, meaning it does not drop packets under congestion.

InfiniBand uses link-level flow control, which signals the sender whenever the receiving end is congested. The sender then temporarily pauses data transmission to avoid dropping packets. This native characteristic improves efficiency and data integrity when training AI models.

Standard Ethernet-based networks, on the other hand, are best-effort networks. The sender keeps transmitting even when traffic is congested and the receiver's buffers are full, which routinely results in dropped packets and lowers overall network performance.
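The two behaviors can be contrasted in a short simulation (all numbers are hypothetical): a sender offers more packets per tick than the receiver can drain, and the only difference between the runs is whether a full buffer pauses the sender or discards the packet.

```python
# Hedged sketch of lossless vs best-effort congestion handling.
# Hypothetical parameters: the receiver drains 2 packets per tick from an
# 8-packet buffer while the sender offers 4 packets per tick.

def run(ticks: int, offered: int, drained: int, buf_size: int, lossless: bool):
    buffered = sent = dropped = paused = 0
    for _ in range(ticks):
        for _ in range(offered):
            if buffered < buf_size:
                buffered += 1
                sent += 1
            elif lossless:
                paused += 1    # link-level flow control: sender waits
            else:
                dropped += 1   # best effort: packet is discarded
        buffered = max(0, buffered - drained)
    return sent, dropped, paused

print(run(10, 4, 2, 8, lossless=True))   # (26, 0, 14) — no drops, sender pauses
print(run(10, 4, 2, 8, lossless=False))  # (26, 14, 0) — same load, 14 drops
```

Both runs deliver the same 26 packets, but the best-effort run discards 14 that would have to be retransmitted, which is the performance penalty the lossless design avoids.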

 

RDMA vs TCP/IP

InfiniBand: RDMA protocol

Ethernet: TCP/IP protocol

InfiniBand uses RDMA (Remote Direct Memory Access), a protocol that allows network devices to initiate data transfers without involving the host CPUs or operating systems. Because each device's network adapter reads and writes application memory directly, communication overhead is cut dramatically, which translates directly into lower latency and higher throughput.

In contrast, standard Ethernet uses TCP/IP (Transmission Control Protocol/Internet Protocol), which requires the two devices to exchange data through their operating systems. This extra step slows the data transfer process and adds latency to the network.
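As an illustration of that kernel-mediated path (a minimal loopback sketch, not a benchmark), every `sendall`/`recv` below is a system call that copies the payload between application and kernel socket buffers on each side; this per-message OS involvement is precisely the overhead RDMA is designed to bypass.

```python
import socket
import threading

def echo_once(srv: socket.socket) -> None:
    """Accept one connection and echo one message back through the kernel."""
    conn, _ = srv.accept()
    conn.sendall(conn.recv(1024))   # data crosses the kernel again, twice
    conn.close()

srv = socket.create_server(("127.0.0.1", 0))   # TCP listener on a free port
threading.Thread(target=echo_once, args=(srv,)).start()

cli = socket.create_connection(srv.getsockname())
cli.sendall(b"gradient shard")      # syscall: app buffer -> kernel buffer
data = cli.recv(1024)               # syscall: kernel buffer -> app buffer
print(data)                         # b'gradient shard'
cli.close()
srv.close()
```

One 14-byte message here triggers four system calls and four buffer copies across the two hosts; an RDMA transfer of the same payload would be posted once to the adapter and completed without the remote CPU touching it.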

 

Ethernet upgrades

As mentioned above, Ethernet has evolved over the years in ways that level the playing field with InfiniBand. Upgrades include RoCE (RDMA over Converged Ethernet), which brings RDMA's direct memory-to-memory transfers to Ethernet networks, and PFC (Priority-based Flow Control), which pauses traffic on a per-priority basis to avoid buffer overflows and dropped packets.

 

Comparison criteria

Several criteria to consider when choosing between InfiniBand and Ethernet include:

 

Costs

InfiniBand costs tend to be higher than Ethernet's on average because InfiniBand requires specialized hardware to be installed. That hardware is often proprietary and vendor-locked, which can limit availability and increase costs.

Despite this, InfiniBand is potentially more cost-effective in the long run. It performs at a higher baseline level and scales further, so fewer major upgrades are needed over time to accommodate growing AI workloads, helping businesses save money on retooling for future AI applications.

 

Axiom for AI networking

Weighing the factors above can help businesses determine which standard and approach best fit their data center. Both InfiniBand and Ethernet can be great options in the right environments; the choice depends on organizational needs and resources. Axiom supports both InfiniBand and Ethernet deployments to help businesses build a more robust AI data center.

Axiom offers high-speed transceivers, cables, and network equipment engineered for seamless integration into an ML cluster or AI SuperPOD. From 800G QSFP-DD/OSFP and 1.6T OSFP transceivers with riding or integrated heatsinks to LPO (linear pluggable optics) that meet the low-power benchmarks of a high-performance AI data center, Axiom transceivers are ready for deployment in AI data center infrastructure.

Power AI data centers with Axiom InfiniBand and Ethernet solutions. Learn more about our Ethernet and InfiniBand EDR/HDR/NDR transceivers as well as our breakout DAC and DAC/AOC options today, and contact our team.

¹ 2022 OCP Keynote

About the Author

Carlos Berto
Director of Network Engineering, Axiom

Dr. Carlos Berto leads Axiom’s Network Engineering team, working directly with enterprise and hyperscale data centers on real-world deployment challenges across optical, memory, and interconnect infrastructure.

With over 25 years in telecommunications and data infrastructure, he has been involved in the design, validation, and troubleshooting of high-speed systems from early 10G networks through today’s 400G, 800G, and emerging 1.6T environments.

His work focuses on where systems fail outside controlled lab conditions: signal integrity breakdowns, thermal constraints, and power delivery instability in production environments, particularly in AI and HPC deployments.

Dr. Berto holds a Ph.D. in Engineering and contributes technical insights that translate field experience into practical guidance for engineering teams responsible for performance and reliability.

Focus Areas

  • Optical and Interconnect Systems (400G / 800G / 1.6T)
  • AI and HPC Infrastructure
  • Signal Integrity, Thermals, and Power Delivery

Connect

Connect with Carlos on LinkedIn
View all articles by Carlos Berto


Inside The Stack: Trends & Insights