RDMA over Converged Ethernet (RoCE)



I didn’t know anything about RoCE until a few weeks ago, when a sales engineer told me about this technology. It’s amazing. These days I’m studying how to configure RoCE, and I will end up installing and deploying this technology. Along the way, I’ve realised that RoCE relies on the Data Center Bridging (DCB) standards, which provide features such as Priority-based Flow Control (PFC), Enhanced Transmission Selection (ETS), the Data Center Bridging Capabilities Exchange Protocol (DCBX) and Congestion Notification, all of them useful for RoCE.

If we want to understand RoCE, we should first know about InfiniBand. The first time I heard about InfiniBand was two or three years ago, when Ariadnex worked for CenitS on a supercomputing project. They have 14 InfiniBand Mellanox SX6036 switches with 36 FDR ports at 56 Gbps and 3 InfiniBand Mellanox IS5030 switches with 36 QDR ports at 40 Gbps for the computing network. That’s why most InfiniBand networks are found in High-Performance Computing (HPC) systems, which require very high throughput and very low latency.

CenitS Lusitania II

RoCE stands for RDMA over Converged Ethernet, and RDMA stands for Remote Direct Memory Access. RDMA used to be known mainly within the InfiniBand community but, lately, it has become increasingly popular because we can also enable RDMA over Ethernet networks, which is a great advantage: we get high throughput and low latency on standard Ethernet. Thanks to RDMA over Converged Ethernet (RoCE), servers can move data directly from the memory of the source application to the memory of the destination application, bypassing the kernel and avoiding CPU copies in the data path, which considerably increases network performance.

RDMA over Converged Ethernet (RoCE)
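
To make this more concrete, here is a minimal sketch of the first steps almost every RDMA application performs, whatever the fabric (RoCE, iWARP or InfiniBand): opening the RDMA device and registering an application buffer so the adapter can read and write that memory directly. It uses the libibverbs API and assumes the rdma-core library and an RDMA-capable adapter are available; the buffer size and the choice of the first device are arbitrary values for the example, not a recommendation.

/* roce_mr_setup.c - minimal sketch: open an RDMA device and register a buffer.
 * Build (assuming rdma-core/libibverbs is installed):
 *   gcc roce_mr_setup.c -o roce_mr_setup -libverbs
 */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "No RDMA-capable devices found\n");
        return 1;
    }

    /* Open the first RDMA device (it could be a RoCE, iWARP or InfiniBand adapter). */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "Failed to open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    /* A protection domain groups the resources the adapter is allowed to access. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) {
        fprintf(stderr, "Failed to allocate a protection domain\n");
        return 1;
    }

    /* Register an application buffer: the NIC pins and maps it so it can
     * read and write this memory directly, without CPU copies. */
    size_t len = 4096;                 /* example size */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "Memory registration failed\n");
        return 1;
    }

    printf("Registered %zu bytes on %s: lkey=0x%x rkey=0x%x\n",
           len, ibv_get_device_name(devs[0]), mr->lkey, mr->rkey);

    /* Clean up for the sketch. */
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

A real application would continue by creating a completion queue and a queue pair, exchanging the buffer address and rkey with the peer, and posting RDMA read/write work requests; the point here is that, once the buffer is registered, the NIC can move the data without involving the CPU.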
 
Clustering, Hyper-Converged Infrastructure (HCI) and storage solutions can benefit from the performance improvements provided by RoCE. For instance, Hyper-V deployments are able to use SMB 3.0 with the SMB Direct feature, which can be combined with RoCE adapters for fast and efficient storage access, minimal CPU utilization for I/O processing, and high throughput with low latency. What’s more, the iSCSI Extensions for RDMA (iSER) and NFS over RDMA are able to increase I/O operations per second (IOPS), lower latency and reduce client and server CPU consumption.

RDMA support in vSphere

In addition to RoCE and InfiniBand, the Internet Wide Area RDMA Protocol (iWARP) is another option for high throughput and low latency. However, this protocol is less widely used than RoCE and InfiniBand. In fact, iWARP is no longer supported in new Intel NICs, and the latest Ethernet speeds of 25, 50 and 100 Gbps are not available for iWARP. iWARP uses TCP/IP to deliver reliable services, while RoCE (in its routable v2 form) runs over UDP/IP and relies on DCB for flow and congestion control. Furthermore, I think it’s important to highlight that these technologies are not compatible with each other: iWARP adapters can only communicate with iWARP adapters, RoCE adapters can only communicate with RoCE adapters, and InfiniBand adapters can only communicate with InfiniBand adapters. Thus, if there is an interoperability conflict, applications will revert to TCP without the benefits of RDMA.

RoCE and iWARP Comparison
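
As a quick illustration of that interoperability point, the following sketch (again libibverbs, assuming rdma-core is installed) lists the local RDMA adapters and guesses which flavour of RDMA each port speaks, by combining the device transport type with the port link layer: an iWARP transport means iWARP, an InfiniBand transport over an Ethernet link layer means RoCE, and an InfiniBand link layer means a native InfiniBand fabric. It only inspects the local side, but it shows why both ends must speak the same flavour.

/* rdma_flavour.c - list local RDMA devices and guess their RDMA flavour.
 * Build: gcc rdma_flavour.c -o rdma_flavour -libverbs
 */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        printf("No RDMA devices found\n");
        return 0;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr)) {
            ibv_close_device(ctx);
            continue;
        }

        for (int p = 1; p <= attr.phys_port_cnt; p++) {
            struct ibv_port_attr pattr;
            if (ibv_query_port(ctx, p, &pattr))
                continue;

            const char *flavour = "unknown";
            if (devs[i]->transport_type == IBV_TRANSPORT_IWARP)
                flavour = "iWARP";       /* RDMA over TCP/IP */
            else if (pattr.link_layer == IBV_LINK_LAYER_ETHERNET)
                flavour = "RoCE";        /* IB transport over Ethernet */
            else if (pattr.link_layer == IBV_LINK_LAYER_INFINIBAND)
                flavour = "InfiniBand";  /* native InfiniBand fabric */

            printf("%s port %d: %s\n",
                   ibv_get_device_name(devs[i]), p, flavour);
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}

For example, a port on a RoCE adapter would simply be reported as “RoCE”, while a native InfiniBand HCA would show “InfiniBand”.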

To sum up, RDMA used to be limited to High-Performance Computing (HPC) systems with InfiniBand networks but, thanks to converged Ethernet networks and protocols such as RoCE and iWARP, today we can also deploy clusters, Hyper-Converged Infrastructures (HCI) and storage solutions with high throughput and low latency on the traditional Ethernet network.

Keep reading and keep studying!!
