Huawei CloudFabric 3.0 hyper-converged DCN solution enables lossless Ethernet, freeing up 100% of computing power






[Paris, France, April 7, 2022] Today, Zheng Xiaolong, Principal Investigator of Data Center Network (DCN), Huawei Canada Research Center, delivered a keynote speech titled “Zero-Packet-loss Ethernet Helps Release 100% Computing Power” at the MPLS, SD & AI Net World Congress. In the keynote address, Mr. Zheng discussed how Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution provides an innovative solution to the packet loss problem on DCNs and builds Ethernets with low latency, high throughput and large scale to unleash 100% of the computing power.

View of Huawei’s Principal Investigator for Data Center Networks

Efficiently improving computing power is crucial in the data-centric computing era

“Insufficient computing power is the biggest challenge in the data-centric computing era,” said Zheng Xiaolong. “To implement real-time data processing and value generation, robust computing power is required.”

Today, big data is used everywhere, from the metaverse and AI-powered drug research to intelligent ad recommendation based on user habits. The key to such big data applications is robust computing power, but the scale of AI computing models is growing exponentially. For example, Megatron-Turing NLG, the industry’s newest language model, has 530 billion parameters. By comparison, even the most complex model in 2017 had only 61 million parameters. In other words, the computational burden has increased nearly 10,000 times over the past five years. Finding a way to efficiently improve and fully unleash computing power has therefore become the top priority in the computing age.

DCNs become the main bottleneck for improving cluster computing power

Completing the E-level (exa-scale) floating-point operations required to train an AI model, such as the GPT-3 language model, requires a large number of servers working as a cluster. However, every AI training cluster has a performance threshold. Once that threshold is reached, adding more server nodes may not improve performance and may even degrade it. This is because compute nodes work together in the cluster: if packet loss occurs on the network, collaboration latency grows and overhead increases. Even with 0.1% packet loss, computing power is cut in half, making a lossless DCN essential for improving computing power.
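
As a rough illustration of why a small loss rate can hurt a synchronized cluster, the sketch below assumes that a training step finishes only after every packet of that step has arrived, and that packets are lost independently. This simple model is an assumption for illustration and does not come from the keynote; the function name and the packet counts are hypothetical.

```python
def prob_step_hits_loss(loss_rate: float, packets_per_step: int) -> float:
    """Probability that at least one packet in a synchronized step is lost.

    Assumes independent losses (an illustrative assumption). A step that
    sees any loss must wait for a retransmission before all nodes proceed.
    """
    return 1.0 - (1.0 - loss_rate) ** packets_per_step


if __name__ == "__main__":
    # Hypothetical step sizes, using the 0.1% loss rate quoted above.
    for packets in (100, 1_000, 10_000):
        p = prob_step_hits_loss(0.001, packets)
        print(f"{packets:>6} packets per step -> {p:.1%} of steps delayed")
```

Under these assumptions, the share of delayed steps rises quickly as more packets are exchanged per step, which is one way to read the observation that adding nodes can stop improving cluster performance.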

Lossless Ethernet built on Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution unleashes 100% of the computing power

Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution uses iLossless, a Huawei-unique intelligent and lossless algorithm, to eliminate the packet loss that has hampered Ethernet for more than four decades. This solution offers high throughput, low latency and zero packet loss, unleashing 100% of the computing power in all scenarios.

High throughput: Traditional traffic scheduling is manually configured and therefore cannot adapt to dynamic network changes. Huawei’s Automatic ECN (ACC) is an intelligent and lossless technology that accurately predicts network congestion status and achieves nearly 100% throughput while eliminating packet loss on congested links. As verified by Tolly Group, a global provider of testing and third-party validation and certification services, Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution can increase all-flash IOPS performance by 93%. In August 2021, the paper ACC: Automatic ECN Tuning for High-Speed Datacenter Networks, which explores Huawei’s intelligent and lossless hyper-converged DCN innovations, was accepted by SIGCOMM 2021, the flagship annual conference of the Association for Computing Machinery (ACM) Special Interest Group on Data Communication. This shows that industry experts hold Huawei’s innovations in high regard and that these innovations have a far-reaching impact that is felt around the world.
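
For context, ECN marking on a switch queue is commonly configured with two queue-depth thresholds and a maximum marking probability. The sketch below shows that generic, statically configured scheme only; it is not Huawei’s ACC algorithm. The names `k_min`, `k_max` and `p_max` and the example values are assumptions chosen for illustration. An automatic tuning approach would adjust such parameters at run time instead of leaving them fixed.

```python
def ecn_mark_probability(queue_kb: float, k_min: float, k_max: float, p_max: float) -> float:
    """RED-style ECN marking probability for a given queue depth.

    Below k_min nothing is marked, at or above k_max everything is marked,
    and in between the probability rises linearly from 0 to p_max.
    """
    if k_max <= k_min:
        raise ValueError("k_max must be greater than k_min")
    if queue_kb <= k_min:
        return 0.0
    if queue_kb >= k_max:
        return 1.0
    return p_max * (queue_kb - k_min) / (k_max - k_min)


if __name__ == "__main__":
    # Hypothetical static thresholds (in KB), chosen only for the example.
    for depth in (50, 300, 900):
        prob = ecn_mark_probability(depth, k_min=100, k_max=800, p_max=0.2)
        print(f"queue depth {depth} KB -> marking probability {prob:.3f}")
```

With fixed thresholds like these, a setting that suits one traffic mix may mark too early or too late for another, which is the limitation of manual configuration described above.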

Low latency: In High Performance Computing (HPC) scenarios, application latency is the product of the number of computation steps and the latency of each step. For latency-sensitive applications, reducing the number of steps can effectively reduce overall application latency. Powered by in-network computing and topology-aware computing, Huawei’s Integrated Network and Computing (INC) technology implements network and computing collaboration. With these technologies, the network participates in the aggregation and synchronization of computing data, reducing the number of synchronization rounds. Meanwhile, computing tasks are assigned to servers under the same top-of-rack (ToR) switch, reducing the number of communication hops, which in turn reduces application latency. Take MPI_Allreduce as an example. Compared to traditional networks that only transmit data without participating in computing, the CloudFabric 3.0 Hyper-Converged DCN solution can dramatically reduce latency and improve computing efficiency by 27%.
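
The latency relationship described above can be written down directly. This minimal sketch uses hypothetical step counts and per-step latencies, not measured values, to show how cutting the number of steps lowers total latency when the per-step latency is unchanged.

```python
def application_latency_us(steps: int, latency_per_step_us: float) -> float:
    """Total latency as the product of step count and per-step latency."""
    return steps * latency_per_step_us


if __name__ == "__main__":
    # Hypothetical numbers: fewer synchronization steps at the same per-step cost.
    baseline = application_latency_us(steps=14, latency_per_step_us=5.0)
    reduced = application_latency_us(steps=8, latency_per_step_us=5.0)
    print(f"baseline: {baseline} us, reduced: {reduced} us")
```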

Large scale: A data center’s traditional three-layer Clos network architecture supports up to 65,000 nodes, far fewer than large-scale data centers require. Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution uses a next-generation direct connection architecture and innovative distributed adaptive routing protocols. Not only does it build a lossless computing network, but it also supports large-scale networks of up to 270,000 nodes, about four times the industry level. This makes it ideal for large and ultra-large E-level and 10E-level computing hubs.
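
The 65,000-node figure is consistent with a standard three-tier fat-tree (a common Clos layout) built from identical k-port switches, which connects at most k^3/4 hosts. The sketch below assumes 64-port switches; that port count is an assumption for illustration, not a figure from the keynote.

```python
def fat_tree_max_hosts(ports_per_switch: int) -> int:
    """Maximum hosts in a three-tier fat-tree of identical k-port switches (k^3 / 4)."""
    if ports_per_switch <= 0 or ports_per_switch % 2:
        raise ValueError("port count must be a positive even number")
    return ports_per_switch ** 3 // 4


if __name__ == "__main__":
    # Assumed 64-port switches: 64^3 / 4 hosts.
    print(fat_tree_max_hosts(64))  # 65536
```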

Zero packet loss and continuous performance evolution are of great importance for the data-centric computing age. Huawei has conducted large-scale joint testing with customers in the financial, manufacturing and HPC sectors. The test results prove that Huawei’s CloudFabric 3.0 Hyper-Converged DCN solution offers significant performance benefits in scenarios such as all-flash, distributed storage, HPC and AI computing. Going forward, Huawei will continue to invest in intelligent and lossless technology research to further enhance lossless network capabilities, fully unleash computing power and enable intelligent enterprise upgrades.

About Huawei

Huawei is a leading global provider of information and communication technology (ICT) infrastructure and smart devices. With integrated solutions in four key domains – telecom networks, IT, smart devices and cloud services – we are committed to bringing digital to every person, home and organization for a fully connected, intelligent world. Huawei’s end-to-end portfolio of products, solutions and services is both competitive and secure. Through open collaboration with ecosystem partners, we create lasting value for our customers, empower people, enrich family life and inspire innovation in organizations of all shapes and sizes. At Huawei, innovation is focused on customer needs. We invest heavily in fundamental research, concentrating on technological breakthroughs that help the world move forward. We have more than 197,000 employees and operate in more than 170 countries and regions. Founded in 1987, Huawei is a private company wholly owned by its employees. For more information, visit Huawei online at www.huawei.com or follow us on:

http://www.linkedin.com/company/Huawei

http://www.twitter.com/Huawei

http://www.facebook.com/Huawei

http://www.youtube.com/Huawei



