AWS Adds Gaudi-Powered, ML-Optimized EC2 DL1 Instances, Now in GA – HPCwire

As machine learning becomes a dominant use case for both local and cloud computing, companies are racing to provide solutions specifically optimized and accelerated for AI applications. Now, Amazon Web Services (AWS) is adding a new contender to the landscape: cloud instances powered by (Intel-owned) Habana’s Gaudi AI processors – the first AI training instances AWS has offered that are not GPU-based.

Intel acquired Habana – which was founded five years ago – in late 2019 for around $2 billion. Habana produces two AI accelerators: the “Goya” inference processor, which debuted in 2018 when Habana emerged from stealth, and the “Gaudi” training processor, which was announced in 2019, shortly before the acquisition by Intel.

Habana Labs’ HL-205 Gaudi mezzanine card. Image courtesy of Habana Labs.

It is this latter processor, Gaudi, that is at the heart of the new AWS EC2 DL1 instances. Each instance includes eight Gaudi accelerator cards, each equipped with a single Gaudi HL-2000 training processor featuring eight fully programmable tensor processing cores and fabricated on TSMC’s 16nm process. Each card also carries 32GB of HBM2 memory. Beyond the accelerators, the EC2 DL1 instances include 768GB of system memory, custom second-generation (Cascade Lake) Intel Xeon Scalable CPUs and 4TB of local NVMe storage. The instances are capable of 400Gbps of networking throughput.

In a company blog post, AWS Chief Evangelist Jeff Barr detailed some of the attributes of Gaudi, including its tensor processing cores (TPCs). These are “specialized VLIW SIMD (Very Long Instruction Word / Single Instruction Multiple Data) processing units designed for ML training,” wrote Barr. “The TPCs are C-programmable, although most users will use higher-level tools and frameworks.”

The chip supports several data types, including floating point (BF16 and FP32), signed integer (INT8, INT16, and INT32), and unsigned integer (UINT8, UINT16, and UINT32).

It also has specialized hardware – a GEMM (general matrix multiply) engine – to accelerate matrix multiplication.

AWS estimates that these instances will deliver “up to 40 percent” better price performance than the latest GPU-powered Amazon EC2 instances when training a typical machine learning model. (The latest Amazon EC2 GPU instance, the P4, includes up to eight Nvidia A100 GPUs and more system memory – 1,152GB per instance.) Much of that price-performance advantage likely stems from the stark difference in hourly cost: AWS lists an EC2 P4 instance at $32.77/hour on-demand versus just $13.11/hour on-demand for an EC2 DL1 instance.
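As a back-of-the-envelope check, the “up to 40 percent” figure is consistent with the quoted hourly rates if the GPU instance trains a given model roughly 1.5x faster than the DL1 instance. The throughput ratio below is a hypothetical illustration, not an AWS number; only the two hourly rates come from the article.

```python
# Back-of-the-envelope check of AWS's "up to 40 percent" price-performance
# claim, using the on-demand rates quoted above. The relative-throughput
# figure is a hypothetical assumption for illustration, not an AWS number.
P4_RATE = 32.77   # $/hour, EC2 P4 on-demand (from the article)
DL1_RATE = 13.11  # $/hour, EC2 DL1 on-demand (from the article)

def cost_per_unit_work(rate_per_hour: float, relative_throughput: float) -> float:
    """Dollars to complete one normalized unit of training work."""
    return rate_per_hour / relative_throughput

# Suppose (hypothetically) the P4 trains a given model 1.5x faster than DL1.
p4_cost = cost_per_unit_work(P4_RATE, 1.5)
dl1_cost = cost_per_unit_work(DL1_RATE, 1.0)
improvement = 1 - dl1_cost / p4_cost
print(f"DL1 price-performance advantage: {improvement:.0%}")  # ~40%
```

Under that assumed speed gap, the DL1 comes out about 40 percent cheaper per unit of training work, matching the headline claim.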

“The use of machine learning has skyrocketed. One of the challenges with training machine learning models, however, is that it is computationally intensive and can get expensive as customers refine and retrain their models,” said David Brown, vice president for Amazon EC2 at AWS. “AWS already has the broadest choice of powerful compute for any machine learning project or application. The addition of DL1 instances featuring Gaudi accelerators provides the most cost-effective alternative to GPU-based instances in the cloud to date. Their optimal combination of price and performance makes it possible for customers to reduce the cost to train, train more models, and innovate faster.”

Of course, working with these ostensibly more cost-efficient instances for AI training will require some adjustment from developers. AWS provides customers with access to the Habana SynapseAI SDK, which it says is integrated with frameworks like TensorFlow and PyTorch in a manner that will require “minimal code changes” for machine learning model migration. There are also Gaudi-optimized reference models available.
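In practice, the “minimal code changes” for a PyTorch workload amount to loading Habana’s PyTorch bridge and targeting the Gaudi device. The sketch below follows the pattern described in Habana’s SynapseAI documentation, with a CPU fallback so it runs anywhere; the module path, the “hpu” device string, and the mark_step call are taken from those docs rather than this article, so treat them as assumptions.

```python
# Hedged sketch of the device-selection step when porting a PyTorch
# workload to Gaudi. The "habana_frameworks" module and the "hpu" device
# string follow Habana's documented SynapseAI integration (assumed here).
def select_device() -> str:
    """Return "hpu" if the Habana PyTorch bridge is importable, else "cpu"."""
    try:
        import habana_frameworks.torch.core  # noqa: F401 -- Habana bridge
        return "hpu"
    except ImportError:
        return "cpu"

# Inside the training loop, Habana's docs describe a pattern roughly like:
#   model = model.to(select_device())
#   loss.backward()
#   optimizer.step()
#   htcore.mark_step()  # flushes the accumulated ops to the Gaudi device
```

On a machine without the Habana software stack, `select_device()` simply returns `"cpu"`, so the same script remains runnable during development.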

For the release, AWS highlighted a series of high-profile customers, including Seagate, Leidos, Fractal – and, of course, Habana’s owner, Intel, which is planning to use the EC2 DL1 instances to train its 3D athlete tracking technology. “Training our models on Amazon EC2 DL1 instances, powered by Gaudi accelerators from Habana Labs, will enable us to accurately and reliably process thousands of videos and generate associated performance data, while lowering training cost,” said Rick Echevarria, vice president for Intel’s sales and marketing group. “With DL1 instances, we can now train at the speed and cost required to productively serve athletes, teams, and broadcasters of all levels across a variety of sports.”

The DL1 instances are available as on-demand instances, reserved instances, spot instances and more – but, for now, they’re only available in the US East (Northern Virginia) and US West (Oregon) regions on AWS.

In a separate blog post, the Habana team revealed that it is working on the next-generation Gaudi2 AI processor, built on a slimmer 7nm node, “further improving the price-performance for the benefit of our end-customers, while maintaining the same architecture and fully leveraging the same SynapseAI software and ecosystem we are building with Gaudi.”

Related Items

Habana’s AI Silicon Comes to San Diego Supercomputer Center

Intel Seeks AI Inferencing Edge with Habana Acquisition
