Nvidia Corp. is pushing back against rivals trying to steal its crown in the artificial intelligence market with a number of updates that should provide a big boost to the processing power, speed and flexibility of its inference software.
Today’s updates, announced at Nvidia’s GTC 2021 developer conference, include new capabilities within the Nvidia Triton Inference Server software as well as key enhancements to Nvidia’s TensorRT software. The company also announced a new, low-powered and smaller footprint graphics processing unit called the Nvidia A2 Tensor Core GPU that’s designed to accelerate AI inference at the edge.
The Triton Inference Server and TensorRT software are key elements of the Nvidia AI Enterprise platform that was launched this year and makes it easy to run AI workloads on VMware vSphere in on-premises data centers and private clouds. Triton provides cross-platform inference on any kind of AI model or framework, while TensorRT is software that optimizes AI models and provides a runtime platform for high-performance inference using Nvidia’s GPUs.
Triton also provides AI inference on GPUs and central processing units hosted in the cloud, private data centers or at the network edge, and is integrated with Amazon Web Services, Google Cloud, Microsoft Azure and other platforms.
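For a sense of what "cross-platform inference on any model" looks like from the client side, here is a minimal sketch of building a request body for the KServe v2 HTTP protocol that Triton implements (models are served at `POST /v2/models/<model>/infer`). The model and tensor names are illustrative assumptions, not from the article:

```python
import json

def build_infer_request(input_name, data, datatype="FP32"):
    """Build a request body for the KServe v2 inference protocol.

    Triton exposes models at POST /v2/models/<model>/infer; the body
    lists named input tensors with their shape, datatype and data.
    """
    return {
        "inputs": [
            {
                "name": input_name,       # illustrative input tensor name
                "shape": [1, len(data)],  # one row of len(data) features
                "datatype": datatype,
                "data": data,
            }
        ]
    }

# Example: a single four-feature input row for a hypothetical tensor "INPUT0".
body = build_infer_request("INPUT0", [0.1, 0.2, 0.3, 0.4])
payload = json.dumps(body)  # this JSON would be POSTed to the Triton server
```

Because the protocol is plain HTTP and JSON, the same request works whether the model behind it runs on a GPU or CPU, in the cloud or at the edge.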
Inference is an important aspect of AI. Whereas AI training refers to the development of an algorithm’s ability to understand a dataset, inference refers to its ability to act on that information and answer specific queries.
If it’s going to be useful in the real world, AI needs to be able to infer quickly. And that becomes all the more important as applications become more complex and deal with ever-growing amounts of data.
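The training-versus-inference split can be illustrated with a deliberately tiny example, here an ordinary least-squares line fit in plain Python (not anything specific to Nvidia's stack): training is the expensive step that learns parameters from a dataset, while inference is the cheap, repeated step of applying them to new queries.

```python
def train(xs, ys):
    # "Training": fit slope and intercept by ordinary least squares.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def infer(model, x):
    # "Inference": apply the learned parameters to answer a new query.
    slope, intercept = model
    return slope * x + intercept

model = train([1, 2, 3, 4], [2, 4, 6, 8])  # learns y = 2x
print(infer(model, 10))                    # prints 20.0
```

Production inference serves millions of such queries per second against far larger models, which is why latency and throughput dominate the engineering effort.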
With today’s update, Triton gains a new Model Analyzer that automates optimization by selecting, from hundreds of possibilities, the best configuration for each AI model for the task at hand. The new Multi-GPU Multinode Functionality, meanwhile, makes it possible for large Transformer-based language models that no longer fit on a single GPU to be served for inference across multiple GPUs and server nodes.
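The kind of search Model Analyzer automates can be sketched in a few lines: enumerate candidate deployment configurations, benchmark each, and keep the best. The candidate parameters and the toy throughput function below are illustrative assumptions, not Model Analyzer's actual search space or metrics:

```python
def pick_best_config(configs, measure):
    """Return the configuration that scores highest under `measure`.

    Model Analyzer automates a search like this across hundreds of real
    candidate configurations, benchmarking each against latency and
    throughput targets; `measure` here is a stand-in for that benchmark.
    """
    return max(configs, key=measure)

# Hypothetical candidates: (batch size, model instances per GPU).
candidates = [(b, i) for b in (1, 4, 8, 16) for i in (1, 2, 4)]

def toy_throughput(cfg):
    # Toy model: throughput grows with concurrency until a memory budget
    # (here, 32 concurrent requests) is exceeded.
    batch, instances = cfg
    return batch * instances if batch * instances <= 32 else 0

best = pick_best_config(candidates, toy_throughput)
print(best)  # prints (8, 4)
```

The value of automating this is that the best configuration differs per model and per GPU, so hand-tuning each one quickly becomes impractical.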
Triton also gains a new backend for GPU and CPU inference of “random forest” and gradient-boosted decision tree models. It gives developers a unified deployment engine, meaning Triton can now be used for both traditional machine learning models and deep learning models.
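What a forest backend has to do at inference time is conceptually simple, even if the optimized GPU version is not: walk each tree to a leaf and combine the votes. A minimal pure-Python sketch, with hand-built toy trees rather than anything trained or Triton-specific:

```python
def tree_predict(tree, x):
    """Walk a decision tree expressed as nested tuples.

    An internal node is (feature_index, threshold, left, right);
    a leaf is simply a class label.
    """
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def forest_predict(trees, x):
    # Random-forest inference: majority vote over the individual trees.
    votes = [tree_predict(t, x) for t in trees]
    return max(set(votes), key=votes.count)

# Three tiny hand-built stump trees over a two-feature input.
forest = [
    (0, 0.5, "cat", "dog"),
    (1, 1.0, "cat", "dog"),
    (0, 2.0, "cat", "dog"),
]
print(forest_predict(forest, [0.7, 0.4]))  # prints cat
```

Real forests have thousands of much deeper trees, which is why a dedicated, batched GPU backend pays off over evaluating them one at a time on a CPU.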
In addition, Triton adds support for Amazon SageMaker integration, meaning it can be used with AWS’ fully managed AI service. There’s also support for AI inference workloads on Arm Ltd.’s CPU designs, as well as Nvidia’s.
As for TensorRT, version 8.2 is now integrated with the open-source TensorFlow and PyTorch AI frameworks, Nvidia said. This integration means it can deliver inference up to three times faster than running models in-framework, with just one line of code, Nvidia explained.
The Nvidia A2 Tensor Core GPU, meanwhile, is available now in numerous new Nvidia-certified systems that are guaranteed to meet the company’s design best practices and deliver optimal performance for AI inference. Nvidia said the A2 Tensor Core GPU is designed to run at the edge and can deliver up to 20 times more inference performance than traditional CPUs.
The A2 is an entry-level chip, as Nvidia offers far more AI inference processing power through its existing A30 and A100 GPUs.
That kind of flexibility can only be a good thing, though, as Nvidia’s dominance of the AI inference segment is under threat from rival chipmaker Advanced Micro Devices Inc., which has some very powerful hardware and inference platforms of its own.
On Monday, AMD announced it had signed up Facebook Inc.’s parent company Meta Platforms Inc. as a data center chip customer. At the same time, it also announced a range of new chips aimed at taking on larger rivals in the high-performance computing and AI businesses. They include the AMD MI200, an “accelerator” that aims to speed up machine learning and AI workloads. It’s positioned as a direct rival to Nvidia’s most powerful A100 chip.
That said, Nvidia has landed plenty of new customers itself, including the likes of Microsoft Corp., which runs Triton and Nvidia’s GPUs in its Azure cloud, and Siemens AG, whose affiliate Siemens Energy is now using Triton to help manage its customers’ power plants with AI.
Constellation Research Inc. analyst Holger Mueller said Nvidia’s battle for cloud AI is taking place on two fronts: the GPU and CPU hardware itself, and the software that manages those platforms, feeding the right workloads to each kind of processor.
“Nvidia faces a challenge from AMD and Intel, which might be on the ropes but can never be counted out,” Mueller said. “Today Nvidia is making progress on the latter front, updating its Triton Inference Server software with a new model analyzer that helps to figure out how and where to deploy AI models. The new A2 Tensor Core chip looks like a good option for powering local AI inference.”
Ian Buck, vice president and general manager of accelerated computing at Nvidia, insisted the company’s AI inference platform is “driving breakthroughs across virtually every industry.”
“Whether delivering smarter recommendations, harnessing the power of conversational AI, or advancing scientific discovery, Nvidia’s platform for inference provides low-latency, high-throughput, versatile performance with the ease of use required to power key new AI applications worldwide,” he said.
Nvidia said the latest version of Triton Inference Server is available now from the Nvidia NGC catalog and the Triton GitHub repository. TensorRT 8.2 is available now to members of the Nvidia Developer Program and from the TensorRT GitHub repo, and both are also available through the Nvidia AI Enterprise offering.