
Nvidia launches TAO software for easier AI model training

Apr 12, 2021 — by Eric Brown

Nvidia unveiled a GUI-based “TAO” framework that eases AI model training for GPU-equipped platforms such as Jetson. There is also an upcoming “Grace” CPU plus improvements to Nvidia’s Jarvis voice agent, Maxine video SDK, and other AI tools.

At GTC 2021 today, Nvidia unleashed a torrent of announcements including the unveiling of an Nvidia TAO (Train, Adapt, and Optimize) framework for speeding AI development. Here we will focus on TAO along with brief examinations of improvements to Nvidia’s Jarvis voice technology, Maxine video SDK, DeepStream video analytics, and Merlin deep learning recommender system.

The main event at GTC was the unveiling of an AI-enhanced, server-oriented Grace CPU based on next-gen Arm Neoverse IP. By the time it arrives in 2023, Nvidia should have completed its pending, $40 billion acquisition of Arm.

Nvidia Grace prototype

The 7nm or 5nm fabricated Grace platform is not the first Nvidia CPU — Jetson modules are based on homegrown, Arm-based Tegra CPUs — but it will likely represent the first branded Nvidia CPU sold to third parties. Grace could also prove to be a major competitor to Intel Core and Xeon chips aimed at the enterprise.

Other Nvidia news includes a server-oriented AI-on-5G technology and an agreement with Arm manufacturers such as Marvell and MediaTek to incorporate Nvidia GPUs on some of their high-end Arm SoCs. Like TAO, all these products are based on or support Linux.

Nvidia TAO and Transfer Learning Toolkit

The Nvidia TAO (Train, Adapt, and Optimize) Framework, which is still in early access status, is a GUI-based, workflow-driven environment designed to ease the creation of enterprise AI applications and services. The software lets users fine-tune pretrained models downloaded from Nvidia’s NGC catalog for speech, vision, natural language understanding, and more.

The free NGC models are designed for Nvidia GPUs, including the Maxwell, Pascal, and Volta GPUs found on Nvidia’s Linux-driven Jetson modules. They also support the higher-end T4 Nvidia graphics cards that are increasingly deployed in typically Intel Core based edge AI systems, as well as even higher-end Ampere GPUs.

Nvidia TAO conceptual diagram (left) and performance metrics (in frames per second) for common NGC models on Jetson Nano, Xavier NX, and AGX Xavier plus T4 and A100 (Ampere) platforms

Nvidia TAO lets developers produce domain-specific models in hours rather than months, “eliminating the need for large training runs and deep AI expertise,” claims Nvidia. TAO is said to reduce time-consuming tasks within the deep learning workflow, including data preparation, training, and optimization.

Nvidia TAO is built around a Transfer Learning Toolkit (TLT) that “abstracts away the AI and deep learning framework complexity and enables you to build production-quality pre-trained models faster with no coding required,” says Nvidia. TLT uses a technique called transfer learning that transfers learned features from an existing neural network model to a new one. Users supply small datasets, which TLT then pairs with the closest model in the catalog to flesh it out.
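The core idea TLT automates can be sketched in a few lines: keep a pretrained network’s feature extractor frozen and fit only a small new “head” on the target dataset. The snippet below is a minimal, framework-free illustration of that idea (the “pretrained” extractor and the tiny dataset are invented for the example), not the TLT API itself:

```python
import math

# Stand-in "pretrained feature extractor": in real transfer learning this
# would be a frozen deep network; here it is just a fixed function.
def pretrained_features(x):
    # Map a raw 2-value input to 3 fixed (frozen) features.
    return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]

# Tiny labeled dataset for the *new* domain (invented for illustration).
data = [([0.0, 1.0], 0), ([1.0, 0.0], 0), ([2.0, 2.0], 1), ([3.0, 1.5], 1)]

# Train only a small linear head on top of the frozen features
# (logistic regression via gradient descent).
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(500):
    for x, y in data:
        f = pretrained_features(x)
        z = sum(wi * fi for wi, fi in zip(w, f)) + b
        p = 1.0 / (1.0 + math.exp(-z))          # sigmoid
        g = p - y                               # gradient of log-loss
        w = [wi - lr * g * fi for wi, fi in zip(w, f)]
        b -= lr * g

def predict(x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

print([predict(x) for x, _ in data])   # head now fits the small dataset
```

Because only the small head is trained, a few labeled samples suffice — which is why transfer learning avoids the large training runs the article mentions.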

Transfer Learning Toolkit architecture

The time savings from using pre-trained models can be considerable. For example, Nvidia’s computer vision model represents 3,700 person-years spent labeling 500 million objects from 45 million frames. Some of the NGC models include credentials that certify the domain the model was trained for, the dataset that trained it, how often the model was deployed, and how it is expected to perform.

TAO also provides federated learning technology that enables different sites to securely collaborate on refining a model while keeping their datasets private. Recently, 20 research sites used the technology to collaborate on raising the accuracy of an “EXAM” model that predicts whether a patient has Covid-19. The federated learning capability extended this model to also predict the severity of the infection and whether the patient would need supplemental oxygen.
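The mechanism behind federated learning can be sketched as federated averaging: each site computes a model update on its own data and shares only model parameters, never the raw samples. The toy sketch below (sites, data, and the one-parameter “model” are all invented) illustrates how a server can aggregate private contributions:

```python
# Minimal federated-averaging sketch: each site trains locally on private
# data and shares only its model weights; the server never sees the data.
private_data = {
    "site_a": [2.0, 4.0, 6.0],
    "site_b": [8.0, 10.0],
    "site_c": [1.0, 3.0],
}

def local_update(samples):
    # Each site's "training": fit a one-parameter model (the local mean).
    return sum(samples) / len(samples)

# Server aggregates the weights, weighting each site by its dataset size.
total = sum(len(s) for s in private_data.values())
global_weight = sum(
    local_update(s) * len(s) / total for s in private_data.values()
)
print(global_weight)  # matches the mean over all data (34/7), yet the
                      # server only ever handled per-site weights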

TensorRT 8.0, Triton Inference Server 2.9, and Fleet Command

Nvidia also announced improvements to several technologies that can be tightly integrated with TAO and its TLT SDK to further optimize and deploy models. For example, after fine-tuning models in TLT, TAO enables optimization for deployment via integration with TensorRT, Nvidia’s SDK for high-performance deep learning inference.

With TensorRT, you can ensure a model will “function efficiently on your target platform whether it’s an array of GPUs in a server or a Jetson-powered robot on the factory floor,” says Nvidia. TensorRT “dials a model’s mathematical coordinates” to an optimal balance of the smallest size with the highest accuracy for the target system.

Nvidia announced the release of TensorRT 8.0, which is claimed to run up to 2x faster with INT8 precision while delivering accuracy similar to FP32. There are also compiler optimizations for transformer-based networks such as BERT, among other improvements.
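The speedup from INT8 precision comes from representing each weight in one byte instead of four, at the cost of a small, bounded rounding error. The sketch below illustrates symmetric INT8 quantization conceptually; it is not TensorRT’s actual calibration code, and the example weights are invented:

```python
# Conceptual sketch of symmetric INT8 quantization, the reduced-precision
# representation that INT8 inference relies on.
weights = [0.02, -0.51, 0.33, 0.97, -1.20]

# Choose a scale so the largest magnitude maps to 127 (int8 max).
scale = max(abs(w) for w in weights) / 127.0

def quantize(w):
    q = round(w / scale)
    return max(-128, min(127, q))       # clamp to the int8 range

quantized = [quantize(w) for w in weights]   # 1 byte each vs. 4 for FP32
restored = [q * scale for q in quantized]    # dequantize for comparison

max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)                             # [2, -54, 35, 103, -127]
print(f"worst-case rounding error: {max_err:.4f}")  # bounded by scale/2
```

TensorRT’s calibration step essentially picks such scales per tensor so that the rounding error stays small where the model is sensitive — which is why INT8 accuracy can stay close to FP32.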

TensorRT (left) and Triton Inference Server architecture diagrams

TAO also supports integration with Nvidia’s Triton Inference Server, now available in version 2.9. This open source inference serving software prepares a model for deployment on the best-suited framework, based on user input about the model’s deployment configuration, architecture, and other details.

New Triton 2.9 features include an alpha-stage Model Navigator tool that converts TensorFlow and PyTorch models to TensorRT, as well as beta support for Intel’s OpenVINO backend. The Model Analyzer tool now automatically determines optimal batch size and number of concurrent model instances.
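The kind of search Model Analyzer automates can be sketched as sweeping candidate batch sizes, estimating latency and throughput for each, and keeping the fastest configuration that still meets a latency budget. The cost model and numbers below are invented for illustration; the real tool measures a live model instead:

```python
# Sketch of the trade-off Triton's Model Analyzer automates: larger batches
# raise throughput but also raise per-request latency, so pick the highest-
# throughput batch size that still meets the latency budget. The linear
# latency model here is an assumption for the example.
FIXED_OVERHEAD_MS = 5.0     # per-batch launch cost (assumed)
PER_SAMPLE_MS = 1.5         # marginal cost per sample (assumed)
LATENCY_BUDGET_MS = 40.0    # service-level latency target (assumed)

def batch_latency_ms(batch_size):
    return FIXED_OVERHEAD_MS + PER_SAMPLE_MS * batch_size

best = None
for batch_size in (1, 2, 4, 8, 16, 32, 64):
    latency = batch_latency_ms(batch_size)
    if latency > LATENCY_BUDGET_MS:
        continue                                   # violates the budget
    throughput = batch_size / (latency / 1000.0)   # samples per second
    if best is None or throughput > best[1]:
        best = (batch_size, throughput)

print(best)   # under this model, batch size 16 wins (32 busts the budget)
```

In practice Model Analyzer also varies the number of concurrent model instances, turning this into a two-dimensional sweep over the same kind of objective.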

Once the framework is chosen, TAO enables users to launch Nvidia Fleet Command to deploy and manage the AI application across different GPU-powered devices. Fleet Command works with Nvidia-Certified servers via a browser interface to “securely pair, orchestrate and manage millions of servers, deploy AI to any remote location and update software as needed,” says Nvidia.


Using Nvidia TAO in an industrial environment

Nvidia Jarvis

Nvidia’s Jarvis voice platform entered 1.0 beta in late February and will be available for public download late this quarter. The new release provides conversational AI capabilities including “highly accurate automatic speech recognition, real-time translation for multiple languages, and text-to-speech capabilities to create expressive conversational AI agents,” says Nvidia.

Jarvis 1.0’s “Out-Of-The-Box” speech recognition model now offers greater than 90 percent accuracy, claims Nvidia. The model can be fine-tuned with TAO’s TLT. The real-time translation capability supports five languages and can run at under 100ms latency per sentence.

Nvidia Maxine, AI Face Codec, and DeepStream 6.0

Nvidia’s previously revealed Maxine SDK for virtual collaboration and content creation applications is now available for download. Designed for applications such as video conferencing and live streaming, Maxine can integrate with Jarvis voice features.

The GPU-powered Maxine SDK includes a Video Effects SDK for super resolution, video noise removal, and virtual backgrounds. An Augmented Reality SDK provides 3D effects such as face tracking and body pose estimation. There is also an Audio Effects SDK that enables high quality noise removal and room echo removal.

In related news, Nvidia Research announced an AI Face Codec that compresses videos and renders human faces for video conferencing. AI Face Codec can deliver up to 10x reduction in bandwidth vs H.264, claims Nvidia.

While Maxine helps create collaborative video applications, the Nvidia DeepStream SDK helps analyze video via streaming analytics. Nvidia announced DeepStream 6.0, which adds a GUI interface and streamlines workflow “from prototyping to deployment across the edge and cloud.”

Nvidia Merlin

The latest open beta release of the Nvidia Merlin application framework for deep learning recommender systems is now available. The release makes it easier to define workflows and training pipelines and improves support for inference and integration with Triton Inference Server. Other enhancements include the ability to scale transparently to larger datasets and more complex models.

Further information

Core parts of Nvidia TAO, including the Transfer Learning Toolkit and federated learning, are available today. More information and an application form for early access may be found on the Nvidia TAO product page.

