[Updated: Mar. 8] — Nvidia’s Jetson TX2 COM runs Linux4Tegra on a hexa-core Tegra Parker SoC with Pascal graphics, offering twice the performance and/or efficiency of the TX1.
Nvidia announced its third-generation Nvidia Jetson computer-on-module with claims of offering twice the performance in high-power mode or twice the power efficiency in low-power mode compared to the previous Tegra X1-based Jetson TX1. The Linux4Tegra-driven Jetson TX2 module is available Mar. 14 as part of a $599 developer kit ($299 for educational institutions), and will ship on its own in the second quarter for $399 in quantity. Nvidia also announced a new version 3.0 of its Linux-based JetPack SDK for its Jetson COMs.
Jetson TX2 with (left) and without thermal transfer plate
(click images to enlarge)
Nvidia’s announcement declined to name the Jetson TX2’s high-powered SoC, which adds advanced 256-core Pascal graphics and two high-end “Denver 2” cores in addition to the four Cortex-A57 cores found on the Jetson TX1’s Tegra X1. However, an Nvidia blog post tags the SoC as the Tegra Parker, which Nvidia revealed in August of last year (see further below).
The Jetson TX2 module is drop-in compatible with the TX1, and is designed for applications including intelligent factory robots, commercial drones, and smart cameras for AI cities, says Nvidia. Like the Jetson TX1, the Jetson TX2 measures 87 x 55mm, and communicates via a 400-pin connector to a Jetson development board, as detailed further below.
Comparison of Jetson TX1 and TX2 module specs
(click images to enlarge; source: Nvidia)
Like the TX1, the TX2 offers a GbE controller, as well as 802.11ac WiFi and an unnamed version of Bluetooth. Other continuing features include support for USB 3.0, micro-USB 2.0, SDIO, SATA, UART, SPI, I2C, I2S, and GPIO interfaces.
Memory and multimedia features have seen the largest upgrades. You now get 8GB of 128-bit LPDDR4 RAM at 58.3GB/s, or about twice the capacity and bandwidth of the TX1. There’s also 32GB of eMMC 5.1, which is also twice the capacity of the TX1.
The module continues to offer display interfaces including DisplayPort 1.2, eDP 1.4, and HDMI, but the latter has been upgraded to HDMI 2.0. The module now offers video encode at 4K x 2K 60Hz, up from 30Hz. The video decode is the same 4K x 2K 60Hz, but it now supports 12-bit instead of only 10-bit video.
In AI-focused “inference at the edge” applications, these totals can be split up. For example, you could simultaneously decode dual 4K 30Hz streams or do real-time analysis of 4x 30fps HD streams, among other possibilities.
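These splits follow directly from pixel-rate arithmetic: dual 4K 30Hz streams consume the same pixels per second as a single 4K 60Hz stream, while four full-HD 30fps streams use only half that budget, leaving headroom for the analysis workload. A minimal sketch (the resolutions below are standard figures used for illustration, not Nvidia’s exact accounting):

```python
def pixel_rate(width, height, fps):
    """Pixels per second consumed by one video stream."""
    return width * height * fps

# Total decode budget: 4K x 2K at 60Hz
budget = pixel_rate(3840, 2160, 60)

# Two 4K streams at 30Hz consume the same pixels per second
dual_4k30 = 2 * pixel_rate(3840, 2160, 30)

# Four full-HD streams at 30fps use only half the budget
quad_hd30 = 4 * pixel_rate(1920, 1080, 30)

assert dual_4k30 == budget
assert quad_hd30 == budget // 2
```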
The Jetson TX2 continues to support up to 6x cameras via dual 12-lane MIPI-CSI2 interfaces, and it now offers 2.5Gbps throughput per lane, up from 1.5Gbps. In addition to PCIe Gen 2 x4 and x1 interfaces, you can now opt for a dual x1 with single x2 configuration. Unlike the TX1, the TX2 provides a dual-CAN bus controller for automotive, industrial, or robotics applications. The controller “enables autopilot integration to control robots and drones that use DNNs to perceive the world around them and operate safely in dynamic environments,” says the Nvidia blog.
The TX2 module can operate at -25 to 80°C, and offers 5.5-19.6 VDC power input. In an Nvidia Jetson TX2 pre-brief attended by LinuxGizmos, Deepu Talla, VP and GM of Nvidia’s Tegra unit, said that the Jetson TX2 will support low- and high-power modes.
The low-power Max-Q mode maximizes energy efficiency and runs at less than 7.5 Watts, or twice the efficiency of the TX1. The Max-Q mode will enable larger, deeper neural networks on edge devices, making for “smarter devices with higher accuracy and faster response times for tasks like image classification, navigation and speech recognition,” says Nvidia.
Nvidia benchmarks showing Jetson TX2 (<15W) beating Xeon E5-2690 (200W) on inference processing (left) and TX2 AI pipeline architecture
(click images to enlarge)
The high-power Max-P mode runs at less than 15 Watts, and offers twice the performance of the TX1. Talla added that the TX2’s SoC on its own consumes 3.5 Watts. By comparison, the TX1 runs at 10W.
The Nvidia blog post offers far more details on the two modes. It also describes benchmarks showing the Jetson TX2 in Max-P mode running at under 15 Watts beating a 200W system running an Intel Xeon E5-2690 v4 processor. The test measures deep learning inference throughput (images per second) using the GoogLeNet deep image recognition network.
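Inference throughput benchmarks of this kind boil down to dividing images processed by wall-clock time. A rough sketch of the measurement, with a hypothetical `classify_batch()` stub standing in for the actual GoogLeNet engine (on the TX2 that would be a TensorRT invocation, not the placeholder shown here):

```python
import time

def classify_batch(batch):
    # Hypothetical stand-in for one GoogLeNet forward pass;
    # on a Jetson this would be an accelerated inference call.
    time.sleep(0.001)
    return ["label"] * len(batch)

def images_per_second(num_batches, batch_size):
    """Throughput = images processed / elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in range(num_batches):
        classify_batch([None] * batch_size)
    elapsed = time.perf_counter() - start
    return num_batches * batch_size / elapsed
```

Larger batch sizes generally raise throughput at the cost of per-image latency, which is why published inference numbers usually state the batch size used.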
According to an AnandTech report following Nvidia’s August revelations about the Tegra Parker, the SoC is the same mystery SoC used in Nvidia’s Drive PX 2 platform for self-driving cars that Nvidia announced at CES in Jan. 2016. According to Wikipedia, Tegra Parker is also referred to as the Tegra P1.
Nvidia Tegra Parker block diagram
(click image to enlarge)
The Tegra Parker is fabricated with a 16nm FinFET process, a shrink from the Tegra X1’s 20nm, say AnandTech and other sources. The SoC implements two custom, 7-wide superscalar ARMv8 “Denver 2” cores, evolved from the 64-bit Denver cores that powered one version of the Tegra K1 SoC, which was otherwise offered in a mainstream 32-bit quad Cortex-A15 configuration. The Tegra K1 was used on the first-gen Jetson TK1, which was an SBC rather than a COM.
Nvidia has yet to reveal many details about the Denver 2 cores, aside from the fact that, like the SoC’s quad Cortex-A57 cluster, the Denver 2 cluster has 2MB of L2 cache. Considering the overall speed improvement claims made for the Jetson TX2, however, the Denver 2 cores are likely closer to Cortex-A72 or Cortex-A73 performance than Cortex-A57.
Denver 2’s biggest improvements come in power efficiency rather than performance, suggests AnandTech. Note that the four Cortex-A53 cores found on the Tegra X1, designed for low-power big.LITTLE load sharing, are missing here. According to AnandTech, these did not see much use in the X1. Nvidia calls Tegra Parker’s new big.LITTLE-style full Heterogeneous Multi-Processing (HMP) technology “Big + Super,” with the quad Cortex-A57 “Big” cores working with the two “Super” Denver 2 cores.
The AnandTech post also noted that Tegra Parker improves I/O functionality, with a key focus on automotive peripherals like CAN and additional cameras. Parker was also said to double the memory bus to 128 bits, bringing aggregate bandwidth to 50GB/sec with support for LPDDR4-3200. The block diagram shown above, which was posted on the Nvidia blog, shows several Cortex-R5 MCUs handling things like power management, as well as a Cortex-A9 APE processor for audio.
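That bandwidth figure is easy to sanity-check: peak LPDDR bandwidth is simply bus width in bytes times the transfer rate. A quick check, assuming the 128-bit bus and LPDDR4-3200 rate AnandTech cites:

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Peak bandwidth in GB/s: bytes per transfer x megatransfers/s / 1000."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

# 128-bit LPDDR4-3200: 16 bytes x 3200 MT/s = 51.2 GB/s,
# consistent with the ~50GB/sec aggregate figure cited for Parker
print(peak_bandwidth_gbs(128, 3200))  # 51.2
```

The 58.3GB/s quoted for the TX2 module’s spec sheet implies a somewhat higher effective transfer rate than LPDDR4-3200 by the same arithmetic.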
In the Jetson TX2 pre-brief, Nvidia’s Talla suggested the big improvements in Jetson TX2 performance originate from the move from Maxwell to Pascal graphics. Pascal, which is common in desktop graphics cards, has the same 256 CUDA cores as the TX1’s Maxwell GPU. AnandTech, however, suggests that the switch from Maxwell to Pascal is not as revolutionary as the previous change from Kepler to Maxwell. Yet, there are major Pascal improvements in areas such as “fine-grained context switching for CUDA applications.”
Jetson TX2 Developer Kit and JetPack 3.0 SDK
The Jetson TX2 is initially available as part of a 170 x 170mm Mini-ITX form factor Jetson TX2 Developer Kit, which appears to be based closely on the TX1 board. The Developer Kit offers coastline GbE, USB 3.0, micro-USB 2.0, and HDMI ports, as well as SATA, M.2 Key E, and PCIe x4 interfaces. There’s a full-sized SD slot, as well as display and MIPI-CSI camera expansion headers.
Jetson TX2 Developer Kit carrier (left) and full kit
(click images to enlarge)
Headers are also provided for GPIO, I2C, I2S, SPI, CAN, and TTL UART with flow control. The board is further equipped with WiFi antennas and an external 19V AC adapter.
Last year, Connect Tech shipped two smaller carrier boards for the TX1, the Orbitty and the Elroy, and the Nvidia blog says Connect Tech and Auvidea will each provide smaller carrier boards that support both the TX2 and TX1. Other ecosystem partners include Leopard Imaging and Ridge Run, which will provide cameras and multimedia support. Abaco Systems and Wolf Advanced Technology will deliver MIL-spec systems based on the TX2 for operating in harsh environments.
The Jetson TX2 is defined as an “open platform,” but open hardware support with schematics appears to be offered only for the carrier board. The Nvidia blog entry points to a variety of developer resources for both the module and the board.
The developer kit ships with an updated, AI-focused JetPack 3.0 SDK built on Linux4Tegra, a custom version of Ubuntu based on Linux 4.4. The SDK provides a TensorRT 1.0 neural network inference engine for production deployment of deep learning applications that the Nvidia blog says played a major role in the TX2’s alleged benchmark victory over the Xeon. It also includes the GPU-accelerated cuDNN 5.1 library of primitives for deep neural networks built around CUDA 8.
JetPack 3.0 SDK architecture
(click image to enlarge)
JetPack 3.0 is further provisioned with the VisionWorks 1.6 SDK for computer vision and image processing. Updated graphics drivers and APIs include OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0. Nvidia has posted a “Two Days to a Demo” set of deep learning example code on GitHub.
The Nvidia Jetson TX2 Developer Kit, complete with Jetson TX2 module, can be preordered today for $599 in the U.S. and Europe and will begin shipping Mar. 14. It will be available in other regions in the coming weeks. The Jetson TX2 module will be available in Q2 for $399 in quantities of 1,000 or more from Nvidia and its global distributors. The Jetson TX1 Developer Kit is still available, and has been reduced to $499. More information is available at Nvidia’s Jetson TX2 page. Note: The Jetson TX2 Developer Kit is available for $299 for educational institutions.