3-TOPS per Watt Hailo-8 NPU arrives on M.2 module
Oct 1, 2020, by Eric Brown
Hailo has launched a line of M.2 and mini-PCIe cards for Linux systems equipped with its up to 26-TOPS, 3-TOPS per Watt Hailo-8 NPU. The Hailo-8 is featured in Foxconn’s BOXiedge v2 AI edge server.
In May we reported on Foxconn’s BOXiedge v2, which runs Linux on Socionext’s 24x Cortex-A53 SynQuacer SC2A11 SoC and a 3-TOPS per Watt Hailo-8 NPU that can run at up to 26 TOPS. Now Hailo has launched an M.2 implementation of the Hailo-8, with a mini-PCIe version on the way. The M.2 M-key 2242 form-factor accelerator is the world’s highest performance AI M.2 module, claims Hailo.
Hailo-8 M.2
The Hailo-8 M.2 AI Acceleration Module provides a PCIe Gen3 x4 interface, while the upcoming mini-PCIe accelerator will offer the same NPU with PCIe Gen3 x1. The acceleration cards can run on any Linux-based system, with Windows support in the works. Target applications include smart city, retail, home, and industrial deployments, especially those in which feeds from multiple cameras and sensors need to be processed and analyzed at once. Since our last report, Hailo has also launched a PCIe form-factor Evaluation Board for the Hailo-8.
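For a rough sense of what the difference between the x4 and x1 links means, the sketch below computes theoretical one-way PCIe Gen3 bandwidth. It assumes the standard Gen3 figures of 8 GT/s per lane and 128b/130b encoding, which come from the PCIe specification rather than from Hailo. The function name is illustrative, and real-world throughput will be lower once packet overhead is counted.

```python
# Theoretical PCIe Gen3 link bandwidth, per direction.
# Assumptions (from the PCIe 3.0 spec, not from Hailo's announcement):
# 8 GT/s per lane and 128b/130b line encoding. Protocol overhead is ignored,
# so achievable throughput is lower than these numbers.

GEN3_GT_PER_S = 8.0                 # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130     # 128b/130b line code


def pcie_gen3_bandwidth_gbytes(lanes: int) -> float:
    """Return the theoretical one-way bandwidth in GB/s for a Gen3 link."""
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    gbits_per_s = GEN3_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return gbits_per_s / 8  # bits -> bytes


if __name__ == "__main__":
    for name, lanes in (("M.2 module (x4)", 4), ("mini-PCIe card (x1)", 1)):
        print(f"{name}: {pcie_gen3_bandwidth_gbytes(lanes):.2f} GB/s")
```

Under these assumptions the x4 link offers four times the raw bandwidth of the x1 link, which may matter most when many camera streams are fed to the NPU at once.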
Hailo claims its 17 x 17mm Hailo-8 chip vastly outperforms Google’s Edge TPU and Intel’s Movidius Myriad X on a TOPS per watt basis running AI semantic segmentation and object detection applications including ResNet-50. The company has posted some new benchmarks that show its Hailo-8 M.2 module achieving 26x higher frames-per-second AI performance than Myriad X and 13x higher than Edge TPU, each of which can achieve a maximum of 4 TOPS.
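The quoted figures can be put side by side with some back-of-the-envelope arithmetic. The sketch below uses only numbers from this article. It assumes, without confirmation from Hailo, that the 3 TOPS per Watt efficiency applies at the 26 TOPS peak, so the power value is an implied estimate and not a measured one. All function names are illustrative.

```python
# Back-of-the-envelope arithmetic on figures quoted in the article.
# Assumption (not stated by Hailo): the 3 TOPS/W efficiency figure holds at
# the 26 TOPS peak. The result is an implied estimate, not a measurement.

HAILO8_PEAK_TOPS = 26.0
HAILO8_TOPS_PER_WATT = 3.0
RIVAL_PEAK_TOPS = 4.0  # quoted maximum for both Myriad X and Edge TPU
CLAIMED_FPS_SPEEDUP = {"Myriad X": 26.0, "Edge TPU": 13.0}


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Power implied by a throughput figure and an efficiency figure."""
    return tops / tops_per_watt


def speedup_beyond_peak_ratio(fps_speedup: float, peak_tops: float,
                              rival_peak_tops: float) -> float:
    """How far the claimed FPS speedup exceeds the raw peak-TOPS ratio."""
    return fps_speedup / (peak_tops / rival_peak_tops)


if __name__ == "__main__":
    watts = implied_power_watts(HAILO8_PEAK_TOPS, HAILO8_TOPS_PER_WATT)
    print(f"Implied power at peak: {watts:.2f} W")
    print(f"Peak TOPS ratio: {HAILO8_PEAK_TOPS / RIVAL_PEAK_TOPS:.1f}x")
    for rival, speedup in CLAIMED_FPS_SPEEDUP.items():
        extra = speedup_beyond_peak_ratio(
            speedup, HAILO8_PEAK_TOPS, RIVAL_PEAK_TOPS)
        print(f"{rival}: claimed {speedup:.0f}x FPS, "
              f"{extra:.1f}x beyond the peak-TOPS ratio")
```

The peak TOPS ratio alone is 6.5x, so the claimed frame-rate gains go beyond what the headline TOPS numbers would predict. That is consistent with Hailo's argument that its architecture achieves higher resource utilization, though the benchmarks are the company's own.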

The Hailo-8 uses a “proprietary novel structure-driven” dataflow architecture that differs from the von Neumann architecture used on most neural processors. The architecture achieves low-power memory access by implementing a distributed memory fabric combined with purpose-made pipeline elements.

Hailo-8 Evaluation Board
Hailo-8’s dataflow-oriented interconnect adapts to the structure of the neural network to enable high resource utilization, says Hailo. The Hailo-8 hardware is tightly integrated with an SDK that offers a scalable toolchain, including model translation from industry-standard frameworks like ONNX and TensorFlow.


Hailo-8 infographic (left) and benchmarks
Foxconn was one of Hailo’s first publicly disclosed customers after NEC and ABB Technology, which led the Tel Aviv company’s $88 million in funding. Foxconn has yet to fully reveal the Linux-powered BOXiedge v2. The system, which is also called the BEX-1000, is equipped with a BEMB-1000 Mini-ITX board.

BOXiedge v2 with BEMB-1000 mainboard
The BOXiedge v2’s BEMB-1000 motherboard is loaded with Socionext’s SynQuacer SC2A11 SoC and the Hailo-8 M.2 AI Acceleration Module. The system is designed to perform real-time image classification, detection, pose estimation, and other tasks on footage from up to 20 cameras while running at 35W.
Further information
The Hailo-8 M.2 AI Acceleration Module is available now at an undisclosed price. More information may be found in Hailo’s announcement on AP and the product page.