3-TOPS per Watt Hailo-8 NPU arrives on M.2 module
Oct 1, 2020, by Eric Brown

Hailo has launched a line of M.2 and mini-PCIe cards for Linux systems equipped with its up to 26-TOPS, 3-TOPS per Watt Hailo-8 NPU. The Hailo-8 is featured in Foxconn’s BOXiedge v2 AI edge server.
In May we reported on Foxconn’s BOXiedge v2, which runs Linux on Socionext’s 24x Cortex-A53 SynQuacer SC2A11 SoC and a 3-TOPS per Watt Hailo-8 NPU that can run at up to 26 TOPS. Now Hailo has launched an M.2 implementation of the Hailo-8, with a mini-PCIe version on the way. The M.2 M-key 2242 form-factor accelerator is the world’s highest performance AI M.2 module, claims Hailo.
Hailo-8 M.2
The Hailo-8 M.2 AI Acceleration Module provides a PCIe Gen3 x4 interface, while the upcoming mini-PCIe accelerator will offer the same NPU but with PCIe Gen3 x1. The acceleration cards can run on any Linux-based system, with Windows support in the works. Target applications include smart city, retail, home, and industrial deployments, especially those in which input from multiple cameras and sensors needs to be processed and analyzed at once. Since our last report, Hailo has launched a PCIe form-factor Evaluation Board for the Hailo-8.
Hailo claims its 17 x 17mm Hailo-8 chip vastly outperforms Google’s Edge TPU and Intel’s Movidius Myriad X on a TOPS per watt basis running AI semantic segmentation and object detection applications including ResNet-50. The company has posted some new benchmarks that show its Hailo-8 M.2 module achieving 26x higher frames-per-second AI performance than the Myriad X and 13x higher than the Edge TPU, each of which can achieve a maximum of 4 TOPS.
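As a back-of-the-envelope check on these figures, the short Python sketch below compares the peak-TOPS ratio with the claimed frames-per-second speedups. It uses only the numbers quoted in this article. The function names are illustrative, and the power estimate assumes, without confirmation from Hailo, that the 3-TOPS per Watt rating holds at the 26-TOPS peak.

```python
# Figures quoted in the article; everything derived from them is plain arithmetic.
HAILO8_PEAK_TOPS = 26.0
HAILO8_TOPS_PER_WATT = 3.0
RIVAL_PEAK_TOPS = 4.0  # quoted maximum for both Myriad X and Edge TPU
CLAIMED_FPS_SPEEDUP = {"Myriad X": 26.0, "Edge TPU": 13.0}


def peak_tops_ratio(tops_a: float, tops_b: float) -> float:
    """How many times higher chip A's peak throughput is than chip B's."""
    return tops_a / tops_b


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Power draw implied by a throughput and an efficiency rating.

    Assumption: the efficiency rating applies at this throughput.
    """
    return tops / tops_per_watt


def speedup_beyond_peak(claimed_speedup: float, tops_ratio: float) -> float:
    """Share of a claimed speedup that peak TOPS alone does not explain."""
    return claimed_speedup / tops_ratio


if __name__ == "__main__":
    ratio = peak_tops_ratio(HAILO8_PEAK_TOPS, RIVAL_PEAK_TOPS)
    print(f"Peak TOPS ratio: {ratio:.1f}x")
    watts = implied_power_watts(HAILO8_PEAK_TOPS, HAILO8_TOPS_PER_WATT)
    print(f"Implied power at peak: {watts:.1f} W")
    for rival, claimed in CLAIMED_FPS_SPEEDUP.items():
        extra = speedup_beyond_peak(claimed, ratio)
        print(f"{rival}: claimed {claimed:.0f}x FPS, {extra:.1f}x beyond peak TOPS")
```

Peak TOPS alone accounts for a 6.5x gap, so the claimed 26x and 13x frame-rate advantages would also require higher utilization of that peak, or a difference in numeric precision or model mix between the benchmark runs.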
The Hailo-8 uses a “proprietary novel structure-driven” dataflow architecture that differs from the Von Neumann architecture used on most neural processors. The architecture achieves low-power memory access by implementing a distributed memory fabric combined with purpose-made pipeline elements.

Hailo-8 Evaluation Board
Hailo-8’s dataflow-oriented interconnect adapts according to the structure of the neural network to enable high resource utilization, says Hailo. The Hailo-8 hardware is tightly integrated with an SDK that offers a scalable toolchain, including model translation from industry-standard frameworks like ONNX and TensorFlow.


Hailo-8 infographic (left) and benchmarks
Foxconn was one of Hailo’s first publicly disclosed customers after NEC and ABB Technology, which led the Tel Aviv company’s $88 million in funding. Foxconn has yet to fully reveal the Linux-powered BOXiedge v2. The system, which is also called the BEX-1000, is equipped with a BEMB-1000 Mini-ITX board.

BOXiedge v2 with BEMB-1000 mainboard
The BOXiedge v2’s BEMB-1000 motherboard is loaded with Socionext’s SynQuacer SC2A11 SoC and the Hailo-8 M.2 AI Acceleration Module. The system is designed to perform real-time image classification, detection, pose estimation, and other tasks on footage from up to 20 cameras while running at 35W.
Further information
The Hailo-8 M.2 AI Acceleration Module is available now at an undisclosed price. More information may be found in Hailo’s announcement on AP and the product page.
Hmm, even at 26 TOPS, which is very fast, Google’s Edge TPU at 4 TOPS would make this board at most about 6.5 times faster. I guess it depends on whether it’s 8-bit, 16-bit, or 32-bit.