3-TOPS per Watt Hailo-8 NPU arrives on M.2 module

Oct 1, 2020, by Eric Brown

Hailo has launched a line of M.2 and mini-PCIe cards for Linux systems built around its Hailo-8 NPU, which delivers up to 26 TOPS at 3 TOPS per Watt. The Hailo-8 is also featured in Foxconn’s BOXiedge v2 AI edge server.

In May we reported on Foxconn’s BOXiedge v2, which runs Linux on Socionext’s 24x Cortex-A53 SynQuacer SC2A11 SoC and a 3-TOPS per Watt Hailo-8 NPU that can run at up to 26 TOPS. Now Hailo has launched an M.2 implementation of the Hailo-8, with a mini-PCIe version on the way. The M.2 M-key 2242 form-factor accelerator is the world’s highest performance AI M.2 module, claims Hailo.

Hailo-8 M.2

The Hailo-8 M.2 AI Acceleration Module provides a PCIe Gen3 x4 interface, while the upcoming mini-PCIe accelerator will offer the same NPU with PCIe Gen3 x1. The acceleration cards can run on any Linux-based system, with Windows support in the works. Target applications include smart city, retail, home, and industrial deployments, especially those in which feeds from multiple cameras and sensors must be processed and analyzed at once. Since our last report, Hailo has launched a PCIe form-factor Evaluation Board for the Hailo-8.
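
To put the x4 versus x1 difference in perspective, here is a rough sketch of the theoretical host-link bandwidth of each card. It relies only on the published PCIe Gen3 figures (8 GT/s per lane with 128b/130b encoding). The function name is our own, and the results ignore packet and protocol overhead, so real-world throughput will be lower.

```python
# Theoretical PCIe Gen3 bandwidth per direction, before protocol overhead.
GEN3_GIGATRANSFERS_PER_S = 8.0    # 8 GT/s per lane (PCIe Gen3)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def pcie_gen3_bandwidth_gbytes(lanes: int) -> float:
    """Return theoretical one-direction bandwidth in GB/s for a Gen3 link."""
    gigabits_per_s = lanes * GEN3_GIGATRANSFERS_PER_S * ENCODING_EFFICIENCY
    return gigabits_per_s / 8  # bits -> bytes


m2_x4 = pcie_gen3_bandwidth_gbytes(4)        # M.2 M-key module
minipcie_x1 = pcie_gen3_bandwidth_gbytes(1)  # upcoming mini-PCIe card

print(f"M.2 (Gen3 x4):       {m2_x4:.2f} GB/s")
print(f"mini-PCIe (Gen3 x1): {minipcie_x1:.2f} GB/s")
```

Both cards carry the same NPU, so the narrower link on the mini-PCIe version would matter mainly when many high-resolution streams have to be pushed to the accelerator at once.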


Hailo claims its 17 x 17mm Hailo-8 chip vastly outperforms Google’s Edge TPU and Intel’s Movidius Myriad X on a TOPS per watt basis running AI semantic segmentation and object detection applications including ResNet-50. The company has posted some new benchmarks that show its Hailo-8 M.2 module achieving 26x higher frames per second AI performance than Myriad X and 13x higher than Edge TPU, each of which can achieve a maximum of 4 TOPS.
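
The claimed frame-rate gains are larger than the gap in peak TOPS alone would suggest. The short sketch below works through that arithmetic using only the figures quoted above. The "implied per-TOPS gain" is our own label for the leftover factor. It is not a Hailo metric, and the benchmarks themselves are vendor claims we have not verified.

```python
# Figures quoted in the article (vendor claims, not independent measurements).
HAILO8_PEAK_TOPS = 26.0
RIVAL_PEAK_TOPS = 4.0  # stated maximum for both Myriad X and Edge TPU
CLAIMED_FPS_SPEEDUP = {"Myriad X": 26.0, "Edge TPU": 13.0}

# Ratio of peak throughput: how much faster Hailo-8 would be if
# frames per second scaled purely with peak TOPS.
peak_ratio = HAILO8_PEAK_TOPS / RIVAL_PEAK_TOPS

# Leftover factor: the part of the claimed speedup that peak TOPS does not
# explain (for example utilization, precision, or model differences).
implied_per_tops_gain = {
    name: speedup / peak_ratio for name, speedup in CLAIMED_FPS_SPEEDUP.items()
}

print(f"Peak TOPS ratio: {peak_ratio:.1f}x")
for name, gain in implied_per_tops_gain.items():
    print(f"vs {name}: claimed {CLAIMED_FPS_SPEEDUP[name]:.0f}x FPS, "
          f"{gain:.1f}x beyond the peak TOPS ratio")
```

In other words, the benchmark claims only hold up if the Hailo-8 also extracts several times more frames per available TOPS than its rivals, which is consistent with Hailo's emphasis on resource utilization.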

The Hailo-8 uses a “proprietary novel structure-driven” dataflow architecture that differs from the von Neumann architecture used on most neural processors. The architecture achieves low-power memory access by implementing a distributed memory fabric combined with purpose-made pipeline elements.

Hailo-8 Evaluation Board

Hailo-8’s dataflow-oriented interconnect adapts to the structure of the neural network to enable high resource utilization, says Hailo. The Hailo-8 hardware is tightly integrated with an SDK that offers a scalable toolchain, including model translation from industry-standard frameworks like ONNX and TensorFlow.

Hailo-8 infographic (left) and benchmarks

Foxconn was one of Hailo’s first publicly disclosed customers after NEC and ABB Technology, which led the Tel Aviv company’s $88 million in funding. Foxconn has yet to fully reveal the Linux-powered BOXiedge v2. The system, which is also called the BEX-1000, is equipped with a BEMB-1000 Mini-ITX board.

BOXiedge v2 with BEMB-1000 mainboard

The BOXiedge v2’s BEMB-1000 motherboard is loaded with Socionext’s SynQuacer SC2A11 SoC and the Hailo-8 M.2 AI Acceleration Module. The system is designed to perform real-time image classification, detection, pose estimation, and other tasks on footage from up to 20 cameras while running at 35W.

Further information

The Hailo-8 M.2 AI Acceleration Module is available now at an undisclosed price. More information may be found in Hailo’s announcement on AP and the product page.

One response to “3-TOPS per Watt Hailo-8 NPU arrives on M.2 module”

  1. Sean says:

    Hmm, even at 26 TOPS, which is very fast, Google’s Edge TPU at 4 TOPS would make this board at most about 6.5 times faster. I guess it depends on whether it’s 8-bit, 16-bit, or 32-bit.
