
96-core NanoPi Fire3 cluster computer blows past RPi rigs in benchmarks

Jul 12, 2018, by Eric Brown

Cluster computer projects are increasingly looking beyond the Raspberry Pi to build devices with faster cluster-friendly SBCs. Here’s a 96-core monster that taps the octa-core NanoPi Fire3.

Cluster computers constructed of Raspberry Pi SBCs have been around for years, ranging from supercomputer-like behemoths to simple hobbyist rigs. More recently, we’ve seen cluster designs that use other open-spec hacker boards, many of which offer more compute power and faster networking at the same or lower price. Farther below, we’ll examine one recent open source design from Nick Smith at Climbers.net that combines 12 octa-core NanoPi Fire3 SBCs for a 96-core cluster.


Beast v2

SBC-based clusters primarily fill the needs of computer researchers who find it too expensive to book time on a server-based HPC (high performance computing) cluster. Large-scale HPC clusters are in such high demand that it’s hard to find available cluster time in the first place.

Research centers and universities around the world have developed RPi-based cluster computers for research into parallel computing, deep learning, medical research, weather simulations, cryptocurrency mining, software-defined networks, distributed storage, and more. Clusters have also been deployed to provide a high degree of redundancy or to simulate massive IoT networks, such as with Resin.io’s 144-Pi Beast v2.

Even the largest of these clusters comes nowhere close to the performance of server-based HPC clusters. Yet, in many research scenarios, top performance is not essential. It’s the combination of separate cores running in parallel that makes the difference. The Raspberry Pi based systems typically use the MPI (Message Passing Interface) library, which exchanges messages between computers so that a parallel program can be deployed across distributed memory.

BitScope, which is a leader in Pi cluster hardware such as its Bitscope Blade, has developed a system with Los Alamos National Laboratory based on its larger BitScope Cluster Module. The Los Alamos hosted system comprises five racks of 150 Raspberry Pi 3 SBCs. Multiply those 750 boards by the four Cortex-A53 cores on each Pi and you get a 3,000-core parallelized supercomputer.



Quattro Pi version of Bitscope Blade (left) and BitScope Cluster Module

The Los Alamos system is said to be far more affordable and power efficient than building a dedicated testbed of the same size using conventional technology, which would cost a quarter billion dollars and use 25 megawatts of electricity. There are now plans to move to a 4,000-core Pi cluster.

Most clusters are much smaller 5-25 board rigs, and are typically deployed by educators, hobbyists, embedded engineers, and even artists and musicians. These range from open source DIY designs to commercial hardware rack systems designed to power and cool multiple densely packed compute boards.

 
96-core NanoPi Fire3 cluster shows impressive benchmarks

The 96-core cluster computer recently detailed on Climbers.net is the largest of several cluster designs developed by Nick Smith. These include a 40-core system based on the octa-core NanoPC-T3, and others that use the Pine A64+, the Orange Pi Plus 2E, and various Raspberry Pi models. (All these SBCs can be found in our recently concluded hacker board reader survey.)


NanoPi Fire3

The new cluster, which was spotted by Workonarm and further described on CNXSoft, uses FriendlyElec’s open-spec NanoPi Fire3.

The open source cluster design includes Inkscape code for laser cutter construction. Smith made numerous changes to his earlier clusters intended to increase heat dissipation, improve durability, and reduce space, cost, and power consumption. These include using two 7W case fans instead of one and moving to a GbE switch. The Bill of Materials ran to just over £543 ($717), with the NanoPi Fire3 boards totaling £383, including shipping. The next biggest line item was £62 for microSD cards.
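As a rough illustration of what those figures imply, the short sketch below works out the cost per board and per core from the numbers quoted above. The function name and the "other parts" bucket are hypothetical conveniences, and the total is treated as exactly £543 even though the article says "just over" that amount.

```python
# Figures quoted in the article, in GBP. The total is "just over" this value,
# so every derived number here is an approximation.
TOTAL_BOM_GBP = 543.0
BOARDS_GBP = 383.0       # 12 NanoPi Fire3 boards, including shipping
MICROSD_GBP = 62.0
BOARD_COUNT = 12
CORES_PER_BOARD = 8


def cost_breakdown():
    """Return approximate per-board and per-core costs for the cluster."""
    cores = BOARD_COUNT * CORES_PER_BOARD
    return {
        "cores": cores,
        "gbp_per_board": BOARDS_GBP / BOARD_COUNT,
        "gbp_per_core": TOTAL_BOM_GBP / cores,
        # Everything that is neither a board nor a microSD card
        # (fans, switch, case materials, and so on).
        "other_parts_gbp": TOTAL_BOM_GBP - BOARDS_GBP - MICROSD_GBP,
        "board_share": BOARDS_GBP / TOTAL_BOM_GBP,
    }


if __name__ == "__main__":
    for key, value in cost_breakdown().items():
        print(f"{key}: {value:.2f}")
```

On these assumptions the boards account for roughly 70 percent of the build cost, which is why the choice of SBC dominates the price of a small cluster.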



96-core NanoPi Fire3 based cluster computer from two angles

The $35 Fire3 SBC, which measures only 75x40mm, houses a powerful Samsung S5P6818. The SoC features 8x Cortex-A53 cores at up to 1.4GHz and a Mali-400 MP4 GPU, which runs a bit faster than the Raspberry Pi’s VideoCore IV.
Although the Fire3 has only twice as many Cortex-A53 cores as the Raspberry Pi 3, and is clocked only slightly faster, Smith’s benchmarks showed a surprising 6.6x CPU performance gain over a similar RPi 3 cluster. GPU performance was 7.5x faster.


CPU benchmark comparison between 96-core Fire3 cluster and other computers

It turned out that much of the performance improvement was due to the Fire3’s native, PCIe-based Gigabit Ethernet port, which enabled the clustered SBCs to communicate more quickly with one another to run parallel computing applications. By comparison, the Raspberry Pi 3 has a 10/100Mbps port.


Raspberry Pi 3 Model B+

The comparison RPi cluster would no doubt have performed better if Smith had used the new Raspberry Pi 3 Model B+, which offers a Gigabit Ethernet port. However, since the B+ port is based on USB 2.0, its Ethernet throughput is only about three times that of the Model B’s 10/100 port, compared with about 10 times for the Fire3.
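To get a feel for what those multipliers mean for node-to-node traffic, here is a minimal back-of-the-envelope sketch. It assumes the 10/100 port runs at its nominal 100Mbps, applies the article's rough 3x and 10x figures, and ignores protocol overhead and latency. The 100MB payload is a hypothetical example and does not come from Smith's benchmarks.

```python
BASELINE_MBPS = 100.0  # nominal rate of the RPi 3 Model B's 10/100 port

# Throughput multipliers relative to the baseline, as quoted in the article.
LINK_MULTIPLIER = {
    "RPi 3 Model B": 1.0,
    "RPi 3 Model B+": 3.0,
    "NanoPi Fire3": 10.0,
}


def transfer_seconds(payload_megabytes, link_mbps):
    """Idealized time to move a payload over a link (no overhead, no latency)."""
    bits = payload_megabytes * 8_000_000  # 1 MB taken as 10**6 bytes
    return bits / (link_mbps * 1_000_000)


def compare(payload_megabytes):
    """Return the idealized transfer time in seconds for each board."""
    return {
        board: transfer_seconds(payload_megabytes, BASELINE_MBPS * multiplier)
        for board, multiplier in LINK_MULTIPLIER.items()
    }


if __name__ == "__main__":
    # Hypothetical 100MB exchange between two cluster nodes.
    for board, seconds in compare(100.0).items():
        print(f"{board}: {seconds:.2f} s")
```

Real-world numbers will differ, but the ratios show why a workload that spends much of its time passing messages between boards benefits so much from the faster port.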

Still, that’s a significant throughput boost, and combined with the faster 1.4GHz clock rate, the RPi 3 B+ should quickly replace the RPi 3 Model B in Pi-based cluster designs. BitScope recently posted an enthusiastic review of the B+. In addition to the performance improvements, the review cites the improved heat dissipation from the PCB design and the “flip chip on silicon” BGA package for the Broadcom SoC, which uses heat spreading metal. The upcoming Power-over-Ethernet capability should also open new possibilities for clusters, says the review.



Odroid-MC1 Solo (left) and original Odroid-MC1

Hacker board community sites are increasingly showcasing cluster designs (here’s a cluster case design for the Orange Pi One on Thingiverse), and some vendors offer cluster hardware of their own. Hardkernel’s Odroid project, for example, came out with a 4-board, 32-core Odroid-MC1 cluster computer based on an Odroid-XU4S SBC, a modified version of the Odroid-XU4, which won third place in our hacker board survey. The board uses the same octa-core Samsung Exynos5422 SoC, which combines four Cortex-A15 and four Cortex-A7 cores. More recently, Hardkernel released an Odroid-MC1 Solo version that lets you choose precisely how many boards you want to add.

The Odroid-MC1 products are primarily designed to run Docker Swarm. Many of the cluster systems are designed to run Docker or other cloud-based software. Last year Alex Ellis, for example, posted a tutorial on creating a Serverless Raspberry Pi cluster that runs Docker and the OpenFaaS framework. Indeed, as with edge computing devices running modified versions of cloud software, such as AWS Greengrass, cluster computers based on SBCs show another example of how the embedded and enterprise server worlds are interacting in interesting new ways using Linux.

This article is copyright © 2018 Linux.com and was originally published here. It has been reproduced by this site with the permission of its owner. Please visit Linux.com for up-to-date news and articles about Linux and open source.
 
