Cloud-based speech tech humanizes humanoid robot
Nov 1, 2013, by Rick Lehrbaum
Aldebaran Robotics and Nuance Communications are teaming up to help Aldebaran’s pint-sized Nao humanoid robot interact better with its human companions, in 19 different languages. Nao, which runs a customized version of Gentoo Linux on its 1.6GHz Intel Atom Z530 brain, is touted as a programmable, interactive humanoid robot featuring motion, vision, tactile, and audio capabilities. All that, and it’s cute, too!
Nao is “a fully-programmable and interactive humanoid robot equipped with state-of-the-art motion, vision, tactile, and audio capabilities,” says Aldebaran Robotics. Nao can “walk on different surfaces, track and recognize faces and objects, express and understand emotions, and react to touch and interact by voice.”


Aldebaran Nao robot
Although Nao has incorporated embedded voice recognition technology from Nuance for several years, the 573mm (22.5-inch) tall robot will soon begin tapping into Nuance’s cloud-based voice recognition and “expressive” text-to-speech technologies. The companies say the new speech I/O tech will “allow people to have truly natural conversations with the robot, in nineteen different languages.” Currently, Nao is limited to conversing in nine languages — about nine times more than most humans.
Nao’s voice will be customized to match its unique personality, says Aldebaran. This will help it “interact and engage in various settings, including education and special education environments with autistic children, and personal robotics,” adds the company.

Nao’s physical features, labeled
Here are a few of Nao’s interesting capabilities, as described on its data sheet:
- Camera — improved camera sensors provide higher sensitivity in VGA for better low-light perception; image processing on the robot CPU can use up to 30 images/sec in HD resolution; Nao’s head can move by 239° horizontally and 68° vertically, and its camera has a field of view of 61° horizontally and 47° vertically.
- Object recognition — can recognize a large quantity of objects; once an object is saved, Nao can recognize it and say what it is when it sees it again.
- Face detection and recognition — can detect and learn a face in order to recognize it.
- Text-to-speech (TTS) — speaks up to 9 languages (19 after the new Nuance speech technology upgrade); with a “say box” in Choregraphe, you can insert text and modify voice parameters as you wish. NAO will say the text correctly, with the right punctuation and intonation.
- Automatic speech recognition — can hear speech from 2m away; can recognize phrases or complete sentences.
- Sound detection and localization — can detect sounds and localize them in space, thanks to microphones all around his head.
- Smart stiffness — automatically adapts the power needed by the motors during the movements of the robot, for better use of drive components and battery energy savings.
- Fall manager — Nao may fall, but he’s capable of standing back up by himself; additionally, a fall detection system lets Nao protect himself with his arms.
- Anti self-collision — prevents Nao’s arms from colliding with the rest of his body; Nao always knows the position of his head, torso, legs, and arms, to avoid accidental limb collisions.
- Resource manager — Nao’s biggest challenge is to merge and order conflicting commands; he’s able to interrupt/stop or adjust the behavior in progress before executing a new required behavior.
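The text-to-speech capability listed above can also be driven from code rather than from Choregraphe’s say box. The following is a minimal sketch, not Aldebaran’s or Nuance’s reference usage. It assumes the NAOqi Python SDK (the `naoqi` module and its `ALProxy` class) is installed and that the robot exposes its `ALTextToSpeech` module on NAOqi’s default port, 9559. The helper names `connect_tts` and `say_in_language`, the `FakeTts` stand-in, and the sample phrase are illustrative and do not come from the data sheet.

```python
def connect_tts(robot_ip, port=9559):
    """Return a proxy to the robot's ALTextToSpeech module.

    Assumes the NAOqi Python SDK is installed; the import is kept inside
    the function so the rest of this sketch can run without a robot.
    """
    from naoqi import ALProxy  # provided by Aldebaran's SDK, not the stdlib
    return ALProxy("ALTextToSpeech", robot_ip, port)


def say_in_language(tts, text, language):
    """Speak `text` in `language` if the robot has that language installed.

    `tts` is any object offering getAvailableLanguages(), setLanguage(),
    and say(), as the ALTextToSpeech proxy does. Returns True if spoken.
    """
    if language not in tts.getAvailableLanguages():
        return False
    tts.setLanguage(language)
    tts.say(text)
    return True


class FakeTts(object):
    """Hypothetical stand-in for the proxy, for trying the helper off-robot."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.current = self.languages[0]
        self.spoken = []

    def getAvailableLanguages(self):
        return self.languages

    def setLanguage(self, language):
        self.current = language

    def say(self, text):
        self.spoken.append((self.current, text))
```

On a real robot, usage would look something like `tts = connect_tts("192.168.1.10")` followed by `say_in_language(tts, "Bonjour", "French")`, where the IP address is a placeholder. Checking the installed languages first avoids asking the robot to switch to a language pack it does not have.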
Nao specifications
Specifications listed by Aldebaran for the Nao robot include:
- Body with 25 degrees of freedom (DOF) whose key elements are electric motors and actuators
- Primary control computer:
  - Processor — 1.6GHz Atom processor (located in the head)
  - 1GB RAM; 2GB flash; 8GB microSDHC flash
  - Connectivity — 1x gigabit Ethernet; WiFi 802.11b/g/n
  - Software — Gentoo-derived Linux OS with Aldebaran’s proprietary middleware (NAOqi)
- Secondary CPU (located in the torso)
- Sensor network:
  - 2x cameras
  - 4x microphones
  - sonar rangefinder
  - 2x IR emitters and receivers
  - 1x inertial board
  - 9x tactile sensors
  - 8x pressure sensors
- Communication devices:
  - Voice synthesizer
  - LEDs
  - 2x high-fi speakers
- Weight — 5.2kg (11.4 lb)
- Dimensions — 573 × 275 × 311mm (22.5 × 10.8 × 12.2 inches)
- Power pack — 27.6 watt-hour lithium-ion battery; provides 1.5 or more hours of autonomy (depending on usage)
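As a rough sanity check on the power figures above, the battery capacity and quoted autonomy imply an average power draw. The sketch below simply divides 27.6 watt-hours by 1.5 hours. Because autonomy is listed as at least 1.5 hours and depends on usage, the result is best read as an approximate upper bound on average draw, not a measured value. The function name is illustrative.

```python
def average_power_watts(capacity_wh, runtime_hours):
    """Average power draw implied by a battery capacity and a runtime."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return capacity_wh / runtime_hours


# Figures quoted in the specifications: 27.6 Wh battery, 1.5 hours autonomy.
nao_average_draw = average_power_watts(27.6, 1.5)
print("Implied average draw: about %.1f W" % nao_average_draw)
```

This works out to roughly 18.4 W averaged across the motors, sensors, and both processors. A longer runtime under lighter use would imply a proportionally lower figure.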
In the YouTube video below, Bruno Maisonnier, founder and CEO of Aldebaran Robotics, and Steve Chambers, a vice president at Nuance, demonstrate Nao and discuss the future of humanoid robotics.
Nao demonstration video
“Our vision is to create even more intuitive and human-like interactions between man and machine as part of the Nao experience, in turn creating a wealth of new application opportunities for Nao and the next generation of robotic companions,” says Aldebaran Robotics founder and CEO Bruno Maisonnier.
Nao robots featuring Nuance’s Natural Language Understanding and text-to-speech are expected to begin shipping in early 2014. Further details on the Nao robot are available at Aldebaran’s website. For more information on Nuance’s speech technologies, visit the NDEV Mobile SDK web page.