Fujitsu Laboratories has developed speech interface technology that enables users to retrieve a variety of information by simply speaking into a smartphone, without having to look at the smartphone's display.
After listening to synthesized speech read the latest news and other information, users can say which item they would like to learn more about. The software then reads details about that topic and other related information. With this technology, users who are driving or working and need to keep their eyes and hands free can use various information services without having to look at or touch the smartphone's display.
Background
Currently, most smartphones and other mobile devices are operated by touching the handset while looking at its display. However, mobile devices are also used in other situations - such as walking, driving, and working - where users must keep their eyes and hands focused on the task at hand. In such scenarios, users can benefit from speech recognition technology that understands human speech and speech synthesis technology that lets devices read text aloud.
In recent years, by employing devices to remotely access data centers, where an abundance of computing resources can be utilized, it has become possible to develop speech recognition and synthesis technologies that handle a larger lexicon than has previously been possible on stand-alone devices. This has led to high expectations for the delivery of new and innovative services.
Fujitsu Laboratories has developed industry-leading technologies, including professional-quality speech synthesis technology and speech recognition technology that can eliminate background noise while picking up only the user's voice. The company is currently aiming to enable new speech interfaces, including the development of data center-based speech recognition and synthesis technologies.
Technological Issues
Speech-based input and output make it easy to receive news and other information services without looking at or touching a handset. For the system to pronounce news and other content accurately and to recognize the user's words correctly, it must support the ever-growing assortment of new terminology, including current slang. The system must also be able to correctly interpret homonym variants spoken by the user. Fujitsu Laboratories is working to overcome these challenges and to realize an original function for smooth communication.
About the Newly Developed Technology
To address these issues, Fujitsu Laboratories has developed a new eyes-free and hands-free speech interface. The user simply says what he or she is interested in, and the system pulls up relevant information and reads it out loud. For instance, when the user speaks a particular phrase from a news headline that the system has read, the system will read more detailed articles related to that topic.
Features of the newly developed technology are:
- Speech dialogue knowledge building technology supports the latest modern lingo and newly coined terms
Language is constantly changing. To address linguistic evolution, Fujitsu has developed technology that automatically extracts the orthographic patterns of new terminology from text found on the Internet and automatically adds them to the system's vocabulary dictionary. This makes it possible to create a speech interface that minimizes the number of words that are misread or falsely recognized.
- Technology that selects from homonym variants based on previous exchanges
Fujitsu has developed technology that analyzes information previously presented by the system, extracts vocabulary focused on certain topics, and automatically generates a speech recognition dictionary. As a result, the system is able to correctly recognize homonyms and other ambiguous phrases, thereby helping to facilitate accurate dialogue with the user.
- Technology for providing appropriate responses
When performing speech recognition and speech synthesis, the handset is connected to a data center where a huge lexicon is stored and updated. Fujitsu Laboratories has developed technology that, by dividing and anticipating speech data, is able to absorb the delays caused by processing and transmission in data center-based speech recognition and speech synthesis. In addition, the technology further improves response quality by controlling the timing of pauses between words. As a result, the user experience compares favorably with that of car navigation systems.
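The second feature above, choosing among homonym variants based on what the system has already presented, can be illustrated with a minimal Python sketch. This is not Fujitsu's implementation, which the article does not describe in detail. The homophone table, the pronunciation keys, and the function names are all hypothetical, and simple word counts stand in for the topic vocabulary that the real system is said to extract.

```python
import re
from collections import Counter

# Hypothetical homophone table: pronunciation key -> candidate spellings.
# A real recognizer would obtain these candidates from its own lexicon.
HOMOPHONES = {
    "nait": ["night", "knight"],
    "flauer": ["flower", "flour"],
}


def build_topic_vocabulary(presented_texts):
    """Count the words in text the system has already read aloud."""
    counts = Counter()
    for text in presented_texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def pick_homophone(pronunciation, topic_vocabulary):
    """Return the candidate spelling seen most often in earlier output.

    Returns None for an unknown pronunciation key. When no candidate has
    been seen, max() keeps the first candidate in the table.
    """
    candidates = HOMOPHONES.get(pronunciation, [])
    if not candidates:
        return None
    return max(candidates, key=lambda word: topic_vocabulary[word])


if __name__ == "__main__":
    headlines = [
        "Museum unveils armor worn by a medieval knight",
        "Bakery runs short of flour after storm",
    ]
    vocabulary = build_topic_vocabulary(headlines)
    print(pick_homophone("nait", vocabulary))
    print(pick_homophone("flauer", vocabulary))
```

In this sketch the earlier headlines bias the choice toward "knight" and "flour", because those spellings appear in what was just read aloud. A production system would presumably weight candidates by topic relevance rather than raw counts.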
Results
This technology enables users to retrieve information through a series of intuitive speech interactions, without looking at a display. As a result, news, email, and other web services frequently used in daily life can be made available to users who are driving or walking, or who have difficulty viewing a display. In addition, the technology can provide more detailed information in audio tour systems used in museums. For example, additional information could be offered when the user simply says a word that comes up in an audio tour or in a description of an exhibit.
© JCN Newswire
3 Comments
TakahiroDomingo
sounds really interesting, and i'd love to test it without having to pay for it. i wonder what it will feel like: talking to a dumb person or to an intelligent machine?
societymike
this is what has already been built into Android phones for over a year......
TakahiroDomingo
@socmike: please document your assertion "already been built into Android phones for over a year". i would like to have an app on my android tablet that reads to me, and when i talk back to it, it finds more info, and interacts in this way. and that i don't have to use my hands, touching the screen to tell it what to do.
maybe you didn't read the news thoroughly...