We recently received an inquiry from an operator who asked whether the new touchscreen customer interfaces available on new vending machines, and as kits for upgrading older ones, can be equipped to recognize spoken questions and commands. He explained that a number of his patrons are visually impaired; they find it easy to use pushbutton selectors, but are challenged by touchscreens that require visual interaction.
We have not heard of speech recognition functionality for vending machine selectors, nor of any effort to equip machines to deliver information to customers by synthesized speech. We believe this subject is worth exploring, both as a way to improve customer service and as a further step toward broadening the appeal of vending machines to as many customers as possible.
The idea of a talking machine is by no means new, but the implementations we've seen to date have been limited: playback devices that deliver recorded messages when triggered, or that read aloud the error codes generated by self-diagnostic circuitry for sight-impaired technicians. The steady progress in computer processing power should offer a straightforward way to enable machines to accommodate users who wish to speak and listen, rather than touch icons and read.
We've always been intrigued by text-to-speech and speech-to-text software, and we are aware that the second kind has proven much more difficult to develop. Text-to-speech systems are commonplace. Consumer devices that can speak foreign-language words and phrases, tell the time and so forth have been marketed since the early 1980s, and have found their way into automobiles and many other products.
Speech to text is harder. A number of workable systems have been introduced to recognize one of a limited number of choices, prompting the user to speak a number from one to 10 or the name of a primary color, or to say "yes" or "no." Conversational speech with a varied and unpredictable vocabulary is not nearly as easy to convert.
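Why a small, fixed vocabulary makes the job easier can be illustrated with a short Python sketch. It assumes, hypothetically, that a recognizer has already turned the customer's utterance into a rough text string; the sketch covers only the constrained-choice step, snapping a possibly misheard word onto the nearest of a handful of allowed answers using the standard library's `difflib.get_close_matches`. The choice list, the `match_choice` name and the 0.6 similarity cutoff are illustrative assumptions, not features of any actual vending product.

```python
import difflib

# Hypothetical fixed vocabulary: the only answers this prompt accepts.
CHOICES = ["yes", "no", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"]


def match_choice(heard, choices=CHOICES, cutoff=0.6):
    """Snap a (possibly misheard) transcribed word onto the closest allowed choice.

    Returns None when nothing is similar enough, so the machine can re-prompt
    instead of guessing.
    """
    # Normalize case and stray whitespace before comparing.
    matches = difflib.get_close_matches(heard.strip().lower(), choices,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_choice("Yess"))    # a near miss still lands on "yes"
print(match_choice("banana"))  # nothing close enough, so None
```

With only a dozen candidates, even a sloppy transcription usually lies much closer to one of them than to the rest. Open-ended conversational speech offers no such short list to fall back on.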
For quite a while now, there have been practical speech recognition programs that let the user talk into a microphone and then convert that speech into printable text. The problem is that they have required the user to "train" them after installation, usually by reading a series of supplied scripts whose content the program already knows. This lets the program analyze and store the user's particular pronunciation for later use in transcribing unscripted speech.
As with much else, the "cloud" offers a new approach. People today increasingly carry powerful networked computers, disguised as cellular telephones, that provide services through real-time communication with distant servers and their databases. "Knowledge navigator" applications like Apple's Siri work by relaying a user's spoken question to a remote server, which converts it to text, directs that text to the relevant database, converts the response back into speech and relays the spoken answer to the phone. A system like this should be able to learn a user's speech patterns and so improve its recognition accuracy with repeated use. We have seen the amusing stories of the unpredictable results these "assistants" can produce for speakers with regional accents, but these difficulties presumably diminish as the system's knowledge base grows.
We see no reason why this method could not be used by someone standing in front of a vending machine, if the machine is networked through a server that can access a central database of product information (nutritional content and the like), as well as the operator's own database of machines and each machine's inventory. Most of this interaction could be handled by the phone through its own connection to the Web; the only local communication would be the phone's initial "handshake" with the machine to establish the machine's identity. The phone might then be used to make the payment locally.
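The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any existing system: the machine ID format, the two dictionaries standing in for the central product database and the operator's inventory database, their made-up sample values, and the `Handshake` and `answer_product_question` names are all hypothetical, and speech conversion and payment are left out. It shows only how the single value learned in the local handshake lets a server answer a customer's question from both databases at once.

```python
from dataclasses import dataclass

# Hypothetical central product database (made-up sample values).
PRODUCT_DB = {
    "granola bar": {"calories": 190},
    "orange juice": {"calories": 110},
}

# Hypothetical operator database: machine ID -> product -> units on hand.
OPERATOR_INVENTORY = {
    "VM-0042": {"granola bar": 6, "orange juice": 0},
}


@dataclass
class Handshake:
    """All the phone needs from its brief local exchange with the machine."""
    machine_id: str


def answer_product_question(handshake, product):
    """Server-side sketch: combine central product facts with this machine's stock."""
    facts = PRODUCT_DB.get(product)
    if facts is None:
        return f"Sorry, I don't have information about {product}."
    # An unknown machine or product is treated as having nothing on hand.
    stock = OPERATOR_INVENTORY.get(handshake.machine_id, {}).get(product, 0)
    availability = f"{stock} in stock" if stock > 0 else "sold out in this machine"
    return f"{product.capitalize()}: {facts['calories']} calories, {availability}."


print(answer_product_question(Handshake("VM-0042"), "granola bar"))
```

In this arrangement the text answer would be handed to the phone's own speech service to be read aloud, which is why the machine itself needs little more than a way to announce its identity.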
But even an offline vending machine could be equipped to speak its menu rather than display it on a screen, and to respond to spoken questions and commands. This probably is not something one would want to do everywhere, but it would be a very valuable option for meeting the needs of particular locations.
Much of the conceptual difficulty in envisioning the networked world of the future comes from imagining that some vast enterprise, like the Manhattan Project or building the transcontinental railroad, has to happen first. In fact, a great many of the requirements already have been met, and more are being addressed all the time. The interesting part is building bridges between the "silos" that hold information of one kind and another. We think vending is in a better position to benefit from this than many other retail sectors. A vending machine always has been an unattended point of sale that could not fill a spoken order the way a human sales clerk can. A great deal of effort has gone into working around this limitation. We believe that operators who plan to survive and grow are looking forward to the next step.