OK Google, what’s the future of voice recognition tech?


Throughout human history, the spoken word has been at the heart of communication. Then, in the mid-1990s, a new world blinked and flickered into life. A world where computers became mainstream and the written word, in all of its myriad forms (SMS, email, instant messaging), took over and sent the spoken word into an exile that would last more than 20 years.

This was the dawn of digital communication, where colleagues would ‘IM’ each other from within the same office, couples would message each other at the same party, and families would text each other from within the same house.

But in recent years, fuelled by our desire to reconnect with our most natural form of communication, the spoken word has started to stir and is returning to the light through our day-to-day technology. Today, we can control music, switch off the lights, get weather updates, initiate phone calls, and even order new toilet roll without lifting a finger. In short, we have begun to find our voice again.

It might seem like an innovation, but voice recognition technology is not new. Whether you’ve tried to speak to a customer service team through an automated helpline, or your name is Gandalf and you’re trying to get into the Mines of Moria, chances are you’ve come across it already.

The technical challenge

The first challenge for voice search was recognising the words we speak with a high level of accuracy. That was the first mountain that new voice technology had to climb, and it was central to overcoming our heightened expectations of, and lack of patience with, new technology.

The next step is even harder: moving beyond recognition and into accurate interpretation. This involves taking into account meaning, accents and misspoken phrases while still providing a relevant response. Achieving this frees us to be more natural with our language and to use a more instinctive, conversational style rather than having to stick to rigid phrases.

The greatest challenge 

Yet the greatest challenge to the future of voice recognition tech is not a technical one but a behavioural one. Although we instinctively use our voices in almost every other area of our lives, when it comes to tech we are hardwired to click, tap and type.

Our willingness to try something new largely comes down to our perception of the level of difficulty and risk involved. If we perceive that it would be easier or more reliable to type a request than to speak it, then it will take a great deal of curiosity to persuade us to use our voices. Then there’s the risk of looking silly…

Getting a person to repeat a new behaviour requires trust: they need to believe that doing it again will lead to the outcome they want. Trust in the context of new technology is often hard to build and easily broken. A shortcut to it is inspiring delight with the new tech. Delight, in the context of new technology, is the experience of exceeded expectations, which improves our perception and engenders trust. The greater the delight, the greater the chance we will try it again.

Although flawless voice recognition technology may not be far off, a behavioural shift among the majority can only occur after repeated positive experiences. Voice recognition and the artificial intelligence behind it are only going to become more sophisticated, and it seems that our voices will lead the way; it will be a revolution and it will change everything.