It's been six years since Apple added Siri to iOS and voice became a key way users could interact with technology. In the years since, Google Assistant and Alexa have also established strong footholds as central players in the way we manage myriad services and smart-home devices — and Cortana has become an integral part of the PC. As much as voice-activated virtual assistants have begun to transform the consumer landscape on devices from smartphones and smartwatches to smart speakers, they have yet to truly break through in the enterprise.
That isn't to say we don't already accomplish business tasks using these virtual assistants. Many of us rely on them for quick tasks such as getting directions, scheduling appointments and responding to messages. But these haven't changed much since 2011. Granted, we now issue instructions to a wider variety of devices — particularly smartwatches — but voice has yet to become a mainstay in the workplace.
That said, if voice is going to reach its true potential, we need to forget the quick shortcuts.
In the office, it's all about collaboration.
Voice isn't for every situation
Before voice can supercharge collaboration, look at where it can realistically work well now, because some basic assumptions need to be discarded. Chief among them: the expectation that voice will soon be everywhere in an office or building.
The phrases "smart office" and "voice-enabled workplace" conjure images of dozens of people simultaneously accessing systems by voice in every part of a building, like characters in Star Trek talking to the Enterprise's computer. This isn't realistic. For one thing, acoustic realities make this a large technical (and financial) challenge, as well as a challenge for users trying to carry on conversations with a virtual assistant, co-workers or both. For another, even spaceship crews are often shown using keyboards, touchscreens and other interfaces; voice is not the only option for them, nor is it necessarily the best one. (While Google Home and Amazon Echo devices can recognize individuals by voice, this capability is extremely new and limited to a handful of individuals.)
In other words, recognizing individual speakers by voice isn't ready to scale to large enterprises in the near future.
All of this means that voice will largely be used with personal devices, much as we're already doing, or implemented in private offices, cubicles and meeting/conference rooms. This provides flexibility and simplifies deployment, since current voice-enabled PCs, mobile devices and wearables can already manage many interactions. Relatively low-cost devices (think Google Home Mini or Echo Dot) can fill in the gaps in offices or collaborative spaces.
It's all about apps and APIs
The key question about voice isn't where we'll access it so much as how we'll use it; the answer lies in enterprise apps and the APIs that connect them.
Consider the following example task:
"Send John the budget files for the service department for last month so he can update the charts on slides 3 and 4 of the PowerPoint for next week's department meeting."
Although it sounds like a relatively simple request, it breaks down into multiple commands: starting a new message to the right person; locating the related files and including them (or a link to them); adding instructions for the data to be updated; flagging individual slides to highlight what needs to be updated; and attaching an invitation for the relevant meeting and/or deadline.
Depending on the workplace tools in use, this could easily span four different apps and rely on local, network and cloud resources. All of these apps and resources need to be able to interact, though not everything necessarily needs to be voice-enabled for the task to be done.
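To make that breakdown concrete, here is a minimal Python sketch of the example request expressed as an ordered list of steps. It is an illustration only: the app categories ("mail", "files", "slides", "calendar") and the action names are hypothetical labels chosen for this example, not the API of any real assistant or product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    app: str     # hypothetical app category that would handle the step
    action: str  # hypothetical action name, for illustration only
    detail: str  # what the step is meant to accomplish


def plan_budget_request() -> List[Step]:
    """Return the example spoken request as five discrete steps."""
    return [
        Step("mail", "new_message", "start a message to John"),
        Step("files", "find_and_attach", "last month's service department budget files"),
        Step("mail", "add_instructions", "ask for the charts to be updated"),
        Step("slides", "flag_slides", "slides 3 and 4 of the meeting PowerPoint"),
        Step("calendar", "attach_invite", "next week's department meeting"),
    ]


def apps_involved(steps: List[Step]) -> List[str]:
    """List the distinct apps a plan touches, in first-use order."""
    seen: List[str] = []
    for step in steps:
        if step.app not in seen:
            seen.append(step.app)
    return seen


if __name__ == "__main__":
    plan = plan_budget_request()
    for number, step in enumerate(plan, start=1):
        print(f"{number}. [{step.app}] {step.action}: {step.detail}")
    print("Apps involved:", ", ".join(apps_involved(plan)))
```

Under these assumed labels, one sentence becomes five steps across four apps, and only the first step strictly needs to understand speech; the rest only need to accept commands from whatever did.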
Most voice interactions break down into similar on-the-fly workflows that can be relatively simple or extremely complex.
If you've configured smart-home actions using Apple's HomeKit, you're likely familiar with the concept of "scenes" that allow you to create similar workflows stretching across several devices — "I'm going out" means turning off all the lights, setting the thermostat and locking the doors, for example. Configuring workflows using IFTTT (If This, Then That) is a good example of building a set of commands to deliver complex actions via simple triggers across devices, apps and services. The challenge (and potential) for voice-activated assistants in the enterprise is creating a framework that constructs such workflows on the fly.
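The scene idea can be sketched in a few lines of Python: a short trigger phrase maps to a predefined list of actions. This is not HomeKit's or IFTTT's actual API; the `SceneRegistry` class and the three action functions are hypothetical stand-ins that return status strings instead of controlling real devices.

```python
from typing import Callable, Dict, List

Action = Callable[[], str]


class SceneRegistry:
    """Maps a trigger phrase to an ordered list of actions (hypothetical)."""

    def __init__(self) -> None:
        self._scenes: Dict[str, List[Action]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Compare phrases case-insensitively and ignore surrounding spaces.
        return phrase.strip().lower()

    def define(self, phrase: str, actions: List[Action]) -> None:
        self._scenes[self._normalize(phrase)] = list(actions)

    def trigger(self, phrase: str) -> List[str]:
        key = self._normalize(phrase)
        if key not in self._scenes:
            raise KeyError(f"no scene defined for: {phrase!r}")
        # Run each action in the order it was registered.
        return [action() for action in self._scenes[key]]


# Stand-in actions; a real deployment would call device or app APIs here.
def lights_off() -> str:
    return "lights: off"


def set_thermostat() -> str:
    return "thermostat: set"


def lock_doors() -> str:
    return "doors: locked"


registry = SceneRegistry()
registry.define("I'm going out", [lights_off, set_thermostat, lock_doors])

if __name__ == "__main__":
    for result in registry.trigger("I'm going out"):
        print(result)
```

Note that the mapping here is fixed in advance. The enterprise challenge described above is harder: the list of actions would have to be assembled at the moment of the request rather than looked up.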
The glue to make all this work will be voice: the platform of choice, the apps needed for success, and the APIs that allow the exchange of commands, information and results. Alexa's "skills" demonstrate what this looks like at home; capabilities are added like apps and simply become part of the repertoire both for the device/home environment and for users of that particular device. Think of skills as building blocks.
For workplaces, this will mean developing apps that can work with a selected voice ecosystem and interact with each other when needed, and then making them available in an enterprise app store.
A big challenge for IT will be to devise a strategy that accounts for multiple platforms and extends across multiple boundaries: device type, OS, the cloud, app providers, individual apps and individual workstyles. It may also need to cross language and dialect boundaries. Simply put, using voice for work tasks will need to be well thought out and actively tested.
One voice or many?
Another critical issue: Will a single voice- or virtual-assistant platform suffice, or will a company need to support a mixture of, say, Siri, Cortana and Google Assistant?
There are both challenges and opportunities. With the exception of Siri, voice platforms extend to non-native devices and OSes. But while Cortana and Google Assistant exist on Windows, Android and iOS, their capabilities and OS/app/data/service integration vary. Capabilities also vary within a native platform by OS release, something that can affect Google in particular because of Android's innate fragmentation.
If multiple voice and assistant platforms are actively supported, IT shops will be challenged not only to define and implement a strategy but also to describe and document it for users. Indeed, this may ultimately be the deciding factor in whether voice and its future collaborative potential succeed. As with an enterprise app strategy, user adoption will be a critical metric for success.
Is voice ready for enterprise prime time?
Without a doubt, voice is already in the enterprise, powering productivity and collaboration efforts. But even after more than half a decade on the market, voice remains in its infancy. The revolution it offers is coming, but the needed confluence of hardware, platform, apps and workflows isn't in place yet to capture its potential.
But it is on the way, and now is the time for IT departments and business app developers to take notice and prepare.