There’s an ongoing discussion about the potential of voice technology and whether it’s really as revolutionary as we thought it to be. The answer is yes—it’s a matter of exploring its unique potential, not just copying existing technologies.
Tobias Dengel, a technology leader and author of a book about voice technology, joins us today in the new episode of the Scaling Tech podcast. Tobias is the President at WillowTree, a leading mobile strategy, design, and app development company servicing Fortune 500/5000 clients and large government agencies. He’s also the author of The Sound of the Future: The Coming Age of Voice Technology, which illustrates how voice technology is set to transform business operations across industries.
In this episode, we discuss why voice technology works best with a multimodal approach, how voice commands can transform user interactions, and the complexities of building trust with AI and voice models.
Tune in to discover how voice interfaces combined with AI can bring significant value to your users and your organization. And keep following the Scaling Tech podcast where your host, Arin Sime, brings industry experts to help your growing software engineering team stay ahead in the rapidly evolving tech space.
Listen on Spotify
Listen on Apple Podcasts
Key Insights with links to jump ahead are below
About Guest
Name: Tobias Dengel
What he does: He’s the President at WillowTree, a leading mobile strategy, design, and app development company servicing Fortune 500/5000 clients and large government agencies.
Company: WillowTree
Noteworthy: He’s a seasoned technology executive with 20+ years of experience in mobility, digital media, and interactive marketing.
Where to find Tobias: LinkedIn
Key Insights
⚡We’re not using voice technology to its full potential. Attempting to emulate existing technologies only limits the transformative potential of voice. Tobias says, “The question you ask is, ‘Well, great, why hasn’t it changed the world yet?’ The answer is we’re not using it right. And as an industry, we have made the classic mistake that we all make with new technologies is that we try to simulate or emulate a technology that already exists or something that already exists. […] We’ve tried to simulate human conversation, which is a natural thing to do. You say, ‘Okay, I got voice. Let’s simulate human conversation.’ But that’s actually not the best use.”
⚡Voice technology works best with a multimodal approach. Voice technology is most effective when combined with other interfaces. By working together, voice and software teams can create the fastest and most efficient interactions. This approach improves real-time communication and overall efficiency. Tobias explains the multimodal mindset, “I think the most interesting thing that’s going on from an efficiency standpoint, when you think about this multimodal, is that when you are saying something to device, the transcription is happening, and you’re showing the user this transcription and in the background, the app is like, if you’re doing a fast food order, building the order, you’re going from call and response, you’re going to concurrent communication for really the first time ever is that you’re communicating and the machine’s reacting multimodally in a real way. That’s why this multimodal mindset is so important.”
⚡Voice technology is becoming more popular because of cost savings and better user experience. One of the greatest perks of voice technology is its cost efficiency coupled with improved user experience. Tobias says, “We just did a study for the bank, that we could reduce the cost of their calls by about 92% by adopting some of these technologies versus what they were having to do now, asking people for certain applications for certain reasons to call 800 numbers. So the voice adoption is not just going to be driven by the fact that it’s a better user interface; it’s going to be much less expensive. And it’s rare, by the way, that you get a technology that both lowers costs and gives users a better experience.”
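For engineering teams, the concurrent interaction Tobias describes in the fast food example (the transcript appears on screen while the app builds the order in the background) can be sketched roughly as follows. This is a minimal illustration, not how WillowTree or any particular product implements it: the menu, the prices, and the function names `build_order`, `order_total`, and `render_stream` are all hypothetical, and the partial transcripts are hard-coded where a real app would receive them from a speech recognizer.

```python
import re

# Hypothetical menu; item names and prices are invented for illustration.
MENU = {"burger": 5.50, "fries": 2.25, "cola": 1.75}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4}


def build_order(transcript: str) -> dict[str, int]:
    """Extract {item: quantity} from whatever has been transcribed so far."""
    order: dict[str, int] = {}
    words = re.findall(r"[a-z0-9]+", transcript.lower())
    for i, word in enumerate(words):
        # Accept a simple plural ("burgers") when the singular is on the menu.
        item = word[:-1] if word.endswith("s") and word[:-1] in MENU else word
        if item not in MENU:
            continue
        # The word just before the item may carry the quantity; default to 1.
        prev = words[i - 1] if i > 0 else ""
        qty = NUMBER_WORDS.get(prev) or (int(prev) if prev.isdigit() else 1)
        order[item] = order.get(item, 0) + qty
    return order


def order_total(order: dict[str, int]) -> float:
    """Price the order against the hypothetical menu."""
    return sum(MENU[item] * qty for item, qty in order.items())


def render_stream(partials: list[str]):
    """Yield (transcript, order) after every partial transcript arrives."""
    for partial in partials:
        yield partial, build_order(partial)


if __name__ == "__main__":
    # Simulated partial transcripts, as a recognizer might emit them mid-sentence.
    stream = [
        "two",
        "two burgers",
        "two burgers and a cola",
        "two burgers and a cola and fries",
    ]
    for transcript, order in render_stream(stream):
        print(f"heard: {transcript!r:40} order: {order}")
```

The order is rebuilt from the full partial transcript each time rather than appended to, because a streaming recognizer may revise earlier words as more audio arrives. The user sees both the words and the order update while still speaking, which is the shift from call and response to concurrent communication.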
Episode Highlights
Is voice technology the next mobile revolution?
Voice technology has come a long way since its early days. Despite initial skepticism, it is now clear that voice is a major trend. Tobias believes it is set to become the primary way we interact with devices. He explains,
“Voice really got us interested because, up until ChatGPT, it was the fastest adoption of any new technology that we’d seen. But it hadn’t really changed the world the way mobile had or the internet had. So we were thinking, is this really a big trend? Is voice a big deal, or is it not? And I think that’s really the subject of the book. If you really break down what voice is good at and what it can do and what it’s not good at, and you start optimizing the experience to those things, you end up in a conclusion that holy crap, this is a really, really big deal. And the primary interface with which we are going to be interacting with devices, and especially inputting into devices over the coming few years in very short order, is going to be voice, and it will have the same types of impact as mobile did or the internet did.”
How can voice commands transform user experiences?
The best use of voice is giving voice commands and getting text or visual responses, not having a full conversation. Tobias shares an interesting example,
“The example I always use is movie tickets. We all want to ask Alexa or Siri what movies are playing tonight. We do not want to listen to five movies with three showtimes each. We can’t remember that. It’s super slow. It’s Moviephone from 20 years ago, 30 years ago, whenever we had Moviephone. But what we want is the app just to show us the movies, and then we say, ‘All right, Star Wars at 8, get me two tickets,’ and voice authenticated. Boom. And now you’ve taken an experience that in an app today, which is already really efficient, still takes 3 or 4 minutes, and you’ve taken that to 20 to 30 seconds. Everything we know about human behavior and technologies, if you take something from three minutes to 30 seconds, it’s a game changer. And the companies that adopt this mindset first are going to have a massive competitive advantage.”
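The "voice in, visuals out" pattern in the movie ticket example can be sketched in a few lines. The app already shows the showtimes on screen, so the spoken command only has to be resolved against that visible list. This is an illustrative sketch under stated assumptions: `Showtime`, `parse_ticket_command`, and the sample listings are hypothetical, the command is assumed to be already transcribed, and a production app would rely on a proper language-understanding service plus the voice authentication Tobias mentions, neither of which is shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Showtime:
    title: str
    hour: int  # evening hour on a 12-hour clock, e.g. 8 for 8 pm


NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def parse_ticket_command(command: str, on_screen: list[Showtime]):
    """Return (showtime, ticket_count) for a spoken command, or None if unclear."""
    text = command.lower()
    # Look for "at <hour>" and "<count> ticket(s)" anywhere in the command.
    time_match = re.search(r"\bat (\d{1,2})\b", text)
    count_pattern = r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+tickets?\b"
    count_match = re.search(count_pattern, text)
    if not time_match or not count_match:
        return None
    hour = int(time_match.group(1))
    raw_count = count_match.group(1)
    count = int(raw_count) if raw_count.isdigit() else NUMBER_WORDS[raw_count]
    # Only showtimes the user can currently see are valid targets.
    for show in on_screen:
        if show.title.lower() in text and show.hour == hour:
            return show, count
    return None


if __name__ == "__main__":
    # Hypothetical listings the app is displaying.
    listings = [Showtime("Star Wars", 6), Showtime("Star Wars", 8), Showtime("Dune", 8)]
    print(parse_ticket_command("All right, Star Wars at 8, get me two tickets", listings))
```

Restricting the match to what is on screen keeps the spoken part short: the user never has to listen to options being read aloud, and an unclear command returns `None` so the app can ask for clarification visually.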
Can voice technology build long-term trust?
Building trust works differently for AI and voice models. Replicating a human-like experience is not enough, and the more human-like a voice persona becomes, the less people may trust it. Tobias explains,
“When you think about human trust, what causes trust, there’s affective or emotional trust, like do you like a person, or do you like a thing? And that, in theory, gets formed within seconds of meeting someone, etc. And then there’s cognitive trust or rational trust, which is built over a long period of time, which basically is, does this person or thing do what they say they’re going to do? So, voice kind of failed on both. It failed on the emotional trust because of a theory called the Uncanny Valley, which was developed in Japan in the 1970s, which stated that the more human-like you made a robot, the less people trusted it. Because they still knew it was not a human. People talk about the movie Polar Express. People are just freaked out about the characters in that movie. And we try to create these voice personas, and most of us think they’re weird and most of us don’t trust them. And the more human-like they are, but we still know they’re not human, the less we trust them. That’s another reason to go to multimodal. If you just ask an app to get you the movies, and it gives you the movies, now you’re not evaluating it on that trust basis. You’re evaluated on a cognitive trust basis. Does it do what it says it’s going to do?”