The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.
– Mark Weiser
Many of us grew up watching Star Trek, where the crew could simply speak to the computer and it would understand not just their words, but their intent. “Computer, locate Mr. Spock” wasn’t just about voice recognition – it was about comprehension, context, and action. This vision of ambient computing, where the interface disappears and interaction becomes natural (speech, gestures, and so on), has been a north star for scientists and builders for decades.
The computing research foundation for making this vision a reality was laid in 1988 by Mark Weiser of Xerox PARC, who coined the term Ubiquitous Computing. Mark, together with John Seely Brown, defined the concept of Calm Computing with these attributes:
- The purpose of a computer is to help you do something else.
- The best computer is a quiet, invisible servant.
- The more you can do by intuition the smarter you are; the computer should extend your unconscious.
- Technology should create calm.
When Amazon launched Alexa in 2014, we weren’t the first to market with voice recognition. Dragon had been converting speech to text for decades, and both Siri and Cortana were already helping users with basic tasks. But Alexa represented something different – an extensible voice service that developers could build upon. Anyone with a good idea and coding skills could contribute to Alexa’s capabilities.
I remember building my first DIY Alexa device with a Raspberry Pi, a $5 microphone, and a cheap speaker. It cost less than $50, and I had it working in less than an hour. The experience wasn’t perfect, but it was delightfully scrappy. Builders were excited by the potential of voice as an interface – especially when they could build it themselves.

However, the early days of skill development weren’t without challenges. Our first interaction model was turn-based – like command line interfaces of the 1970s, but with voice. Developers had to anticipate exact phrases (and maintain extensive lists of utterances), and users had to remember specific invocation patterns. “Alexa, ask [skill name] to [do something]” became a familiar but unnatural pattern. Over time, we simplified this with features like name-free interactions and multi-turn dialogs, but we were still constrained by the fundamental limitations of pattern matching and intent classification.
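To make that concrete, here’s a rough sketch, in TypeScript, of the kind of mapping a skill developer had to maintain. The intent and utterance names are hypothetical, and this is not the actual Alexa Skills Kit schema – just the shape of the problem:

```typescript
// A simplified sketch of a turn-based interaction model: every phrasing a
// user might say has to be enumerated up front and mapped to an intent.
// Names here are hypothetical, not the real ASK interaction model schema.

interface IntentDefinition {
  name: string;                   // the intent the skill's backend handles
  samples: string[];              // exact utterances the developer anticipates
  slots?: Record<string, string>; // slot name -> slot type (hypothetical)
}

const interactionModel: IntentDefinition[] = [
  {
    name: "OrderCoffeeIntent",
    samples: [
      "order a coffee",
      "order a {size} coffee",
      "get me a {size} coffee",
      "I want a {size} coffee",
      // ...every other phrasing has to be listed, or the request is missed
    ],
    slots: { size: "SIZE" }, // custom slot type name (hypothetical)
  },
];

// Routing was essentially pattern matching plus intent classification:
// anything that didn't resemble a sample fell through to a fallback.
function classify(utterance: string): string {
  const text = utterance.toLowerCase();
  for (const intent of interactionModel) {
    const matched = intent.samples.some((sample) => {
      // Turn "order a {size} coffee" into a pattern like /^order a (.+) coffee$/
      const pattern = new RegExp("^" + sample.replace(/\{\w+\}/g, "(.+)") + "$", "i");
      return pattern.test(text);
    });
    if (matched) return intent.name;
  }
  return "AMAZON.FallbackIntent";
}

console.log(classify("get me a large coffee")); // OrderCoffeeIntent
console.log(classify("I could really use some caffeine")); // falls through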
Generative AI allows us to take a different approach to voice interfaces. Alexa+ and our new AI-native SDKs remove the complexities of natural language understanding from the developer’s workload. The Alexa AI Action SDK, for instance, allows developers to expose their services through simple APIs, letting Alexa’s large language models handle the nuances of human conversation. Behind the scenes, a sophisticated routing system using models from Amazon Bedrock—including Amazon Nova and Anthropic Claude—matches each request with the optimal model for the task, balancing the requirements for both accuracy and conversational fluidity.
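The SDK’s actual interfaces aren’t reproduced here, but the shape of the work changes roughly like this: instead of maintaining an interaction model, you expose a plain, well-described API and let the models decide when to call it and with what arguments. The endpoint and payload below are hypothetical – a sketch of the idea, not the Alexa AI Action SDK itself:

```typescript
import express from "express";

// A minimal sketch of the developer's side of an action-style integration:
// expose core business logic as a plain HTTP API and leave the language
// understanding to the assistant's models. Endpoint names and payload
// shapes are hypothetical.

const app = express();
app.use(express.json());

interface ReservationRequest {
  partySize: number;
  date: string; // ISO 8601, e.g. "2025-07-04"
  time: string; // e.g. "19:00"
}

// The business logic stays what it always was: validate, reserve, respond.
// No utterance lists, no intent schema.
app.post("/reservations", (req, res) => {
  const { partySize, date, time } = req.body as ReservationRequest;
  if (!partySize || !date || !time) {
    return res.status(400).json({ error: "partySize, date, and time are required" });
  }
  // ...call into the existing reservation system here...
  res.status(201).json({ confirmationId: "hypothetical-id-123", partySize, date, time });
});

app.listen(3000, () => console.log("Reservation API listening on :3000"));
```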
This shift from explicit command patterns to natural conversation reminds me of the evolution of database interfaces. In the early days of relational databases, queries had to be precisely structured. The introduction of natural language querying, while initially met with skepticism, has become increasingly powerful and precise. Similarly, Alexa+ can now turn a casual request like “I need some rustic white picture frames, around 11 by 17” into a structured search, maintain context through refinements, and execute the transaction – all while feeling like a conversation you’d have with another person.
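Purely as an illustration (the field names below are mine, not Alexa’s internal representation), the structured form of that request, and a follow-up refinement, might look something like this:

```typescript
// Illustrative only: one way to represent the structured search a model might
// derive from "I need some rustic white picture frames, around 11 by 17".

interface ProductQuery {
  category: string;
  style?: string;
  color?: string;
  widthInches?: number;
  heightInches?: number;
  maxPrice?: number;
}

const initialQuery: ProductQuery = {
  category: "picture frame",
  style: "rustic",
  color: "white",
  widthInches: 11,
  heightInches: 17,
};

// Context carries across turns: a refinement like "under 30 dollars" is
// merged into the existing query rather than starting the search over.
function refine(query: ProductQuery, refinement: Partial<ProductQuery>): ProductQuery {
  return { ...query, ...refinement };
}

const refinedQuery = refine(initialQuery, { maxPrice: 30 });
console.log(refinedQuery);
```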
For builders, this represents a fundamental shift in how we build voice experiences. Instead of mapping utterances to intents, we can focus on exposing our core business logic through APIs and let Alexa handle the complexities of natural language understanding. And for services without externalized APIs, we’ve added agentic capabilities that allow Alexa+ to navigate digital interfaces and spaces as we would, significantly expanding the tasks it can accomplish.
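To give a feel for what navigating a digital interface programmatically involves, here’s a deliberately simplified sketch using the open-source Playwright library. This is not how Alexa+ is implemented, and the site and selectors are hypothetical; an agent plans steps like these dynamically rather than following a fixed script:

```typescript
import { chromium } from "playwright";

// Purely illustrative of driving a web interface the way a person would.
// The URL and selectors are hypothetical.
async function bookAppointment(date: string, time: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto("https://example.com/appointments");
  await page.fill("#date", date);                 // enter the requested date
  await page.selectOption("#time-slot", time);    // pick a time slot
  await page.click("button[type=submit]");        // submit the booking form
  await page.waitForSelector(".confirmation");    // wait for confirmation

  await browser.close();
}

bookAppointment("2025-07-04", "10:00").catch(console.error);
```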
Jeff’s vision was to build the Star Trek computer. Ten years ago that was an ambitious goal. We’ve come a long way since then – from basic voice commands to much more conversational interfaces. Generative AI is giving us a glimpse of what’s possible. And while we aren’t flying around in voice-powered spaceships yet, the foundational technical problems of natural language understanding and autonomous action are becoming tractable.
The Alexa+ team is accepting requests for early access to the AI-native SDKs. You can sign up here. Ten years in, and I’m as excited as ever to see what builders will dream up.
As always, now go build!