Sahaj Garg, co-founder and CTO of Wispr, a voice-to-text AI that turns speech into polished writing, talks with host Amey Ambade about designing systems for the ambiguity that's inherent in human input (text, voice, multimodal). Sahaj focuses on concrete architectural and training strategies for building robust AI systems. This episode examines the problem of ambiguity: where it shows up, how to build robust systems, personalization, communicating uncertainty, and evaluation. The conversation starts by exploring the difference between inherent and reducible ambiguity; major categories of ambiguity, including lexical, syntactic, and pragmatic; and the additional sources of ambiguity in voice, such as homophones and accents. Garg details how to make systems more robust through model training, including providing additional context to the model and constructing well-annotated datasets. They discuss personalization with a focus on "revealed preferences"—learning from user behavior without explicit feedback—and fighting the tendency of AI writing to "regress to the mean." Finally, they consider how to communicate uncertainty to users without degrading the experience, as well as methods for evaluating ambiguity resolution through offline and online signals.
Brought to you by IEEE Computer Society and IEEE Software magazine.

