Apple researchers unveil new AI system for enhanced voice assistant interactions

Abdul Raouf Al Sbeei - Apple Reporter
3 Min Read

Apple researchers recently unveiled an advancement in artificial intelligence designed to improve voice assistant interactions: ReALM (Reference Resolution As Language Modeling), a system that tackles a key challenge, understanding user references to what’s on their screen (via VentureBeat).

Voice assistants usually struggle to interpret ambiguous user commands, particularly those referencing visual elements on a device’s display. ReALM tackles this hurdle by leveraging the power of large language models. These models analyze the on-screen content and contextualize user queries, enabling them to pinpoint the specific information being referenced.

Being able to understand context, including references, is essential for a conversational assistant. Enabling the user to issue queries about what they see on their screen is a crucial step in ensuring a true hands-free experience in voice assistants.

Apple research team

This innovation hinges on ReALM’s ability to reconstruct the user’s screen. By parsing on-screen elements and their locations, it generates a textual representation that captures the visual layout, translating visual information into the text format language models already handle well. This approach, combined with fine-tuned language models, surpasses existing systems like GPT-4 in understanding screen-based references.
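For a rough sense of how a screen could be flattened into text for a language model, here is a minimal sketch in Python. The ScreenElement structure, the row-grouping rule, and the prompt format are illustrative assumptions for this article, not the encoding Apple’s researchers describe.

```python
# Illustrative sketch: turning parsed on-screen elements into a plain-text
# layout that a language model can read alongside the user's query.
# The element format and grouping rules are assumptions, not Apple's method.

from dataclasses import dataclass

@dataclass
class ScreenElement:
    text: str   # visible text of the UI element
    x: float    # left edge of its bounding box (0..1)
    y: float    # top edge of its bounding box (0..1)

def screen_to_text(elements: list[ScreenElement], row_tolerance: float = 0.02) -> str:
    """Group elements into visual rows by vertical position, then order each
    row left-to-right, producing one line of text per on-screen row."""
    rows: list[list[ScreenElement]] = []
    for el in sorted(elements, key=lambda e: e.y):
        if rows and abs(rows[-1][0].y - el.y) <= row_tolerance:
            rows[-1].append(el)   # close enough vertically: same row
        else:
            rows.append([el])     # start a new row
    lines = [" ".join(e.text for e in sorted(row, key=lambda e: e.x)) for row in rows]
    return "\n".join(lines)

# Hypothetical example: a list of businesses the user might refer to by voice.
screen = [
    ScreenElement("Pizza Palace", 0.1, 0.10),
    ScreenElement("(555) 010-2345", 0.6, 0.10),
    ScreenElement("Sushi Spot", 0.1, 0.15),
    ScreenElement("(555) 010-9876", 0.6, 0.15),
]
prompt = f"Screen:\n{screen_to_text(screen)}\n\nUser: call the second one"
print(prompt)  # the textual layout plus the query is what the model would see
```

In a setup like this, resolving “the second one” becomes an ordinary text task: the model only needs to match the reference against the serialized screen rather than process pixels directly.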

The benefits extend beyond convenience. ReALM paves the way for a truly hands-free experience. Users can interact with their devices seamlessly, issuing voice commands directly related to what they see on the screen. This is particularly valuable for visually impaired users or situations where touching the device is impractical.

Apple researchers acknowledge the limitations of this technology. ReALM relies on automated parsing, which can struggle with complex visual references, such as distinguishing between multiple images. Future iterations might incorporate computer vision and multi-modal techniques to address these challenges.

Apple’s upcoming Worldwide Developers Conference (WWDC) on June 10 is expected to serve as a platform for showcasing its AI advancements alongside iOS 18, a major update for iPhones. Speculation also suggests the unveiling of a new large language model framework, an “Apple GPT” chatbot, and a broader integration of AI features across its ecosystem.
