Apple Researchers Propose a Multimodal AI Approach to Device-Directed Speech Detection with Large Language Models

Apple researchers describe an approach for virtual assistants that removes the need for a trigger phrase, allowing more natural and spontaneous dialogue with a device.

The approach is multimodal: it combines acoustic data, linguistic cues, and outputs from an automatic speech recognition (ASR) system to decide whether an utterance is directed at the device.
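
To make the idea of combining the three signal types concrete, here is a minimal sketch. It is not the LLM-based model the researchers describe. It swaps in simple late fusion with a logistic score, and every name in it (`Utterance`, `fuse`, `directed_probability`, the feature fields, the weights) is hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    """Hypothetical per-utterance features; none of these names come from the research."""
    acoustic: Sequence[float]  # assumed acoustic embedding of the audio
    text: Sequence[float]      # assumed embedding of the recognized words
    asr_confidence: float      # assumed ASR decoder confidence in [0, 1]


def fuse(utterance: Utterance) -> List[float]:
    """Concatenate the three modalities into one feature vector (late fusion)."""
    return [*utterance.acoustic, *utterance.text, utterance.asr_confidence]


def directed_probability(utterance: Utterance, weights: Sequence[float], bias: float) -> float:
    """Score the fused features with a logistic model (an illustrative stand-in classifier)."""
    features = fuse(utterance)
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    example = Utterance(acoustic=[0.2, -0.1], text=[0.7, 0.3], asr_confidence=0.9)
    # Made-up weights; a real system would learn them from labeled data.
    weights = [0.5, -0.2, 1.0, 0.4, 1.5]
    print(directed_probability(example, weights, bias=-1.0))
```

In this sketch, an utterance would be treated as device-directed when the score exceeds a chosen threshold; the threshold and the weights would have to be learned or tuned on labeled data.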

The reported system reduces the error rate by up to 61% compared with audio-only models, which supports more natural interactions with virtual assistants.
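
As a worked illustration of what a 61% reduction means, the sketch below assumes the figure is a relative reduction, which is the usual way such results are reported. The baseline and improved error rates are invented for the example and are not taken from the research.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical numbers: an audio-only model at 10% error, a multimodal model at 3.9%.
    reduction = relative_error_reduction(0.10, 0.039)
    print(f"{reduction:.0%}")  # prints "61%"
```

Under this reading, the improvement is measured against the baseline's own error rate, so the same 61% could correspond to very different absolute gains depending on how well the audio-only model performs.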

The research aims to make human-device interaction more intuitive and closer to human-to-human communication.

