AI

ReALM: Apple AI can identify on-screen references and understand background information

Apple has been researching a new large language model called ReALM, which can detect on-screen references and understand their context.

ReALM (Reference Resolution As Language Modeling) resolves references whose context or background information would otherwise be unclear, allowing the system to provide the right details to the end user.


Once combined with a voice assistant, the model could bring a major improvement to general conversation.

In the background, ReALM reconstructs the screen using parsed on-screen data and locations to generate a text-based representation that captures the visual layout (via VentureBeat).
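To illustrate the idea, here is a minimal sketch in Python of how parsed screen elements and their locations might be flattened into such a text-based representation. The UIElement structure, the line-grouping tolerance, and the screen_to_text function are illustrative assumptions for this sketch, not Apple's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    text: str   # parsed text of the on-screen element (hypothetical field)
    x: float    # left edge of its bounding box
    y: float    # top edge of its bounding box

def screen_to_text(elements: list[UIElement], line_tolerance: float = 10.0) -> str:
    """Render parsed UI elements as text that preserves rough
    top-to-bottom, left-to-right screen order."""
    # Sort by vertical position first, then horizontal.
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    lines: list[list[UIElement]] = []
    for el in ordered:
        # Group elements whose vertical positions are close into one line.
        if lines and abs(el.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(el)
        else:
            lines.append([el])
    # Join elements on a line with tabs, and lines with newlines.
    return "\n".join(
        "\t".join(e.text for e in sorted(line, key=lambda e: e.x))
        for line in lines
    )

# Example: a toy screen with a contact name above two buttons.
screen = [
    UIElement("John Appleseed", x=20, y=10),
    UIElement("Call", x=20, y=60),
    UIElement("Message", x=120, y=62),
]
print(screen_to_text(screen))
# John Appleseed
# Call    Message
```

A representation like this lets a plain language model reason about "the button below the name" without any vision component, since the layout survives as text.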


The researchers benchmarked the model at several sizes to handle references at different levels of detail, and it could have the capability to outperform GPT-4 on this task.

The researchers note that ReALM improves on an existing system with similar functionality; even its smallest model gains around 5 percent at identifying on-screen references.


They also add that the larger models may surpass GPT-4 at this kind of processing. Despite its advantages, the model still has limitations in the range of applications it can handle.

There’s no confirmation whether Apple will implement this model in its Siri voice assistant. However, the company keeps pushing boundaries to compete with its AI rivals, and it is researching new AI solutions to bring to its devices.
