AI now stands for Apple Intelligence


Bing Image Creator prompt (suggested by Claude 3.5): “A sleek, modern iPhone floating in space, with a glowing blue AI brain or neural network emerging from its screen. The AI network should extend beyond the phone's boundaries, symbolizing edge computing. In the background, faint outlines of cloud servers are visible, but less prominent than the phone. Use a dark background with blue and white accents to create a futuristic, high-tech feel. Include small icons representing various AI tasks (text, image, voice) orbiting the phone. The overall composition should convey the idea of powerful AI capabilities contained within a handheld device”

Apple’s WWDC 2024 was a few weeks ago, and for me, the most exciting part was their announcement of generative AI tools coming to iPhone. The Keynote presentation is smooth, and you can see a slick overview of what the new “Apple Intelligence” tools can deliver here.

After all the marketing and high-level hoopla, Apple's ML research team published a technical blog post to accompany the Apple Intelligence announcement: https://machinelearning.apple.com/research/introducing-apple-foundation-models. In the post, they give more technical detail about what exactly is powering their AI. The entire post is well worth a read: they cover the overall architecture of Apple Intelligence, how it performs, why it's secure, and how it maximizes the usefulness of on-device LLM inference.


Apple's position as both hardware and software provider allows for a unique approach to their AI implementation. They've trained a ~3 billion parameter LLM designed to run directly on iOS devices. The model uses Low-Rank Adaptation (LoRA) adapters, grouped-query attention, and a mixed 2-bit and 4-bit weight quantization scheme. I've mentioned LoRA in previous posts: it's a method for adapting an LLM with far fewer trainable parameters by training small matrices alongside specific parts of the model (typically the attention and feed-forward layers) (LoRA Paper). Although they don't cite any research on how exactly they implemented the 2-bit and 4-bit palettization, the Apple Core ML documentation has pages that are probably relevant. My understanding is that palettization is a somewhat more complex quantization technique than something like QLoRA: rather than quantizing every weight to a 4-bit or 8-bit representation directly, it clusters the weights and stores each one as an index into a small table of centroid values.
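Apple doesn't publish the details of its palettization scheme, but the clustering/centroid idea can be sketched with a toy k-means weight quantizer. Everything below is illustrative (my own naive implementation), not Apple's actual algorithm:

```python
import numpy as np

def palettize(weights: np.ndarray, n_bits: int, iters: int = 20):
    """Toy k-means palettization: cluster the weights into 2**n_bits
    centroid values, then store each weight as a small integer index
    into that centroid table (the "palette")."""
    n_clusters = 2 ** n_bits
    flat = weights.ravel()
    # Initialize centroids evenly across the observed weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(weights.shape)

def depalettize(centroids: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights by looking indices up in the palette."""
    return centroids[indices]
```

With 2-bit indices there are only 4 centroids and with 4-bit indices 16, so the reconstruction error drops as you spend more bits, which is presumably why Apple mixes the two precisions across layers.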

These lightweight, task-specific LoRA adapters are trained for jobs like "proofreading" or "mail reply" and can be swapped onto the 3B base model for quick, flexible on-device performance. Although Apple hasn't released the LLM or any adapters, what's exciting about this information is that open-source tools to train and develop a similar system already exist, like the Hugging Face PEFT library.
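The swappable-adapter mechanism itself is simple enough to sketch in a few lines of NumPy. The layer and task names below are made up for illustration; the point is that the big base weight W is frozen and shared, while each task only adds two small matrices (LoRA's B is initialized to zero so a fresh adapter starts as a no-op):

```python
import numpy as np

class LoraLinear:
    """Minimal sketch of a linear layer with swappable LoRA adapters.
    The frozen base weight W is stored once; each task ("proofreading",
    "mail reply", ...) only ships its small A and B matrices."""

    def __init__(self, W: np.ndarray):
        self.W = W          # frozen base weight, shape (d_out, d_in)
        self.adapters = {}  # task name -> (A, B, alpha)
        self.active = None

    def add_adapter(self, name: str, rank: int, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = self.W.shape
        rng = np.random.default_rng(seed)
        A = rng.normal(scale=0.01, size=(rank, d_in))  # trained per task
        B = np.zeros((d_out, rank))                    # zero-init: no-op at start
        self.adapters[name] = (A, B, alpha)

    def set_adapter(self, name: str):
        self.active = name

    def forward(self, x: np.ndarray) -> np.ndarray:
        y = x @ self.W.T
        if self.active is not None:
            A, B, alpha = self.adapters[self.active]
            # LoRA update: y += (alpha / rank) * x @ A.T @ B.T
            y = y + (alpha / A.shape[0]) * (x @ A.T @ B.T)
        return y
```

For a 64x64 base weight (4,096 values), a rank-4 adapter adds only 512 values, which is why shipping one adapter per task is so much cheaper than shipping one fine-tuned model per task.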

The on-device model posts some solid performance numbers: a time-to-first-token latency of about 0.6 milliseconds per prompt token, and a generation speed of 30 tokens per second.
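To get an intuition for what those two figures mean for a real interaction, here's a back-of-envelope calculation using Apple's published numbers (the prompt/reply sizes are just an example I picked):

```python
def estimated_response_time(prompt_tokens: int, output_tokens: int,
                            ttft_ms_per_token: float = 0.6,
                            gen_tokens_per_s: float = 30.0) -> float:
    """Rough end-to-end latency: time to first token scales with the
    prompt length, then generation proceeds at a fixed token rate."""
    ttft_s = prompt_tokens * ttft_ms_per_token / 1000
    gen_s = output_tokens / gen_tokens_per_s
    return ttft_s + gen_s

# Example: summarizing a 500-token email into a 100-token reply:
# 500 * 0.0006 s + 100 / 30 s = 0.3 + 3.33 ≈ 3.63 seconds total.
```

Generation speed, not prompt processing, dominates here, which matches the intuition that short, task-focused outputs (proofread this, summarize that) are the sweet spot for a small on-device model.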

Edge Computing: Advantages and Limitations

Apple's focus on edge AI presents an advantage over cloud-based solutions offered by competitors like OpenAI, Microsoft, Google, and Amazon.

On-device processing can deliver faster response times, better privacy, and offline functionality. However, this approach also faces limitations in model size and computational power compared to cloud-based solutions. As the Apple ML blog post makes clear, the small on-device model still isn't competitive with something like GPT-4, which is why Apple also hosts a cloud-based LLM and announced an optional ChatGPT integration for users.

Implications for the Future of AI

We've been waiting for what seems like forever for a smarter Siri and some AI tools integrated into Mac software. When Siri rolled out in 2011, we expected it to be the helpful assistant for everything, but for me it has mostly been a tool for setting timers and sending a quick text message from my watch. I'm hopeful that Siri is about to get a lot more useful.

Apple's introduction of on-device AI represents a significant step in making AI more accessible and integrated into their products. As the field progresses, it will be interesting to see how Apple's approach to AI integration evolves and compares to advancements by other industry leaders. Nobody else controls hardware and software the way Apple does, and if they can iterate quickly on the performance and capability of their local LLM, it's going to be a strong differentiator for their devices in the market.