Apple shows why it’s ahead in AI, not behind

Animation showing Apple Intelligence on iPhone
Apple Intelligence is powered by an LLM that runs both in the cloud and on-device.
Image: Apple

Contrary to popular opinion, Apple appears to be ahead in AI — and in some cases seems far in front of the competition. The revelation comes from an Apple white paper that hasn’t gotten much attention, but should.

A white paper on Apple’s Foundation Model, the company’s homegrown LLM (large language model) that powers Apple Intelligence, reveals two important facts: it’s the safest by design and highly competitive with both Meta’s Llama and OpenAI’s GPT-4. This seems to debunk a big myth about Apple’s AI efforts: that the company’s privacy-first philosophy would hold it back.

The Apple Foundation Model is just as capable in tests of writing and summarization as the top LLMs from OpenAI, Meta, Mistral AI and others. And thanks to Apple’s strict guidelines for expunging harmful content, human-evaluated tests repeatedly rank its foundation model as the safest of the bunch, by a wide margin.

It looks like Apple Intelligence could be off to a good start.

Apple Intelligence: Both safe and savvy

Apple Intelligence diagram
A Foundation model powers a wide variety of Apple Intelligence features, both on-device and in the cloud.
Image: Apple

Earlier this year, scores of headlines claimed Apple was losing the AI race because it did not have its own LLM. Apple itself fanned the flames by showing Siri integrated with ChatGPT at its annual WWDC developers conference. That demo led many to believe OpenAI was powering all of Apple Intelligence, which isn’t the case.

Apple Intelligence is a broad marketing term that brands a bunch of new AI features. It will be available on every major software platform — iOS, iPadOS and macOS. Apple announced the first features at WWDC24: a smarter, more capable Siri; writing tools that generate and summarize text; image generation in Messages and other apps.

Powering all of these features is the Apple Foundation model (AFM). The Foundation model is to Apple Intelligence what GPT-4 is to ChatGPT and OpenAI’s other AI services. That means the capability of the Foundation model directly determines how well every Apple Intelligence feature works.

An academic white paper authored by over 150 Apple employees outlines in great detail the training, performance and evaluation of the Foundation model.

Apple Intelligence is smarter than you think

Human evaluation of Apple Intelligence shows it’s highly competitive against its rivals.
Output from Apple Intelligence is highly competitive with its top rivals.
Image: Apple

In a human evaluation test, 1,393 prompts are fed to the Apple Foundation model and its competitors. The on-device version is tested against the top open-source LLMs that also run on-device, while the server version goes up against commercial LLMs running in the cloud.

In each domain, on-device and in the cloud, the results are similar. Against the latest and greatest, Llama-3 and GPT-4, Apple comes out slightly behind. Stacked up with the rest, it’s a tight race. Apple Intelligence beats out Mistral and GPT-3.5 over 50% of the time.
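To make that scoring concrete, here is a minimal Swift sketch of how a pairwise human-preference evaluation gets tallied: graders pick a winner (or call a tie) for each prompt, and the win rate is simply the fraction of prompts a model takes. The enum, function and sample data below are hypothetical illustrations, not code or numbers from Apple’s paper.

import Foundation

// Hypothetical sketch of tallying a pairwise human-preference evaluation.
// Graders compare two model responses per prompt and record a verdict.
enum Verdict: Equatable {
    case preferApple      // graders preferred the Apple Foundation model's response
    case tie
    case preferCompetitor // graders preferred the competing model's response
}

func preferenceRates(_ verdicts: [Verdict]) -> (win: Double, tie: Double, loss: Double) {
    let total = Double(verdicts.count)
    let wins = Double(verdicts.filter { $0 == .preferApple }.count)
    let ties = Double(verdicts.filter { $0 == .tie }.count)
    return (wins / total, ties / total, (total - wins - ties) / total)
}

// Fabricated example data; a real run would cover all 1,393 prompts.
let sample: [Verdict] = [.preferApple, .tie, .preferCompetitor, .preferApple]
let rates = preferenceRates(sample)
print("win \(rates.win), tie \(rates.tie), loss \(rates.loss)")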

Writing benchmarks show Apple Intelligence as first in summarization on-device and on a server, while second and third in composition.
Apple’s Foundation model is just as good at summarizing and writing text.
Image: Apple

Additional benchmarks show even stronger results. Apple Intelligence is just as capable at text summarization, both on-device and in the cloud, taking two narrow first-place spots. In text generation and composition, widely considered the bread and butter of OpenAI’s ChatGPT, the competition holds only the slightest lead over Apple’s model.

Output Harmfulness evaluations show Apple Intelligence much safer than other LLMs.
This isn’t even close.
Image: Apple

More responsible and safe AI

When it comes to generating content that isn’t discriminatory, hateful, exclusionary, harmful, sexual, illegal or violent, the Apple Foundation model is the safest by a huge margin. In a human evaluation test, AFM-on-device produced harmful content nearly half as frequently as the next-best model, and roughly a third as frequently as Meta’s Llama-3. AFM-server fared even better, producing violations at a rate more than 4.5 times lower than GPT-4’s.

Graphs showing Apple Intelligence widely rated as outputting safer content than every competitor.
Apple’s output is widely and consistently rated as safer.
Image: Apple

In nine out of ten human preference tests, output from the Apple Foundation model was deemed safer over 50% of the time. In all ten tests, it tied at least 23% of the time.

Apple rigorously culled harmful content from its training data. According to the white paper, input data goes through “extensive quality filtering” for safety and profanity, “using heuristics and model-based classifiers.” Sanitizing the training data has a huge impact on the model’s output — it can’t replicate what it hasn’t been shown.
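The white paper doesn’t publish that filtering code, but “heuristics and model-based classifiers” describes a familiar two-stage pipeline: cheap rules first, then a learned safety classifier. Here is a minimal, hypothetical Swift sketch of the idea; the length check, blocklist, threshold and classifier stub are invented for illustration and are not Apple’s actual filters.

import Foundation

// Hypothetical two-stage training-data filter: cheap heuristics first,
// then a model-based safety classifier. Every rule here is a placeholder.
struct SafetyClassifier {
    // Stand-in for a learned classifier that scores how likely a document
    // is to contain unsafe content. A real system would run an ML model here.
    func unsafeScore(for text: String) -> Double { 0.0 }
}

func passesQualityFilter(_ document: String,
                         profanityBlocklist: Set<String>,
                         classifier: SafetyClassifier,
                         unsafeThreshold: Double = 0.5) -> Bool {
    let words = document.lowercased()
        .split(whereSeparator: { !$0.isLetter })
        .map(String.init)

    // Stage 1: heuristics, e.g. a minimum length and a profanity blocklist.
    guard words.count >= 20 else { return false }
    guard !words.contains(where: profanityBlocklist.contains) else { return false }

    // Stage 2: model-based classifier for safety; drop high-scoring documents.
    return classifier.unsafeScore(for: document) < unsafeThreshold
}

Applied across billions of training documents, even a simple gate like this decides what the model ever gets to see, which is exactly why the sanitizing step matters so much.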

Other players in the field have been heavily criticized for training their AIs on YouTube videos, all of Reddit, and seemingly anything else they could get their hands on. Apple is not entirely beyond criticism here, either: the company revealed Applebot-Extended, the tool that lets websites opt out of AI training, only after its Applebot crawler had already scraped the web. On the consumer side, however, users can place more trust in the Apple Intelligence writing tools than in rival offerings.

Taking action

Apple Intelligence Tool Use benchmarks
Apple Intelligence is best in class at using tools — even when running on-device.
Image: Apple

The latest beta versions of iOS 18.1 and macOS Sequoia 15.1 only have the writing and image editing tools, but much more is to come. In a future release, Siri will be able to understand the apps on your phone, take plain language commands, and execute them on your behalf.

A litmus test for how well this feature will work is the paper’s tool use benchmark, where “given a user request and a list of potential tools with descriptions, the model can choose to issue tool calls” in a format the OS can understand. Unlike in other benchmarks, AFM-on-device is so good at this that it remains competitive even against server-based LLMs. AFM-server’s average performance is best in class.
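The paper quotes the setup but not an exact wire format, so here is one plausible Swift sketch of what such an exchange could look like: the system lists tools with descriptions, and the model answers with a structured call the OS can decode and dispatch. The type names, the create_calendar_event tool and its fields are invented for illustration, not a published Apple schema.

import Foundation

// Hypothetical shape of a tool-calling exchange: the system describes the
// available tools, and the model replies with a structured call to execute.
struct ToolDescription: Codable {
    let name: String
    let description: String
    let parameters: [String: String]   // parameter name -> human-readable description
}

struct ToolCall: Codable {
    let tool: String
    let arguments: [String: String]
}

// Invented example of a model response choosing a calendar tool in reply
// to a plain-language request like "set up lunch with Sam on Friday."
let modelOutput = """
{ "tool": "create_calendar_event",
  "arguments": { "title": "Lunch with Sam", "start": "2024-08-02T12:00:00Z" } }
"""

if let data = modelOutput.data(using: .utf8),
   let call = try? JSONDecoder().decode(ToolCall.self, from: data) {
    print("Dispatching \(call.tool) with arguments \(call.arguments)")
}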

Other tests outlined in the paper show the Foundation model is best in class at following instructions as well.

Apple isn’t lagging in the AI race after all

The received wisdom is that Apple is years behind in the AI race, thanks mostly to not having its own LLM.

In fact, Apple Intelligence is built on a solid foundation: the Foundation model. According to the research, it’s just as powerful as its rivals’ models and far safer. Apple’s no AI laggard at all.
