API vs. Browser: Why AI Gives Different Answers Depending on How You Ask
Most AEO practitioners run prompts in a browser and call it a measurement. They're measuring the wrong thing. The gap between browser and API responses is where real brand risk lives.
If you test how AI mentions your brand by opening ChatGPT in a browser and typing a prompt, you're not running an AEO audit. You're running a consumer experience test. The difference matters more than most practitioners realise.
Two different systems
When you access ChatGPT, Claude, or Perplexity through a browser, you're interacting with a product layer that wraps the underlying model. That product layer includes:
- Real-time web browsing (for models with search capability)
- Safety and content filters tuned for consumer use
- Response formatting optimised for chat interfaces
- System prompts from the product team that shape behaviour
- Personalisation based on your conversation history
When a developer accesses the same model through the API — which is how most real-world AI applications are built — they get direct access to the base model with their own system prompts. No browser search integration (unless they add it). No consumer-layer filters. Different temperature settings. Different context windows.
The brand mentions that appear in browser testing often don't match what appears in API-based applications. And most buying decisions that involve AI are made through applications, not browser chat.
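The contrast above can be made concrete. Below is a minimal sketch of the request a typical API-based application sends, following the common chat-completions payload convention; the model name, the neutral system prompt, and the parameter values are illustrative assumptions, not any vendor's required defaults:

```python
def build_neutral_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build a bare-model request: neutral system prompt, default sampling.

    Nothing here enables web search, consumer-layer filters, or
    personalisation; an application gets those only if the developer
    adds them explicitly.
    """
    return {
        "model": model,  # illustrative model name
        "messages": [
            # A deliberately neutral system prompt: no product-layer
            # instructions shaping tone, safety, or formatting.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        # Left explicit to underline that there is no product-layer tuning.
        "temperature": 1.0,
    }

request = build_neutral_request("What are the leading CRM platforms for SMBs?")
```

Everything the browser product layers on top of this (search, filters, memory) is absent unless the application builds it in, which is why the two surfaces can return such different answers.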
Why the gap exists
The browser experience of ChatGPT, for instance, includes a real-time web search integration that can fetch recent content. If your brand has recent positive coverage, the browser version may cite it. The API version, without search integration, relies purely on training data. If your training data representation is weak, the API response will reflect that — regardless of how good your recent content is.
This creates a systematic measurement error in most AEO audits: practitioners optimise for browser responses and assume API responses follow. They often don't.
What we found in real audits
Across a set of audits, we documented the browser-to-API gap for each brand. Key patterns:
- Brands with strong recent content but weak training data representation consistently over-performed in browser tests relative to API tests. The average gap was 34 percentage points in Share of Answer.
- Brands with strong structured data and extensive third-party coverage showed minimal gap — the API and browser results were within 8 percentage points on average.
- Hallucination rates were consistently higher in API responses. The browser's search integration fetches corrective information; the API relies on training data, which may include outdated or incorrect information.
The implication: if you're measuring AEO success using browser testing only, you're likely overestimating your visibility in applications — which is where most real purchasing influence happens.
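The percentage-point gaps reported above come from comparing Share of Answer across the two surfaces. A sketch of that computation, with mention-counting simplified to a substring check and all function names and data shapes as illustrative assumptions rather than the audit tooling itself:

```python
def share_of_answer(responses: list[str], brand: str) -> float:
    """Percentage of responses that mention the brand at all.

    Real audits would use entity matching rather than a raw
    substring test; this keeps the sketch self-contained.
    """
    hits = sum(1 for r in responses if brand.lower() in r.lower())
    return 100.0 * hits / len(responses)

def browser_api_gap(browser: list[str], api: list[str], brand: str) -> float:
    """Gap in percentage points between browser and API Share of Answer."""
    return share_of_answer(browser, brand) - share_of_answer(api, brand)

# Toy data for two runs of the same three prompts (hypothetical brands).
browser_runs = ["Acme and Beta lead the market.", "Acme is popular.", "Beta only."]
api_runs = ["Beta leads.", "Beta and Gamma.", "Acme appears here."]

gap = browser_api_gap(browser_runs, api_runs, "Acme")  # ~33.3 points
```

Running the same prompt set through both surfaces and differencing the two scores is what produces a figure like the 34-point gap described above.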
How to measure correctly
A complete AI Visibility Audit includes both browser and API testing with explicit documentation of the gap. The methodology:
- Run representative prompts through the browser interface of each major AI engine.
- Run the same prompts through the API with a neutral system prompt and default settings.
- Document discrepancies in brand mention, citation quality, and factual accuracy.
- Where gaps are large, identify whether the gap is driven by search integration (recent content advantage) or training data (structural disadvantage).
- Prioritise interventions based on the gap source — content recency versus structural signal architecture.
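Step four of the methodology above can be sketched as a simple heuristic. The threshold value, field names, and category labels are illustrative assumptions; a real audit would weigh citation quality and per-prompt evidence, not a single cutoff:

```python
GAP_THRESHOLD = 10.0  # percentage points; below this, treat as minimal (assumed value)

def classify_gap(browser_soa: float, api_soa: float,
                 browser_cites_recent: bool) -> str:
    """Return the likely driver of the gap for one brand/engine pair."""
    gap = browser_soa - api_soa
    if abs(gap) < GAP_THRESHOLD:
        # Browser and API broadly agree, as with brands that have
        # strong structured data and third-party coverage.
        return "minimal"
    if gap > 0 and browser_cites_recent:
        # Browser over-performs and leans on fresh citations:
        # the advantage likely comes from search integration.
        return "search_integration"
    # Otherwise the divergence points at training data representation.
    return "training_data"
```

For example, `classify_gap(72.0, 38.0, browser_cites_recent=True)` would flag a recency-driven gap, while the same scores without recent citations would point at training data.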
The interventions are different depending on the gap source. If the gap is driven by lack of recent web content, the fix involves content velocity and distribution. If it's driven by weak training data representation, the fix involves structured data, third-party corroboration, and entity disambiguation — work that takes longer but produces durable results.
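The mapping from gap source to intervention set described above is small enough to express directly; the category names and groupings below are drawn from the paragraph, not a fixed industry taxonomy:

```python
# Gap source -> intervention set (labels drawn from the text above).
INTERVENTIONS = {
    # Browser over-performs because search fetches fresh content:
    # fixable with faster, wider publication.
    "search_integration": ["content velocity", "content distribution"],
    # API under-performs because training data representation is weak:
    # slower, structural work that produces durable results.
    "training_data": [
        "structured data",
        "third-party corroboration",
        "entity disambiguation",
    ],
}

def plan_interventions(gap_source: str) -> list[str]:
    """Map a diagnosed gap source to its intervention set (empty if minimal)."""
    return INTERVENTIONS.get(gap_source, [])
```

A "minimal" diagnosis maps to an empty plan: when browser and API already agree, the audit has nothing gap-specific to prioritise.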
The deeper implication
The API/browser gap reveals something fundamental about how AI visibility should be measured: it's not about what the model can find when it looks. It's about what the model believes when it doesn't look.
Training data is the ground truth of model belief. If your brand has weak, ambiguous, or negative representation in training data, no amount of recent content will fully compensate — because most applications don't give the model permission to look. They just ask it to answer.
AEO, properly understood, is the work of changing what the model believes. Not just what it can find.