Measuring AI perception

Start where your instincts already win

If a brand has spent years on content, PR, reviews, and analyst relations, the awareness piece is largely solved. The frontier models read the internet during training: your site, your G2 reviews, your comparison pages, the podcasts your founder went on. They know you exist, and they surface you for category questions.

Getting found is real work, and it's the entry point. It rewards the instincts an SEO or growth practitioner already has: crawlable content, structured data, and presence in the third-party sources models weigh heavily (review sites, community threads, comparison content). We even wrote a guide on it, the 80/20 Guide to AI Visibility. Those instincts matter most at the step where the model decides what to read about you.

The open question is what it says once it finds you

Visibility is mostly a one-time fix. What stays open is alignment: when the model finds your brand, what does it actually say? On which dimensions does it weigh you against alternatives? Is the answer moving the buyer toward you or toward a competitor?

That's where almost every positioning gap, perception gap, and competitive-defensibility issue actually lives for an established brand. Closing those gaps is a multi-year strategic discipline, and it looks more like brand strategy than search work: positioning, proof, and content aimed at moving specific perceptions, on specific dimensions, for specific buyers.

Two questions, two kinds of tools

A visibility or prompt tracker and a perception platform are built to answer different questions. Both are useful, for different jobs.

	Prompt / visibility trackers	Unusual AI brand management
The question	Across a set of prompts I choose, how often am I mentioned, and where?	When a real buyer reasons toward a decision, what does the model conclude about me, and what's driving it?
What it measures	Mentions, citations, and share of voice across the tracked prompts	The stable belief the model holds across natural buyer conversations: whether it finds you, whether it recommends you in a given context, and why, with each belief traced to its cause
Best used for	Monitoring presence over time; a fit for teams with established, SEO-style reporting processes	Diagnosing what the model thinks, and changing it

Trackers watch the score, and the score itself is shaky: swap a single meaning-preserving word in a tracked prompt and the leaderboard can reorder (the problem with prompt tracking). Google is starting to surface some of this data directly (AI citations are beginning to appear in Search Console); short of that, a tracker only sees the set of prompts you choose to follow.
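To make "the leaderboard can reorder" concrete, one simple way to quantify it is to count how many brand pairs swap order between two versions of the same prompt. The sketch below is an illustration only: the function name, the brand names, and the two rankings are hypothetical, and no particular tracker is assumed to report this metric.

```python
from itertools import combinations


def pairwise_disagreement(rank_a, rank_b):
    """Fraction of brand pairs that appear in a different order in two rankings.

    Both arguments are lists of brand names, best first. Only brands present
    in both rankings are compared. 0.0 means identical order, 1.0 means fully
    reversed.
    """
    pos_a = {brand: i for i, brand in enumerate(rank_a)}
    pos_b = {brand: i for i, brand in enumerate(rank_b)}
    shared = [brand for brand in rank_a if brand in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    flipped = sum(
        (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]) for x, y in pairs
    )
    return flipped / len(pairs)


# Hypothetical leaderboards for two wordings of one tracked prompt.
original_wording = ["BrandA", "BrandB", "BrandC", "BrandD"]
reworded = ["BrandB", "BrandA", "BrandD", "BrandC"]

disagreement = pairwise_disagreement(original_wording, reworded)
print(f"{disagreement:.2f} of brand pairs changed order")
```

A value near zero across many rewordings would suggest the ranking is stable; a high value suggests the score reflects the wording as much as the brands.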

How we measure it

Start from real buyers. Personas and the situations they're actually in, defined with the brand.
Simulate the buying conversation. We run those personas through buying conversations with the frontier model from each major lab, web search on.
Score two distinct behaviors. How readily the model finds you in a relevant conversation, and how readily it recommends you once it does. Both are rated on qualitative scales and broken out by topic, by evaluation criterion, and against named competitors.
Trace each belief to its cause. For every perception worth noting, we open the reasoning and find what drove it: which sources the model leaned on, which framings recur, what evidence was missing or contradictory. The driver is the finding.
Test before you publish. We can insert a draft page or a new proof point into the model's research and measure whether it actually moves perception, before you commit to building it.

We optimize for the top of the stack: if the smartest model misreads you, every smaller one is making a worse version of the same mistake.
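The split between "finds you" and "recommends you once it does" can be sketched in a few lines. This is a simplified illustration, not the actual scoring pipeline: it assumes each simulated conversation has already been labeled with two yes/no flags, whereas the real scoring uses qualitative scales. The `Conversation` class, its field names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conversation:
    topic: str
    found: bool        # assumption: the brand came up in the conversation
    recommended: bool  # assumption: the model recommended the brand


def score_by_topic(conversations):
    """Return {topic: (find_rate, recommend_rate_once_found)}.

    The recommend rate is computed only over conversations where the brand
    was found, and is None when the brand was never found for that topic.
    """
    buckets = defaultdict(list)
    for conv in conversations:
        buckets[conv.topic].append(conv)

    scores = {}
    for topic, convs in buckets.items():
        found = [c for c in convs if c.found]
        find_rate = len(found) / len(convs)
        recommend_rate = (
            sum(c.recommended for c in found) / len(found) if found else None
        )
        scores[topic] = (find_rate, recommend_rate)
    return scores


# Made-up labels for six simulated buyer conversations.
sample = [
    Conversation("pricing", True, True),
    Conversation("pricing", True, False),
    Conversation("pricing", False, False),
    Conversation("pricing", True, True),
    Conversation("security", False, False),
    Conversation("security", False, False),
]
scores = score_by_topic(sample)
for topic, (find_rate, recommend_rate) in scores.items():
    print(topic, find_rate, recommend_rate)
```

Keeping the two numbers separate matters: a low recommend rate among conversations where the brand was found is a perception problem, while a low find rate is a visibility problem.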

Why a tracker can show different numbers than we do

This is the question we get most from teams already running a tool. The short answer: the two are measuring different things, so they should read differently.

Different prompts. A tracker reports on the prompts you configured. Model answers swing dramatically on small wording changes, so a fixed prompt set captures a narrow, unstable slice. The results reflect the prompts you chose as much as the model's view of you.
Different models and sample. We run many conversations across the frontier models with search on. A different tool, prompt set, or date lands on different specifics by design.
Different question entirely. One counts straight mentions across the prompts you track. The other measures how the model shapes buyers: how it teaches them to find and evaluate solutions, where you land when it does, and why.

A note on citations, since they're the most tempting thing to compare directly: individual citations are noisy. The Wall Street Journal reported that 40–60% of the domains AI cited for identical questions were completely different a month later (our read on that piece). So we read source-type patterns (case studies, third-party reviews, comparison content) alongside the model's reasoning, rather than chasing single URLs.
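One way to see why source-type patterns are steadier than single URLs is to compare the two views on the same data. The sketch below uses made-up placeholder domains and a hypothetical `type_of` mapping. The churn definition (share of earlier domains that disappear) is an assumption for illustration, not necessarily how the reported 40–60% figure was computed, and real data will rarely be this tidy.

```python
from collections import Counter


def domain_churn(before, after):
    """Share of domains cited in `before` that no longer appear in `after`."""
    before_set, after_set = set(before), set(after)
    if not before_set:
        return 0.0
    return len(before_set - after_set) / len(before_set)


def source_type_mix(domains, type_of):
    """Count cited domains per source type; unmapped domains count as 'other'."""
    return Counter(type_of.get(domain, "other") for domain in domains)


# Hypothetical citations for the same question, one month apart.
month_1 = ["reviews.example", "blog.example", "forum.example",
           "vendor.example", "news.example"]
month_2 = ["reviews.example", "compare.example", "forum.example",
           "casestudies.example", "press.example"]

# Hypothetical mapping from domain to source type.
type_of = {
    "reviews.example": "third_party_review",
    "blog.example": "comparison",
    "compare.example": "comparison",
    "forum.example": "community",
    "vendor.example": "case_study",
    "casestudies.example": "case_study",
    "news.example": "news",
    "press.example": "news",
}

churn = domain_churn(month_1, month_2)
mix_1 = source_type_mix(month_1, type_of)
mix_2 = source_type_mix(month_2, type_of)
print(f"domain churn: {churn:.0%}")
print("source-type mix unchanged:", mix_1 == mix_2)
```

In this toy example most individual domains change while the mix of source types does not, which is the pattern that makes source types the more useful unit to read.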

Neither approach is wrong. One tells you what's being said across the prompts you track; the other tells you what the model believes, and what you'd change to move it.

What you do with the answer

The payoff is changing what the model reasons over: sharper positioning, more legible proof, presence in the authority sources the model already trusts, and clean documentation it can retrieve. Then you re-measure. A single well-placed claim, backed by evidence, can move recommendations more than a hundred new pages.

Measuring whether it worked (attribution)

AI's influence usually lands before the click. A buyer asks an assistant, gets steered, and arrives already decided, so the assistant rarely shows up as a clean line in your analytics. Measuring impact means triangulating a few signals rather than reading one number:

The perception data itself. Re-run the same buyer conversations over time and watch whether the model finds and recommends you more often, and whether the framing improved. It's the most direct read, and it moves before pipeline does.
AI-referred and branded demand. Referral traffic where assistants pass a link, branded-search lift, and direct visits that climb after buyers have "already done their research."
What buyers tell you. The "I asked an AI assistant and it recommended you" moments in sales calls and intake forms: qualitative, but often the earliest and clearest signal.

No single metric proves it. You build confidence by lining up movement in perception with movement in demand. UTM tracking and lead form-fill responses (“Did AI play a role in how you found or evaluated us?”) help paint a partial picture. For the full playbook, see Tracking your AI-influenced cohort.
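"Lining up movement" can be made a little more concrete with a rough directional check: when perception moves in one period, does demand move the same way a period or so later? The sketch below is a hedged illustration only. The function name, the `lag` parameter, and both series are hypothetical, and agreement in direction is supporting evidence rather than proof of attribution.

```python
def direction_agreement(perception, demand, lag=1):
    """Share of periods where a perception change is matched, `lag` periods
    later, by a demand change in the same direction.

    Both inputs are equal-length lists of per-period values. Periods where
    either series is flat are skipped. Returns None if nothing is comparable.
    """
    d_perception = [b - a for a, b in zip(perception, perception[1:])]
    d_demand = [b - a for a, b in zip(demand, demand[1:])]
    pairs = [
        (p, d_demand[i + lag])
        for i, p in enumerate(d_perception)
        if i + lag < len(d_demand)
    ]
    pairs = [(p, d) for p, d in pairs if p != 0 and d != 0]
    if not pairs:
        return None
    return sum((p > 0) == (d > 0) for p, d in pairs) / len(pairs)


# Made-up monthly series: recommend rate from re-run buyer conversations,
# and branded-search volume as a stand-in for demand.
recommend_rate = [0.40, 0.45, 0.55, 0.50, 0.60, 0.62]
branded_search = [100, 98, 110, 125, 120, 140]

same_month = direction_agreement(recommend_rate, branded_search, lag=0)
one_month_later = direction_agreement(recommend_rate, branded_search, lag=1)
print("same month:", same_month)
print("one month later:", one_month_later)
```

Trying a few lags is a reasonable design choice here, since perception tends to move before pipeline does; a short series like this one is far too small to draw conclusions from on its own.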
