How Often Do AI Search Engines Agree? We Tracked 8 Weeks.

Loudmink tracked 20 queries across 5 AI search engines, 25 B2B SaaS brands, and thousands of citation URLs over 8 research cycles from March to May 2026. The initial finding: AI search engines disagreed on the #1 recommendation in 50% of B2B queries, with full agreement at just 5% for five consecutive weeks. Then, in May 2026, full agreement jumped to 25% and pairwise overlaps surged to 67-80%. If you monitor only one AI search engine, or only check once, you are measuring noise.

This analysis is part of our AEO Research study. Agreement rates that swing from 5% to 25% in a single week make the case for continuous multi-engine monitoring stronger, not weaker. A brand that checked in March saw chaos. A brand that checked in May saw convergence. Neither snapshot tells the full story. This article walks through what changed, what drove the shift, and what it means for your monitoring strategy.

The Bottom Line

AI search engines disagreed on the top recommendation in 50% of B2B queries in our initial research (March 2026). Full agreement held at just 5% for five consecutive weeks before jumping to 25% in May 2026.
As of May 2026, brand website citation rates range from roughly 3% (Claude) to 23% (ChatGPT) depending on the engine, so most citations come from third-party sources.
Startup citation share has dropped to 9% across all engines, the widest gap observed versus enterprise brands. Initial data showed startups averaging 6.6 mentions versus 16.8 for enterprise.

What "50% Disagreement" Looked Like in March 2026

When five AI search engines answered the same B2B query in our initial research (March 2026), they agreed on the top recommendation only half the time. In Project Management and Analytics categories, disagreement climbed to 75%, meaning three out of four queries produced a different #1 pick depending on which AI search engine your buyer used.

This was not random noise. Each AI search engine has a distinct source ecosystem. ChatGPT links to brand websites in 18-25% of its citations depending on the week. Grok links to brand websites in approximately 8-9% of its citations as of May 2026, up from 2% in our initial research, and leans heavily on Reddit instead. Perplexity favors editorial and review sources. These structural differences in sourcing produce structural differences in recommendations.

What to do: Stop treating any single AI search engine as representative. If your monitoring covers only ChatGPT, you are missing what Gemini, Perplexity, Claude, and Grok tell your buyers. Track at least three AI search engines to get a reliable picture of your recommendation landscape. As of May 2026, Loudmink's Pro plan ($299/mo) covers ChatGPT, Gemini, and Perplexity, and the Max plan ($599/mo) covers all five.

The May 2026 Reversal: Agreement Jumped to 25%

For five consecutive weeks, full agreement across engines held at just 5%. Then our May 2026 data showed a dramatic shift: full agreement jumped to 25%, and pairwise overlaps surged to 67-80%, up from 12-47% in prior weeks. The most aligned pair (Perplexity and Grok) reached 80% overlap.

There is an important caveat. This measurement was taken across 4 engines because one engine experienced an infrastructure failure during that research cycle. Measuring fewer engines naturally produces higher agreement scores, because every query that is unanimous across five engines is also unanimous across any four of them. The jump from 5% to 25% needs confirmation with all 5 engines reporting.
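The arithmetic behind this caveat can be checked with a minimal sketch. It uses five rows copied from the position-1 table in this article (initial March 2026 data, not the May 2026 data), and it drops each engine in turn because the article does not say which engine failed. The function name `full_agreement_rate` is illustrative, not part of any Loudmink tooling.

```python
# Position-1 picks copied from five rows of the position-1 table in this article.
# Column order: Perplexity, ChatGPT, Gemini, Grok, Claude.
ENGINES = ["Perplexity", "ChatGPT", "Gemini", "Grok", "Claude"]
PICKS = {
    "Best CRM for startups": ["HubSpot", "HubSpot", "HubSpot", "HubSpot", "Salesforce"],
    "CRM for B2B sales": ["HubSpot", "Salesforce", "Salesforce", "Salesforce", "Pipedrive"],
    "Alternative to Salesforce": ["Salesforce", "Salesforce", "Salesforce", "HubSpot", "Salesforce"],
    "Alternative to Jira": ["Linear", "Linear", "Linear", "ClickUp", "Asana"],
    "Alternative to Mailchimp": ["Mailchimp"] * 5,
}


def full_agreement_rate(picks, engines_kept):
    """Share of queries where every kept engine names the same #1 brand."""
    idx = [ENGINES.index(engine) for engine in engines_kept]
    unanimous = sum(1 for row in picks.values() if len({row[i] for i in idx}) == 1)
    return unanimous / len(picks)


all_five = full_agreement_rate(PICKS, ENGINES)

# Hypothetical outage: leave out one engine at a time and recompute.
rates_with_four = {
    dropped: full_agreement_rate(PICKS, [e for e in ENGINES if e != dropped])
    for dropped in ENGINES
}

print(f"All five engines: {all_five:.0%}")
for dropped, rate in rates_with_four.items():
    print(f"Without {dropped}: {rate:.0%}")
```

On these five rows, full agreement is 20% with all five engines and 40% when either Claude or Grok is left out. Dropping an engine can hold the rate or raise it, but never lower it, which is why the 4-engine reading needs a 5-engine confirmation.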

Even with that caveat, the volatility itself is the finding. Agreement rates do not hold steady: they sat at 5% for five weeks and then moved sharply in a single research cycle. A brand that ran a single audit in March would have concluded that AI search engines are hopelessly fragmented. A brand that ran a single audit in May would have concluded they are converging. Both conclusions would be wrong, because both are based on a single snapshot.

What to do: Do not base strategic decisions on a single monitoring check. Run continuous weekly or biweekly monitoring across at least three AI search engines. If agreement rates can swing from 5% to 25% in a single week, quarterly audits are measuring whatever happened to be true that day, not the underlying trend. Track the trend, not the snapshot.
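One way to track the trend rather than the snapshot is to flag large week-over-week moves. The sketch below is hypothetical: the function name `flag_swings` and the 10-point threshold are assumptions, and the series contains only the readings stated in this article (five weeks at 5%, then 25%).

```python
def flag_swings(weekly_rates, threshold_pts=10):
    """Return (week_index, change) for each week-over-week move of at least threshold_pts."""
    flags = []
    for week in range(1, len(weekly_rates)):
        change = weekly_rates[week] - weekly_rates[week - 1]
        if abs(change) >= threshold_pts:
            flags.append((week, change))
    return flags


# Full-agreement rate in percent: five weeks at 5%, then the May 2026 reading.
full_agreement = [5, 5, 5, 5, 5, 25]

swings = flag_swings(full_agreement)
print(swings)  # [(5, 20)]
```

A quarterly audit would sample one point from this series and miss the 20-point move entirely, while a weekly check surfaces it in the cycle where it happens.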

How Each Engine Picks a Different #1

The disagreement is not abstract. Here is what happened when we sent the same queries to all five AI search engines in our initial research. The table shows 12 of the 20 queries. Each cell shows which brand that engine placed at position 1.

Query	Perplexity	ChatGPT	Gemini	Grok	Claude
Best CRM for startups	HubSpot	HubSpot	HubSpot	HubSpot	Salesforce
CRM for B2B sales	HubSpot	Salesforce	Salesforce	Salesforce	Pipedrive
Alternative to Salesforce	Salesforce	Salesforce	Salesforce	HubSpot	Salesforce
PM for engineering teams	Monday.com	Linear	Asana	Monday.com	(none)
What PM software should I use	Asana	Monday.com	ClickUp	ClickUp	Asana
Alternative to Jira	Linear	Linear	Linear	ClickUp	Asana
Email marketing for startups	Mailchimp	ActiveCampaign	Mailchimp	Mailchimp	Mailchimp
Email tool for newsletters	Mailchimp	Beehiiv	Mailchimp	Beehiiv	Mailchimp
Alternative to Mailchimp	Mailchimp	Mailchimp	Mailchimp	Mailchimp	Mailchimp
Analytics for SaaS	Amplitude	PostHog	Amplitude	Amplitude	Mixpanel
Analytics comparison	Mixpanel	Amplitude	Mixpanel	Amplitude	Amplitude
Alternative to Google Analytics (GA)	GA	GA	GA	GA	GA

A marketing director checking only ChatGPT would think Linear is the consensus winner for Jira alternatives. A director checking only Grok would think ClickUp is the answer. Neither would see the full picture: Linear leads on three engines, while Grok and Claude each name a different brand.

Only 4 out of 20 queries produced unanimous agreement across all five AI search engines. The rest ranged from strong consensus (4 out of 5 agreeing) to complete fragmentation (every engine naming a different brand).
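As a rough check, the rows shown in the table above can be tallied by consensus level. This is a sketch over the 12 rows listed here only, so its counts are not expected to match the study-wide figure of 4 unanimous queries out of 20. The helper name `consensus_size` is illustrative, and treating "(none)" as a missing pick is an assumption.

```python
from collections import Counter

# Position-1 picks from the table above, in column order:
# Perplexity, ChatGPT, Gemini, Grok, Claude. None stands for "(none)".
PICKS = {
    "Best CRM for startups": ["HubSpot", "HubSpot", "HubSpot", "HubSpot", "Salesforce"],
    "CRM for B2B sales": ["HubSpot", "Salesforce", "Salesforce", "Salesforce", "Pipedrive"],
    "Alternative to Salesforce": ["Salesforce", "Salesforce", "Salesforce", "HubSpot", "Salesforce"],
    "PM for engineering teams": ["Monday.com", "Linear", "Asana", "Monday.com", None],
    "What PM software should I use": ["Asana", "Monday.com", "ClickUp", "ClickUp", "Asana"],
    "Alternative to Jira": ["Linear", "Linear", "Linear", "ClickUp", "Asana"],
    "Email marketing for startups": ["Mailchimp", "ActiveCampaign", "Mailchimp", "Mailchimp", "Mailchimp"],
    "Email tool for newsletters": ["Mailchimp", "Beehiiv", "Mailchimp", "Beehiiv", "Mailchimp"],
    "Alternative to Mailchimp": ["Mailchimp"] * 5,
    "Analytics for SaaS": ["Amplitude", "PostHog", "Amplitude", "Amplitude", "Mixpanel"],
    "Analytics comparison": ["Mixpanel", "Amplitude", "Mixpanel", "Amplitude", "Amplitude"],
    "Alternative to Google Analytics": ["GA"] * 5,
}


def consensus_size(row):
    """Number of engines backing the most common #1 pick (missing picks are ignored)."""
    named = [brand for brand in row if brand is not None]
    return Counter(named).most_common(1)[0][1]


# How many queries fall at each consensus level (5 = unanimous).
levels = Counter(consensus_size(row) for row in PICKS.values())
unanimous = [query for query, row in PICKS.items() if consensus_size(row) == len(row)]

print(sorted(levels.items(), reverse=True))  # [(5, 2), (4, 3), (3, 5), (2, 2)]
print(unanimous)
```

On these 12 rows the tally is 2 unanimous queries, 3 where four engines agree, 5 where three agree, and 2 where no brand is named by more than two engines.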

What to do: Run your most important queries across at least three AI search engines before making any positioning decisions. If you are building competitive messaging around being "the #1 recommended alternative," verify that claim holds across engines, not just one.

Pairwise Overlap: Which Engines Think Alike (and How Fast That Changes)

Not all AI search engines disagree equally. Some pairs share similar source ecosystems and produce similar recommendations. Others diverge sharply. But the degree of overlap is not stable. Across 8 weeks of tracking, pairwise overlaps ranged from 12% (the lowest ever recorded) to 80% (the highest ever recorded).

Here are the pairwise overlaps from our March 2026 data compared with the most recent measurements (May 2026).

Engine Pair	March 2026 Overlap	May 2026 Overlap	Change (pts)
Perplexity + Grok	63%	80%	+17
Perplexity + Gemini	71%	73%	+2
ChatGPT + Grok	58%	67%	+9
Perplexity + ChatGPT	64%	69%	+5
ChatGPT + Gemini	62%	68%	+6

The biggest shift was Perplexity and Grok, which moved from 63% overlap to 80%, the highest of any pair. In prior weeks, some pairs dipped as low as 12% overlap. The range across the full study period (12% to 80%) demonstrates that engine alignment is not a fixed property. It shifts as source ecosystems evolve, as engines update their retrieval systems, and as the underlying content landscape changes.

What to do: If budget limits you to two or three AI search engines, pick engines with historically low pairwise overlap to maximize the breadth of your coverage. Monitoring ChatGPT and Grok together has generally given the widest view because they agree the least. But check the current data before committing. Overlap patterns shift, and the pair that was most divergent last month may not be the most divergent this month.
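Picking the most divergent pair from the published overlaps is a one-line lookup. The sketch below uses only the five pairs listed in the table above (five engines form ten possible pairs, so this is a partial view), and the function name `most_divergent` is illustrative.

```python
# Pairwise overlap percentages copied from the table above: (March 2026, May 2026).
OVERLAP = {
    ("Perplexity", "Grok"): (63, 80),
    ("Perplexity", "Gemini"): (71, 73),
    ("ChatGPT", "Grok"): (58, 67),
    ("Perplexity", "ChatGPT"): (64, 69),
    ("ChatGPT", "Gemini"): (62, 68),
}


def most_divergent(overlap, period):
    """Pair with the lowest overlap; period 0 = March 2026, period 1 = May 2026."""
    return min(overlap, key=lambda pair: overlap[pair][period])


# Change in percentage points from March to May for each pair.
changes = {pair: may - march for pair, (march, may) in OVERLAP.items()}

print(most_divergent(OVERLAP, 0))  # ('ChatGPT', 'Grok')
print(most_divergent(OVERLAP, 1))  # ('ChatGPT', 'Grok')
print(max(changes, key=changes.get), changes[("Perplexity", "Grok")])  # ('Perplexity', 'Grok') 17
```

Among the listed pairs, ChatGPT and Grok are the most divergent in both periods, but their overlap still rose 9 points, so the lookup is worth rerunning whenever new overlap data arrives.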

The Citation Source Gap: 3% to 23% Brand Visibility

As of May 2026, brand website citation rates range from roughly 3% (Claude) to 23% (ChatGPT) depending on the engine. The remaining citations come from third-party sources: review aggregators, editorial roundups, community platforms, documentation sites, and comparison pages. Brands that invest exclusively in their own domain are competing for a narrow slice of citation real estate.

Each AI search engine pulls from its own source mix. Here are the brand website citation rates per engine.

Engine	Brand Website Citation Rate	Primary Third-Party Sources
ChatGPT	18-25%	Review sites, editorial content
Perplexity	8-13%	Editorial roundups, high-authority reviews
Gemini	7-14%	Blogs, editorial mentions, reviews
Grok	8-9% (up from 2% in March 2026)	Reddit, third-party reviews, aggregators
Claude	~3%	Aggregators, technical documentation, blogs

ChatGPT is the only AI search engine where brand website content has a meaningful chance of being cited directly. For the other four engines, third-party presence is what drives citations.

What to do: Audit where your brand appears on G2, Capterra, Reddit, and industry-specific editorial sites, because these sources drive the vast majority of AI citations. Source insights that show where AI search engines pull their answers from make this audit far faster than checking each source by hand. For ChatGPT specifically, optimize your pricing pages, feature comparisons, and documentation, because ChatGPT links directly to brand websites more than any other engine. For Grok, Perplexity, Gemini, and Claude, prioritize earning reviews and editorial coverage over publishing more content on your own blog.
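The brand website citation rate used in this section can be approximated as the share of an engine's cited URLs that sit on the brand's own domain. This is a minimal sketch assuming that definition, since the article does not spell out its exact method. The URLs and the domain `brand-a.example` are placeholders, and `brand_citation_rate` is an illustrative name.

```python
from urllib.parse import urlparse


def brand_citation_rate(citation_urls, brand_domains):
    """Share of citations whose host is a brand domain or a subdomain of one."""
    if not citation_urls:
        return 0.0
    hits = 0
    for url in citation_urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any subdomain, but not lookalike hosts.
        if any(host == domain or host.endswith("." + domain) for domain in brand_domains):
            hits += 1
    return hits / len(citation_urls)


# Hypothetical citations collected from one engine's answers (placeholder domains).
citations = [
    "https://www.brand-a.example/pricing",
    "https://reviews.example/best-crm",
    "https://forum.example/r/saas/thread",
    "https://docs.brand-a.example/api",
    "https://news.example/roundup",
]

rate = brand_citation_rate(citations, {"brand-a.example"})
print(f"{rate:.0%}")  # 40%
```

Running the same function per engine gives a per-engine rate comparable in spirit to the table above, and the unmatched hosts are the third-party sources worth auditing.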

"Alternative to X" Queries: The Incumbent Advantage

"Alternative to X" queries gave the incumbent product position 1 in 87% of AI search engine responses across our research. The data is remarkably consistent.

Query	Engines where incumbent got #1	Engines where challenger got #1
Alternative to Salesforce	4 of 5 (Perplexity, ChatGPT, Gemini, Claude)	1 of 5 (Grok picked HubSpot)
Alternative to Mailchimp	5 of 5	0 of 5
Alternative to Google Analytics	5 of 5	0 of 5

When a buyer asks "best alternative to Mailchimp," all five AI search engines still place Mailchimp itself at position 1. The query explicitly asks for alternatives, yet the incumbent wins every time. For "alternative to Salesforce," the incumbent held position 1 on four out of five engines, with only Grok choosing HubSpot instead.

What to do: Do not rely on "alternative to" positioning alone. If your growth strategy depends on capturing traffic from competitor-name queries, you need to build independent authority that AI search engines recognize outside the context of comparison. Earn mentions in editorial roundups, build review profiles, and create content that positions your brand as the answer to problem-based queries ("best project management for remote teams") rather than competitor-based queries ("alternative to Asana").

The Startup Visibility Gap Is Widening

In our initial research (March 2026), startups averaged 6.6 mentions across AI search engines versus 16.8 for enterprise brands. That was a 2.5x gap. Since then, startup citation share has dropped to 9%, the widest gap observed across 8 research cycles. The engine coverage gap persists: startups appear on an average of 2.9 out of 5 AI search engines, while enterprise brands appear on all 5.

Not all AI search engines treat startups equally. Here is how each engine handles brand size, based on our initial data.

Engine	Enterprise Mentions per Query	Startup Mentions per Query	Startup at #1 Rate
ChatGPT	1.8	0.8	25%
Claude	1.9	0.7	10%
Gemini	2.1	0.6	5%
Grok	1.9	0.5	5%
Perplexity	2.1	0.5	Low (recent data shows exceptions for niche startups like Beehiiv)

As of May 2026, the startup visibility gap has widened rather than narrowed. Enterprise brands continue to accumulate third-party coverage, review history, and editorial mentions at a faster rate than startups, compounding their advantage in AI search recommendations over time.
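The per-engine gap can be expressed as a ratio of enterprise to startup mentions, using the initial figures from the table above and the overall averages stated in this section. This is plain arithmetic on the published numbers; the variable names are illustrative.

```python
# (enterprise, startup) mentions per query, copied from the table above.
MENTIONS = {
    "ChatGPT": (1.8, 0.8),
    "Claude": (1.9, 0.7),
    "Gemini": (2.1, 0.6),
    "Grok": (1.9, 0.5),
    "Perplexity": (2.1, 0.5),
}

# Enterprise-to-startup ratio per engine; a lower ratio means a smaller gap.
gap = {engine: enterprise / startup for engine, (enterprise, startup) in MENTIONS.items()}

for engine, ratio in sorted(gap.items(), key=lambda item: item[1]):
    print(f"{engine}: {ratio:.2f}x")

# Overall averages from the initial research: 16.8 enterprise vs 6.6 startup mentions.
overall = 16.8 / 6.6
print(f"Overall: {overall:.1f}x")
```

ChatGPT shows the smallest ratio (about 2.3x) and Perplexity the largest (about 4.2x), and the overall figure rounds to the 2.5x gap cited above.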

What to do: Startups should focus on building presence across the specific sources each AI search engine favors. That means Reddit for Grok, editorial coverage for Perplexity, and structured product pages for ChatGPT. Prioritize ChatGPT first, since it gives startups the best odds of a top recommendation. Track your visibility per engine to identify which AI search engines you are missing entirely, and address those gaps first. The widening gap means acting sooner matters more than acting later.

Which Categories Are Most Fragmented (Updated May 2026)

As of May 2026, Analytics is no longer as fragmented as it was in our initial research. Amplitude has emerged as the dominant recommendation in Analytics with 13 citations across engines. Project Management improved from 25% engine coverage to 75%. The categories have shifted.

Category	March 2026 Fragmentation	May 2026 Status
CRM	25%	Stable. Salesforce and HubSpot still dominate.
Dev Tools	25%	Stable. Vercel holds the lead.
Email Marketing	50%	Stable. Mailchimp remains strong.
Analytics	75%	Consolidating. Amplitude emerged as dominant (13 citations).
Project Management	75%	Improving. Coverage rose from 25% to 75%.

Categories that were wide open in March are starting to consolidate as certain brands build consistent multi-engine presence. The window to claim the #1 AI recommendation in fragmented categories is closing.

What to do: Check fragmentation in your own category. If AI search engines are consolidating around a leader and it is not you, the urgency to act is higher than it was three months ago. If your category is still fragmented, the opportunity remains, but it is shrinking. The brand that builds multi-engine visibility first can claim the consensus position. Build the third-party presence that closes the gap before a competitor does.

What to Do With Each Finding

Every data point from this study maps to a specific action. Here is the complete table.

Finding	What to Do
Agreement rates swing from 5% to 25% week to week	Monitor continuously, not quarterly. Single snapshots are unreliable.
AI search engines disagree on #1 in 50% of queries (initial data)	Monitor at least 3 AI search engines. Single-engine data is a coin flip.
Pairwise overlaps range from 12% to 80% across the study	Engine alignment shifts constantly. Recheck which engines diverge most before allocating budget.
Brand website citation rates range from 3% to 23% per engine	Invest in G2 profiles, Reddit, and editorial coverage. Third-party sources drive the vast majority of citations.
ChatGPT links to brand sites in 18-25% of citations	Optimize your pricing pages, docs, and feature pages. ChatGPT reads your own site more than any other engine.
Grok brand site rate rose from 2% to 8-9%	Grok is shifting, but third-party sources still dominate. Build Reddit and review site presence.
"Alternative to Mailchimp" = Mailchimp at #1 on 5 of 5 engines	Do not build your strategy around "alternative to X" queries. Incumbents win those 87% of the time.
Startup citation share dropped to 9%	Startups: the gap is widening. Prioritize ChatGPT first and build third-party presence aggressively.
Analytics consolidating around Amplitude	If you compete in Analytics, Amplitude is pulling ahead. Act now or cede the AI recommendation.
PM coverage rose from 25% to 75%	Project Management is consolidating. The window for new entrants is narrowing.

Why Single-Snapshot Monitoring Fails

The original version of this article argued that 50% disagreement makes single-engine monitoring unreliable. Eight weeks of data make the case even stronger: the problem is not just that engines disagree, it is that the degree of agreement changes rapidly. A brand that audited in March saw 5% full agreement. A brand that audited in May saw 25%. Both numbers are accurate. Neither tells the full story.

The risk compounds over time. If you optimize for ChatGPT's preferences and ignore Grok's Reddit-heavy source ecosystem, you build visibility on one engine at the expense of others. AI search engine behavior varies significantly, and a monitoring strategy that does not account for this variation produces blind spots that grow wider the longer you operate with incomplete data.

What to do: Adopt multi-engine monitoring as a baseline. Track what AI search engines say about your brand, including its recommendations, position, and citation sources, across at least three AI search engines. Compare per-engine performance weekly to identify which engines are your strongest and which need attention. Budget your content and source-building effort based on where the gaps are, not where you already appear.

Frequently Asked Questions

How often do AI search engines change their top recommendation?

AI search engine recommendations shift frequently, and the rate of change itself is unstable. Across Loudmink's 8 weeks of tracking, the full agreement rate held at 5% for five consecutive weeks and then jumped to 25% in a single week in May 2026. Pairwise overlaps ranged from 12% to 80%. Single-snapshot checks are unreliable. Consistent monitoring is the only way to detect and respond to changes.

Which AI search engine is most important to monitor?

No single AI search engine is "most important" across all categories. ChatGPT has the largest user base, but Perplexity and Gemini have distinct source preferences that produce different recommendations. The right answer depends on which AI search engines your buyers use. Start with ChatGPT and expand to at least three engines as quickly as possible.

Do AI search engines favor established brands over startups?

Yes, and the gap is widening. In our initial research (March 2026), enterprise brands averaged 16.8 mentions versus 6.6 for startups and appeared on all 5 AI search engines compared to 2.9 for startups. As of May 2026, startup citation share has dropped to 9%, the widest gap observed. ChatGPT remains the most startup-friendly engine, placing startups at #1 in 25% of queries. Perplexity historically favors established brands but has placed niche startups like Beehiiv at #1 in recent data.

What percentage of AI citations come from brand websites?

As of May 2026, brand website citation rates range from roughly 3% (Claude) to 23% (ChatGPT). The other engines fall between 7% and 14%. The vast majority of citations come from third-party sources like review sites, Reddit threads, editorial coverage, and comparison pages. Building presence on these third-party sources is significantly more effective than publishing more content on your own domain.

Why do Project Management and Analytics have the highest disagreement?

These categories had the highest fragmentation in our initial research because no single brand dominated the review sites, editorial mentions, and community discussions that AI search engines pull from. As of May 2026, both categories are consolidating: Amplitude has emerged as dominant in Analytics with 13 citations, and Project Management coverage rose from 25% to 75%. The window for new entrants is narrowing.

Updated May 2026: Expanded from initial March 2026 findings to include 8 weeks of longitudinal data showing agreement rate volatility (5% to 25%), pairwise overlap ranges (12% to 80%), category consolidation trends, and a widening startup visibility gap.
