A Research-Backed Guide to Brand Representation in AI Search by Ethan Young Updated July 2, 2025 v5

# A Research-Backed Guide to Brand Representation in AI Search

*Ethan Young | Updated July 1, 2025*

In May 2024, Google officially launched "AI Overviews" (AIO) in the US. This feature, initially tested as "Search Generative Experience" (SGE) in Google Search Labs throughout 2023, essentially creates "mini SERPs" within traditional search results pages.

Since then, AI Overviews have captured 12.8% of all Google search volume, representing billions of monthly queries where users now receive AI-synthesized answers instead of traditional search results. With 16% coverage in the US and an accelerating global rollout, according to [Ahrefs'](https://ahrefs.com/blog/insights-from-56-million-ai-overviews/) analysis of 55.8 million AI Overviews across 590 million searches, brands face a fundamental shift in how consumers discover information about their products and services.

Almost a year later, on May 20, 2025, Google used its I/O 2025 conference to announce the U.S. rollout of AI Mode, a dedicated tab enabling AI search, including what Google calls advanced reasoning, multimodality, and deeper follow-up capabilities.

In response, professionals in almost every industry have scrambled to understand how this technology affects their business. In turn, the marketing and public relations industries have created their own frameworks and acronyms to help agencies and brands understand and navigate changes in how consumers discover products and services online. The most prominent are Answer Engine Optimization (AEO), LLM Optimization (LLMO), and Share of Model (SOM), although I've also seen AIO, GAIO, and AISEO.

But most of these emerging marketing frameworks are oversimplified and suffer from flawed mental models of AI systems. This ultimately leads to confusion and wasted time. This paper cuts through the marketing hype to provide evidence-based strategies for brand visibility in AI-powered search.

We'll answer two fundamental questions:

- Can we measure and improve brand representation in AI-powered search?
- Do our existing Marketing, SEO, and PR strategies and tactics transfer to AI-powered search?

The answers will reshape how you think about digital visibility, revealing why top-ranked pages are losing their guaranteed advantage while lower-ranked content finds new pathways to prominence, and why you need to watch out for accelerated content decay.

## The Ranking Disruption

We're witnessing the emergence of a parallel discovery ecosystem where AI systems evaluate and surface content using different criteria than traditional search engines. Success requires understanding and optimizing for both.

- [Seer Interactive (2025)](https://www.seerinteractive.com/insights/what-drives-brand-mentions-in-ai-answers) analysis of 10,000 People Also Ask questions across finance and SaaS sectors found a strong correlation (~0.65) between Google page 1 rankings and GPT-4o brand mentions.
  - **Correlations** became even stronger when focusing exclusively on websites that provide solutions rather than those that just discuss problems (like forums and aggregators).
  - **Backlinks** showed minimal impact on AI mentions, contrary to expectations from traditional SEO.
- [ZipTie.dev (2025)](https://ziptie.dev/blog/seo-still-matters-for-ai-search-engines) analyzed 25,000 real user queries across ChatGPT, Perplexity, and Google AI Overviews to examine the impact of traditional search ranking on AI visibility. It found that higher Google rankings directly correlate with AI search inclusion, with #1 rankings providing a 25% inclusion rate in AI Overviews. The study also pointed to **query fan-out** as a potential reason why some non-top-10 sources still appear in AI results.
- At Google I/O 2025 (May 20-21, 2025), Google officially launched AI Mode and explained that it "uses our query fan-out technique, breaking down your question into subtopics and issuing a multitude of queries simultaneously on your behalf" ([Google I/O 2025: 100 things Google announced](https://blog.google/technology/ai/google-io-2025-all-our-announcements/)). This parallel processing approach could explain why Advanced Web Ranking found that 46.5% of AI Overview citations ranked outside the top 50 organic results: the system retrieves relevant content for specific sub-queries rather than relying solely on overall page rankings.
- [Advanced Web Ranking (2024)](https://www.advancedwebranking.com/blog/ai-overview-study) analyzed 8,000 keywords across 16 industries and found that only **33.4%** of AI Overview sources ranked in the top 10 organic results, while **46.5%** of cited URLs ranked outside the top 50 organic results. In other words, nearly half of AI Overview citations came from pages that don't even appear on Google's first five pages of traditional search results.

The 25% inclusion rate for #1 rankings in AI Overviews shows that while top rankings still provide an advantage, they're no longer a guarantee. Conversely, lower rankings aren't a death sentence for AI visibility.

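
To make the fan-out idea concrete, here is a minimal sketch of how issuing several sub-queries can surface a page that never ranks for the head query. Everything in it is an assumption for illustration: the `fan_out` function, the toy index, and the example URLs are hypothetical, and Google has not published how its implementation decomposes or merges queries.

```python
from collections import Counter

# Hypothetical toy index: each sub-query maps to its own ranked list of URLs.
TOY_INDEX = {
    "best running shoes": ["bigbrand.example/shoes", "retailer.example/running"],
    "running shoes for flat feet": ["nicheblog.example/flat-feet", "bigbrand.example/shoes"],
    "running shoe cushioning durability": ["labtests.example/durability"],
}


def toy_search(sub_query):
    """Stand-in for a real search backend (assumption)."""
    return TOY_INDEX.get(sub_query, [])


def fan_out(sub_queries, search, top_k=3):
    """Run every sub-query and count how many of them surfaced each URL."""
    hits = Counter()
    for sub_query in sub_queries:
        for url in search(sub_query)[:top_k]:
            hits[url] += 1
    # Sorted by count, highest first; ties keep first-seen order.
    return hits.most_common()


sources = fan_out(list(TOY_INDEX), toy_search)
for url, count in sources:
    print(count, url)
```

In this toy setup, `nicheblog.example/flat-feet` is absent from the head query's results but still enters the candidate pool through a sub-query, which is the pattern the studies above hint at.
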
## The Recency Bias

[Seer Interactive (2025)](https://www.seerinteractive.com/insights/study-ai-brand-visibility-and-content-recency/) analyzed 5,000+ URLs cited across ChatGPT, Perplexity, and AI Overviews to examine content recency bias in AI systems and found that:

- **Nearly 65% of AI bot hits** targeted content published within the past year (2025)
- **89% of AI interactions** occurred on content updated within the last three years (2023-2025)
- **Strong industry variations**: Financial services showed extreme recency bias, with thousands of hits on 2024-2025 content and almost none pre-2020, while energy content maintained relevance across longer timeframes
- **Citation patterns varied by AI system**: AI Overviews showed the strongest recency preference (85% of citations from 2023-2025), followed by Perplexity (80%) and ChatGPT (71%). However, ChatGPT cited content dating back to 2004, particularly Wikipedia articles, suggesting that authoritative sources maintain citation value despite age.

**We're essentially seeing AI accelerate content decay.** The traditional "evergreen content" model (write once, rank forever) is being replaced by content that requires regular updates to maintain AI visibility. This creates both a challenge (more content maintenance) and an opportunity (fresh content can compete with established players more effectively).

## What Limits Current Approaches to AI-powered Search Optimization

The fundamental problem with current marketing approaches to AI-powered search isn't that they're entirely wrong; it's that they're built on oversimplifications. Most frameworks treat "AI search" as a monolithic technology when it's actually several distinct architectures that process and present information differently. This difference isn't just technical: it determines which optimization strategies will actually work.

When we examine popular frameworks through this technical lens, we can see both their value and limitations.

### **Share Of Model (SOM)**

SOM correctly identifies an important shift (LLMs are changing how information flows) but then oversimplifies the solution. The framework's central premise, targeting LLMs as an "audience," misunderstands how these systems work. LLMs don't browse the web; they either use pre-trained datasets or retrieval systems. Yet SOM recommends the same SEO tactics (search rankings, structured data, featured snippets) and Google Ads best practices (Broad Match + Smart Bidding) without clarifying whether results should be expected in months (real-time retrieval) or years (future training cycles).

Unlike Share of Voice (SOV), which connects to clear optimization pathways through media relations and content strategy, SOM struggles with the non-deterministic nature of LLM outputs. Sampling parameters like temperature and top-p mean the same query can produce different responses, making consistent measurement difficult without very large test sets. This is a technical reality SOM does not address.

The result is confusion masked as innovation: SOM applies traditional digital marketing metrics to systems that operate on entirely different principles.

### **Answer Engine Optimization (AEO)**

AEO correctly identifies a fundamental shift in user behavior, from clicking through search results to expecting direct answers from AI systems. The framework's emphasis on question-focused content, clear structure, and concise answers reflects genuine changes in how information is consumed across chatbots, voice assistants, and search features like Google's AI Overview.

However, AEO's execution falls into the same trap as other emerging frameworks: applying traditional digital marketing thinking to new technologies.
The framework recommends the same optimization tactics (structured headings, bullet points, concise answers, featured snippets, and knowledge panels) whether you're targeting ChatGPT (which relies on training data and selectively calls search) or Google's AI Overview (which synthesizes search results in real time).

### **LLM Optimization (LLMO)**

LLMO rightly emphasizes that establishing semantic associations between brands and topics can influence AI outputs, even if it often misrepresents how modern transformer-based language models actually work.

The importance of Wikipedia for entity recognition is correctly emphasized. Wikipedia does indeed form a substantial part of most LLM training corpora. A happy side effect of getting your [Wikipedia listings in order](https://en.wikipedia.org/wiki/Google_Knowledge_Graph) is that you’re more likely to appear in Google’s Knowledge Graph by proxy.

The framework's central flaw, however, lies in treating LLMs as knowledge databases rather than predictive text generators. For example, one Ahrefs article I read incorrectly states that "When you ask Claude which chairs are good for improving posture, it recommends specific brands because these brand entities have the closest measurable proximity to the topic of improving posture." This reduces complex neural network operations to a simplistic cosine similarity calculation that doesn't accurately represent how the transformer architecture generates text. It's like thinking a chess grandmaster chooses moves by measuring distances on a board rather than through pattern recognition trained on millions of games.

And again, the framework's call-to-action is essentially SEO for the underlying retrieval system, not direct LLM optimization.

## Rethinking Brand Visibility in AI-powered Search

Traditional SEO was built around search engines that ranked documents by keywords, links, and structured data.
But as LLMs increasingly enter the equation, they introduce new information retrieval, interpretation, and presentation mechanisms. This shift calls for new brand representation strategies.

To move beyond the limitations of most modern frameworks, we need optimization strategies grounded in how different AI systems actually work, which means first understanding the distinct architectures that power today's AI search landscape.

### A Taxonomy of Interfaces (and Why It Matters)

To develop optimization strategies grounded in technical reality rather than marketing hype, we need to understand how different AI systems actually handle information, and I'm going to dispel the confusion once and for all:

1. **Search Engines** (e.g., Bing, Brave, DuckDuckGo, Google)
2. **Large Language Models (LLMs)** (e.g., Anthropic's Claude 3.5 Sonnet, OpenAI's GPT-4)
3. **LLM-first systems augmented with search engines** (e.g., Claude Sonnet 4, GPT-4o with browsing, Perplexity AI)
4. **Search-first systems augmented with LLMs** (e.g., Google’s AI Overview and AI Mode powered by Gemini, Bing Chat powered by OpenAI)

Search engines and search-first systems update constantly, LLM-first systems can access real-time data but exhibit bias toward training data, and standalone LLMs reflect whatever was cemented during training. Let's go through this in more detail.

#### Foundational Systems

Before examining how LLMs and search engines combine into something like Google's AI Overview, we need to understand the two core technologies that power modern AI search.

A **Large Language Model (LLM)** (e.g., Anthropic's Claude 3.5 Sonnet, OpenAI's GPT-4) is a type of Deep Neural Network (DNN) based on the transformer architecture, trained on large corpora of text data (typically including Common Crawl, a public archive of crawled web pages, and Wikipedia) to generate, complete, or translate text, answer questions, and perform other Natural Language Processing (NLP) tasks. Think of training data like a snapshot of web data in time.
An LLM's knowledge has a fixed cutoff date, after which it cannot access new information without outside help.

Imagine LLMs are like big photo albums that took pictures of the internet before a certain date. These albums include snapshots of websites from Common Crawl (which is like a giant collection of internet photos) and Wikipedia. Once the album is finished, no new pictures can be added. This means that if your website or brand wasn't in those pictures before the album was completed, the AI won't be able to see you when answering questions (without outside help).

A **Search Engine** (e.g., Bing, Brave, DuckDuckGo, Google) continuously crawls and indexes millions of web pages and, at query time, uses a set of public and/or proprietary algorithms to rank the relevance of websites. The most famous of these algorithms is PageRank, which determines a page's importance by counting the number and quality of backlinks (the underlying assumption is that more important websites are likely to receive more links from other websites). Search engines return a prioritized list of links that users must manually navigate.

#### Hybrid Systems (Generative Engines)

When LLMs and search engines are combined, they can form powerful hybrid systems called Generative Engines (GE) that completely change how information is discovered and presented:

**LLM-first systems augmented with search** (e.g., Claude Sonnet 4, GPT-4o with browsing, Perplexity AI) center around an LLM as the primary reasoning engine. When the system detects that a query requires up-to-date information, it invokes a search API (typically from one of the major search providers). The retrieved results are then reformatted or summarized and passed back into the LLM, which generates a natural language response, often including citations.

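
The LLM-first flow described above (decide whether search is needed, retrieve, then generate from the retrieved sources) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how any vendor implements it: `needs_fresh_info`, `build_prompt`, `answer`, and the stub `echo_llm` and `stub_search` callables are all hypothetical names, and real systems typically let the model itself decide when to call a search tool.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    snippet: str


def needs_fresh_info(query):
    """Hypothetical keyword heuristic standing in for the model's own tool-use decision."""
    return any(word in query.lower() for word in ("latest", "today", "2025", "price"))


def build_prompt(query, results):
    """Number the retrieved sources so the model can cite them."""
    sources = "\n".join(
        f"[{i}] {r.url}: {r.snippet}" for i, r in enumerate(results, start=1)
    )
    return f"Answer using the numbered sources and cite them.\n{sources}\n\nQuestion: {query}"


def answer(query, llm, search):
    """LLM-first flow: training data by default, search only when freshness is needed."""
    if not needs_fresh_info(query):
        return llm(query)
    return llm(build_prompt(query, search(query)))


# Stubs so the sketch runs without any external service (assumptions).
def echo_llm(prompt):
    return prompt


def stub_search(query):
    return [SearchResult("acme.example/pricing", "Acme plans start at $10/month.")]


print(answer("latest Acme pricing", echo_llm, stub_search))
```

The practical takeaway is that a brand's page only reaches the model in the second branch, and only if the search call returns it.
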

**Search-first systems augmented with LLMs** (e.g., Google’s AI Overview and AI Mode powered by Gemini, Bing Chat powered by OpenAI) maintain the traditional search engine pipeline: retrieval, ranking, and display. After identifying the top-ranked results, an embedded LLM generates a natural language summary to present above or alongside standard search results.

While optimization strategies largely complement each other across platforms, their relative importance and implementation vary significantly. For example, a brand mention in an authoritative source might influence an LLM's training data, boost traditional search rankings (depending on linking), and serve as source material for hybrid system citations, but the pathways and timeframes differ substantially.

Next we'll cover how to adapt optimization approaches across these different system types, moving beyond generic "AI optimization" and toward strategies that account for both traditional search mechanics and generative AI capabilities.

### The Components of AI-powered Search Optimization

#### Optimizing for Large Language Models

For standalone LLMs without search augmentation, visibility is determined primarily by your brand's digital footprint prior to the model's training cutoff date. Your historical presence on major news outlets and authoritative sites like Wikipedia directly influences LLM representation.

For new brands or products launched after an LLM's training cutoff, or those seeking to update their representation, optimization must focus on hybrid systems since pure LLMs cannot incorporate any new information.

> Strategy: Focus on authoritative content in persistent public knowledge sources.

#### Optimizing for Search Engines

Over the past 25 years, website creators and content marketers much smarter than I have extensively researched and experimented with tactics and strategies to optimize their web content for search engines.
As you know, this discipline is called Search Engine Optimization (SEO) and attempts to address the engine's ranking factors. There are three core sub-fields of SEO:

1. **On-page:** Optimizing the content and structure of your website pages to improve search engine rankings and user experience, including:
   - Content quality
   - Keyword optimization
   - Internal linking
   - Page speed
   - URL structure
   - Mobile optimization
   - Featured Snippets
   - Knowledge Panels (Google Knowledge Graph & Wikipedia)
2. **Off-page:** Building your website's authority and reputation through activities outside of your website, including:
   - Backlink building
   - Guest blogging
   - Social media marketing
   - Online reputation management
   - Brand mentions (in authoritative websites)
   - Knowledge Panels (indirectly influenced through brand mentions, Wikipedia entries, and authoritative backlinks)
3. **Technical:** Ensuring your website is technically sound and easy for search engines to crawl, index, and understand, including:
   - Site speed
   - Website structure
   - Sitemap
   - Robots.txt
   - Schema markup
   - Security

There are many SEO nuances, many varying between search engines, that I will not detail here, mostly because it is not my specialty. However, I will briefly touch on technical SEO as it pertains to increasing AI-powered search visibility.

##### Technical SEO

Google's web crawler (and by extension, AI Overview and AI Mode) shows a strong preference for lightweight websites with faster JavaScript execution. Furthermore, most AI crawlers that LLM-first systems use for up-to-date information, such as OAI-SearchBot, ClaudeBot, and PerplexityBot, don't execute JavaScript at all, creating a significant technical gap between human and AI perception of websites. Solutions can include:

1. **Static JSON Files**: Separate schema.json files that provide static, accessible representations of your structured data
2. **Server-Side Rendering (SSR)**: Pre-rendering JSON-LD during the server response ensures all crawlers see it
3. **Dynamic Rendering**: Serving different content to crawlers vs. browsers (though this creates maintenance challenges)
4. **Prerendering Services**: Using services that cache JavaScript-rendered versions of pages for crawlers
5. **Hybrid Approach**: Critical schema in the initial HTML, with enhanced/dynamic schema through Google Tag Manager (GTM)

> Strategy: Optimize broadly for both human and machine interpretation.

#### Optimizing for Generative Engines

Whether AI-powered search begins or ends with an LLM, any time search is involved, organic search engine ranking remains critical. Research shows that these systems heavily favor top-ranked search results when retrieving information. For example, research from Seer Interactive analyzing 10,000 questions found strong correlations (~0.65) between Google page 1 rankings and GPT-4o brand mentions (~0.5-0.6 for Bing rankings).

However, traditional SEO methods are not directly applicable to the LLM layer of Generative Engines. This is because, unlike traditional search engines, generative models are capable of a more nuanced understanding of the user query and context, and can source information from very different places. Therefore, a balanced approach to search engine and LLM optimization is needed.

> Strategy: Balance SEO fundamentals with content designed for AI parsing and synthesis.

##### Introducing: Generative Engine Optimization (GEO)

The peer-reviewed research paper ["GEO: Generative Engine Optimization" by Aggarwal, Murahari et al.](https://arxiv.org/pdf/2311.09735) provides evidence-based approaches that improve visibility across multiple AI architectures.

###### Research Methodology

The research defines "generative engines" (GE) as systems that retrieve relevant documents from a database (like the internet) and use LLMs to generate a response grounded on the sources.
The researchers created GEO-bench, a benchmark of 10,000 diverse queries across multiple domains, and evaluated optimization strategies using a simulated generative engine built with GPT-3.5-turbo. They tested nine different content modification approaches and measured their impact using both objective metrics (like position-adjusted word count) and subjective metrics (such as relevance and influence).

###### How to Improve Source Visibility

The research identified several content optimization methods, mainly concerning credibility and presentation, that significantly improve source visibility in GE responses:

1. **Cite sources** (include citations from reliable sources)
2. **Quotation addition** (incorporate credible quotes)
3. **Statistics addition** (add relevant statistics)
4. **Fluency optimization** (high-quality writing)
5. **Easy-to-understand** (simple language)
6. **No "keyword stuffing"** (i.e., adding more relevant keywords to website content than necessary)

Their findings suggest these methods increased visibility by 15-40% compared to baseline content. Notably, the best combination (Fluency Optimization and Statistics Addition) outperformed any single GEO method by more than 5.5%.

###### Mixing and Matching GEO Methods for the Most Impact

The researchers found that different approaches improve performance in different contexts. For example, making content more authoritative significantly improves performance in debate-style questions and queries related to the “historical” domain. This makes sense, since a more persuasive form of writing is likely to hold more value in debates. Overall, content creators should strive to make domain-specific, targeted optimizations to their content for better AI visibility.

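
For readers who want a feel for the objective metric named in the methodology, position-adjusted word count, the following is a simplified sketch. It assumes that each sentence in the generated response credits its word count to the sources it cites, that sentences appearing earlier get more weight through an exponential decay, and that a sentence citing several sources splits its credit equally. The function name and input format are hypothetical, and the paper's exact normalization may differ.

```python
import math


def position_adjusted_word_count(sentences, source_id):
    """Share of a response attributable to one source, favoring earlier sentences.

    sentences: list of (text, set_of_cited_source_ids) in response order.
    Assumed weighting: exp(-position / number_of_sentences).
    """
    n = len(sentences)
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words == 0:
        return 0.0
    score = 0.0
    for pos, (text, cited) in enumerate(sentences):
        if source_id in cited:
            words = len(text.split())
            # Split credit equally when several sources back the same sentence.
            score += words * math.exp(-pos / n) / len(cited)
    return score / total_words


response = [
    ("Acme leads the market with a 40% share.", {1}),
    ("Rivals are growing quickly.", {2}),
    ("Acme also reports strong retention.", {1}),
]
print(position_adjusted_word_count(response, 1))
print(position_adjusted_word_count(response, 2))
```

Under this reading, a source gains visibility both by being cited for more of the answer and by being cited earlier in it, which is why the methods above are measured by their effect on this score rather than by raw citation counts.
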
![[Pasted image 20250320165857.png]]

**How can we apply the research findings with consideration for both the domain and the specific audience?**

Let's say your client writes economic content for financial service professionals. The study suggests the author should focus on **Statistics Addition** (Rank-1 in Law & Gov), combined with **Fluency Optimization** (Rank-1 in Business) and **Cite Sources** (Rank-1 in Statement and Rank-2 in Facts). This creates content that not only resonates specifically with the business professionals consuming financial information but also is more likely to perform well in generative search environments.

#### GEO Strategy by Market Position

When lower-ranked competitors used the **Cite Sources** method, fifth-ranked websites gained 115.1% visibility while top-ranked websites lost 30.3% visibility on average.

**Challenger brands should double down on GEO methods.** If you're not ranking on page one, focus aggressively on creating content with citations, statistics, and expert quotes. These methods can help you compete against higher-ranked competitors in AI-generated responses even when you can't beat them in traditional search rankings.

**Established brands need to defend their position.** Your existing SEO advantages won't protect you from visibility loss if competitors start using GEO methods on their content. Add citations, statistics, and credible sources to your highest-performing content before your competitors do.

## Answering Our Core Questions

### **Can We Measure and Improve Brand Representation in AI-powered Search?**

Yes, but with important distinctions based on system architecture.

#### Measuring AI Representation

Traditional SEO metrics remain foundational: organic ranking positions, featured snippets, knowledge panels, and inclusion in Wikipedia and Google's Knowledge Graph directly influence AI visibility. However, measuring AI representation requires an additional layer of tracking.

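
Any such tracking layer has to cope with the fact that the same prompt can yield different answers from run to run (the temperature and top-p issue raised in the SOM critique). One way to handle that, sketched here as an assumption rather than an established industry method, is to sample each query many times and report the brand mention rate with a confidence interval instead of a single number. The function names are hypothetical; the interval is the standard Wilson score interval.

```python
import math


def mention_rate(responses, brand):
    """Count responses that mention the brand (case-insensitive substring match)."""
    hits = sum(brand.lower() in response.lower() for response in responses)
    return hits, len(responses)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


# Hypothetical sampled answers to the same query.
samples = ["Acme is a solid choice.", "Try Globex.", "Either acme or Globex works."]
hits, n = mention_rate(samples, "Acme")
low, high = wilson_interval(hits, n)
print(f"{hits}/{n} mentions, interval {low:.2f} to {high:.2f}")
```

With only three samples the interval is very wide, which illustrates why small spot checks say little and why reliable benchmarks need larger, repeated samples.
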

**AI-Specific Measurement Approach:**

- Track brand mentions across major AI platforms (Google's AI Mode, Bing AI, ChatGPT) using representative query sets focused on informational searches where AI citations are most frequently triggered
- Account for non-deterministic LLM outputs caused by sampling parameters (temperature, top-p) by using adequate sample sizes and regular measurement intervals. This makes reliable benchmarks expensive but necessary
- Use findings for competitive analysis and content gap identification

**Bonus:** Integrate PR and Marketing teams. Organizations maintaining artificial boundaries between visibility and credibility risk creating content that performs well by traditional metrics but remains invisible in AI-generated responses. If the visibility and credibility variables of the brand awareness equation are handled by different teams (for example, Marketing and PR, respectively), the most actionable approach might be to develop shared KPIs that reward both visibility (organic rankings, click-through rates) and credibility (share of voice, content quality), such as "authoritative mentions per quarter."

**Critical Success Factor:** Connect AI visibility metrics to business outcomes through referrer tracking (Google Analytics 4) to measure actual conversion rates from AI interfaces. While AI visibility measurement methodology remains imperfect, systematic tracking provides actionable insights when tied to revenue impact.

#### Three Pillars of Improving AI Representation

%%But what if I only Have time to Do Three Things This quarter?%%

To improve generative engine representation, brands need both SERP visibility and content credibility. This requires a three-pronged approach targeting different content types:

##### **Layer 1: SEO for Earned Content (Foundation)**

Organic search ranking remains table stakes for AI representation since generative engines heavily favor top-ranked results.
However, SEO can only be applied to earned content through traditional link-building and authority-building strategies. Also ensure your technical optimization accounts for AI crawler requirements: many AI crawlers don't execute JavaScript, creating gaps between human and AI perception of websites.

##### **Layer 2: GEO for Owned Content (Enhancement)**

Generative Engine Optimization (GEO) methods (citations, statistics, expert quotes, and fluent writing) significantly improve the odds of generative engine representation of owned content regardless of ranking position. These techniques can only be applied to owned content and may also gradually influence LLM training data for prominent brands.

**Invest in high-quality content if you aren't already.** Create content only you can create. Use proprietary data and expertise to produce quotable expert commentary that competitors can't replicate. AI systems increasingly detect thin content, so focus on decision-oriented material that solves real problems; this type of content remains valuable across evolving interfaces.

##### **Layer 3: Strategic Earned Mentions (Multiplier Effect)**

Securing mentions in authoritative sources creates the strongest impact by simultaneously:

- Boosting organic search rankings of owned content (if backlinked)
- Providing credible source material for generative engines (regardless of backlinking)
- More significantly shaping LLM training data over time

**Bonus:** Consider multimodal expansion. If budget allows, explore structured video content on platforms like YouTube, which generative engines often treat preferentially when videos have comprehensive metadata and clear organization.

**Bottom Line:** Success requires optimizing owned content with GEO methods while earning authoritative mentions that enhance both traditional search performance and AI credibility signals.

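
To tie these layers back to outcomes, the referrer tracking described under the critical success factor could start from something like the sketch below. It works on exported session data rather than calling any analytics API, and every name in it is an assumption: the `AI_REFERRERS` hostnames should be verified against the referrers that actually appear in your own reports, and the `(referrer_url, converted)` session format is hypothetical.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames for AI interfaces; verify against your own analytics data.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def classify_referrer(referrer_url):
    """Return the AI interface name for a referrer URL, or None if it is not in the map."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRERS.get(host)


def conversion_rate_by_source(sessions):
    """sessions: iterable of (referrer_url, converted) pairs exported from analytics."""
    totals = {}
    for referrer_url, converted in sessions:
        source = classify_referrer(referrer_url) or "other"
        visits, conversions = totals.get(source, (0, 0))
        totals[source] = (visits + 1, conversions + int(converted))
    return {source: conv / visits for source, (visits, conv) in totals.items()}


sessions = [
    ("https://chatgpt.com/", True),
    ("https://chatgpt.com/", False),
    ("https://www.perplexity.ai/search?q=crm", True),
    ("https://www.google.com/", False),
]
rates = conversion_rate_by_source(sessions)
print(rates)
```

Keeping the hostname map explicit makes it easy to audit and extend as new AI interfaces start sending traffic.
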
### **Do Our Existing Marketing, SEO, and PR Strategies and Tactics Transfer to AI-powered search?**

Mostly, but they require rebalancing.

**Traditional SEO metrics transfer almost directly** because AI systems heavily favor top-ranked search results. Your existing investments in technical optimization, content quality, and authority building provide the baseline for AI representation.

**PR strategies gain unique value.** Earned media's value in AI search lies in its ability to simultaneously influence LLM training data, boost SEO through backlinks, and provide authoritative source material for generative engines. This multi-layered impact gives PR disproportionate leverage compared to other marketing tactics.

This creates a unique challenge for PR professionals: you're optimizing for both immediate AI citation opportunities and future training cycles simultaneously. Success requires balancing quick wins through real-time search integration with long-term authority building for future LLM generations.

**Press releases might gain unexpected value** as AI systems increasingly rely on them for authoritative information on niche topics. When specialized queries lack coverage, press releases distributed through established networks can become primary sources by default, helped by their structured formatting. This could transform press releases from simple promotional tools into strategic assets for dominating AI visibility, allowing companies to secure first-mover advantages in emerging markets by crafting releases with relevant statistics and expert commentary before established competitors enter the space.

**Traditional advertising metrics face significant challenges** in AI environments where users receive synthesized information rather than clicking through to websites. Click-through rates may decline and cost-per-acquisition may rise as users get answers without visiting websites.
Return on ad spend calculations need adjustment: value may come from brand mentions in AI responses rather than direct clicks.

**Looking ahead**, as LLMs with larger context lengths become cheaper to train and deploy, future generative engines will likely ingest more sources per query, potentially reducing the impact of organic search rankings while maintaining the importance of content quality and authority. This suggests content quality and earned media investments will compound.

## The Future of Brand Visibility in AI-powered Search

While marketing professionals scramble to decode AI-powered search with new frameworks and acronyms, the research reveals something counterintuitive: the increasing complexity of AI search actually reinforces fundamental content quality principles.

The six methods that improve AI visibility (citing sources, adding statistics, including expert quotes, optimizing fluency, simplifying language, and avoiding keyword stuffing) read like a journalism professor's syllabus from 1995. AI systems have simply become sophisticated enough to recognize and reward the same editorial standards that have defined credible information for decades.

This creates both challenge and opportunity. The technical complexity of different AI architectures, from standalone LLMs to hybrid generative engines, demands new optimization strategies and measurement approaches. But success still hinges on creating content that would impress a skilled editor: well-sourced, clearly written, and genuinely informative.

Yet unlike traditional evergreen content that could maintain rankings for years unchanged, AI-powered search operates on an accelerated refresh cycle where content relevance decays faster than ever. The democratization of visibility means your competitors can leapfrog established rankings through superior content quality alone, while content decay acceleration means your authoritative pieces lose influence without regular updates.

While search algorithms grow more sophisticated by the month, the brands that will thrive aren't those chasing the latest AI optimization hack—they're the ones building sustainable competitive advantages through content creation, systematic maintenance, and authoritative mentions.