Brands ranking on page 1 of Google showed a strong correlation (~0.65) with LLM (GPT-4o API) mentions, according to [Seer Interactive (2025)](https://www.seerinteractive.com/insights/what-drives-brand-mentions-in-ai-answers).

Which is more reliable for the AIO frequency stat?
- https://ahrefs.com/blog/insights-from-56-million-ai-overviews/
- https://www.semrush.com/blog/semrush-ai-overviews-study/

AIOs appear for 12.8% or more of all searches by volume, according to Ahrefs, whose crawler is second only to Googlebot as the most active web crawler in the SEO industry. AIOs appear far more often for informational queries:
- Informational: 97.70%
- Navigational: 1.23%
- Commercial: 12.86%
- Transactional: 2.85%

https://ahrefs.com/blog/ai-overviews-reduce-clicks/
> We analyzed 300,000 keywords and found that the presence of an AI Overview in the search results correlated with a **34.5% lower average clickthrough rate (CTR)** for the top-ranking page, compared to similar informational keywords without an AI Overview.

https://ahrefs.com/blog/websites-with-more-traffic-have-more-mentions/
Citation correlation with traffic (search volume, aka popularity).

https://ahrefs.com/blog/ai-search-traffic-conversions-ahrefs/
AI (ChatGPT) search visitors (0.5% of Ahrefs traffic) convert at a 23x higher rate than traditional organic search visitors for Ahrefs.
> We're still processing the data for a larger study across many sites, and we'll have that information for you soon.

https://ahrefs.com/blog/ai-overview-brand-correlation/
- The top 3 correlations with AI Overview brand visibility are all **off-site factors**: brand web mentions (0.664), linked or unlinked; brand anchors, i.e. brand-rich anchor text (0.527); and brand search volume (0.392).
- Paid factors like branded ad traffic (0.216) and branded ad cost (0.215) show **weak positive correlations** with AI mentions. (From the section "Ads won't save you if you want AI Overview visibility".)
- **Correlation ≠ causation.**

https://www.growth-memo.com/p/what-content-works-well-in-llms
1. Referral traffic from AI chatbots is higher quality than referral traffic from Google (linked to https://www.growth-memo.com/p/the-state-of-ai-chatbots-and-seo)
2. Word count
3. Sentence count
4. Flesch score
5. **Content depth (word and sentence count) and readability (Flesch score) have the biggest impact on citations in AI chatbots**

https://www.growth-memo.com/p/query-fan-out
According to [Semrush](https://www.semrush.com/blog/semrush-ai-overviews-study/?utm_campaign=google-ai-overviews-13-searches-455057) and [Ahrefs](https://ahrefs.com/blog/insights-from-56-million-ai-overviews/), ~15% of queries show AI Overviews. But the actual number is likely much higher, since we're not accounting for the ultra-long-tail, conversational-style prompts that searchers are using more and more. AI-based conversational search is no longer matching a single query to a single result: one prompt fans out into many synthetic sub-queries.
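A minimal sketch of what that fan-out step might look like from the outside: use an LLM to expand a seed query into labeled sub-queries. The prompt, model name, and output schema here are my assumptions for illustration, not anything documented in these sources.

```python
# Query fan-out sketch: one seed query -> several synthetic sub-queries,
# each labeled with a type and intent. Prompt/model/schema are invented.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FAN_OUT_PROMPT = """Given the search query below, generate five sub-queries \
a conversational search engine might issue on the user's behalf. \
Return a JSON object {{"queries": [...]}} where each item has keys \
"query", "type" (related | implied | comparative), and "intent".

Query: {query}"""

def fan_out(query: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any JSON-mode-capable model works
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": FAN_OUT_PROMPT.format(query=query)}],
    )
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    for sq in fan_out("best crm for small business").get("queries", []):
        print(sq["type"], "|", sq["query"])
```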
https://ipullrank.com/how-ai-mode-works
- "AI Mode introduces ... Zero-click behavior, where being cited matters more than being clicked."
- It marks a departure from classical search into a persistent, conversational model of information retrieval. This stateful context allows Google to reason about _intent over time_ rather than just intent in the moment.
- Query fan-out includes "related, implied, and recent queries", which are then reranked based on the original query.
- Says it's not https://huggingface.co/blog/moe but doesn't say why.
- How AI Mode works, but not very technical: https://search.google/pdf/google-about-AI-overviews-AI-Mode.pdf
- Based on [ZipTie's latest data](https://ziptie.dev/blog/seo-still-matters-for-ai-search-engines/), ranking #1 for the core query only gives you a 25% chance of ranking in the AIO.
- **Fit the Reasoning Target** / **Relevance Engineering** -> don't steal these from him. I'm assuming he would call it GEO.
- Built a query rewriter: based on the initial query, it generated a series of queries, the type of synthetic query, the user intent, and the reasoning behind why the query was selected.
- He says, "Google's embeddings are currently [at the top of the leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and they are the only provider that makes a distinction between query and document embeddings. So, they are the best for this purpose." It's not that query and document embeddings should come from different models; it's that they should be compared acknowledging [[Asymmetric Semantic Search]] (sketch after this list).
- Reranks by cosine similarity.
- Says there's no software for this: the future demands an interface where content optimization happens across multiple surfaces and subqueries simultaneously, with dense retrieval in mind.
- **How they should provide it**: a content editing UI that surfaces passage-level matching against query clusters, with embeddings and ranking overlap visualized.
- Says that's why "Python SEO" exists.
- "Vector Embeddings underpin everything in modern Google. Over the past few years, we've uncovered that the system creates vector representations of queries, pages, passages, authors, entities, websites, and now users themselves." Wait till he hears about [[Composite Embedding]]s!
- "Just last week, a research paper entitled ["Harnessing the Universal Geometry of Embeddings"](https://arxiv.org/pdf/2505.12540) was released, indicating that all vector embeddings ultimately converge on the same geometry. This suggests that at some point, we'll be able to convert between embeddings, which means we will be able to generate open source embeddings and convert them into what Google is using." I think there's a misconception here that Google's embeddings (or their dimensions) are somehow better. You can't generate OK embeddings with an OK model and then transfer them into Google's space to make them GOOD.
- We're seeing VECTOR ENCRYPTION!! WTF, the security implications.
- There are some misconceptions, but he does astutely observe: Google's retrieval model is based on vector similarity. If you don't understand how your content sits in vector space, you don't understand how it will be retrieved or cited.
- We need an embeddings explorer of the web that reveals site-level, author-level, page-level, and passage-level embeddings, for comparison across the web. We need tools that decompose your content into atomic assertions (triples) and score their retrievability and usefulness across fan-out queries. And finally, we need tools for content pruning based on site focus scoring, in alignment with the data from the leak.
- I commend him for looking at patents, but this gets convoluted.
- https://rqpredictor.streamlit.app/
- Thinks clickstream data is the only way to track traffic paths without GSC.
- https://marketbrew.ai/
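What the asymmetric part looks like in practice: the `task_type` split is real in Google's `google-generativeai` SDK, while the model choice and the plain top-k rerank are illustrative assumptions.

```python
# Asymmetric semantic search sketch: queries and documents are embedded
# with different task types from the SAME model, then compared with
# cosine similarity. Model name is an example, not a recommendation.
import numpy as np
import google.generativeai as genai

genai.configure(api_key="...")  # your key
MODEL = "models/text-embedding-004"

def embed(text: str, task_type: str) -> np.ndarray:
    out = genai.embed_content(model=MODEL, content=text, task_type=task_type)
    return np.asarray(out["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(query: str, passages: list[str]) -> list[tuple[float, str]]:
    qv = embed(query, "RETRIEVAL_QUERY")  # query side of the space
    scored = [(cosine(qv, embed(p, "RETRIEVAL_DOCUMENT")), p)  # document side
              for p in passages]
    return sorted(scored, reverse=True)
```

Same model, two task types: the short query and the long document get embedded on different "sides" of one space, which is the asymmetry at issue.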
Conversational ("long-tail") queries (like those used in ChatGPT) likely deviate even more from the original query's rankings because there are more sub-queries.

---

ChatGPT likes news sites.

---

What happens if we analyze Common Crawl, the data LLMs are trained on? What's the correlation to AI visibility?

[[Query Expansion]]
_**Digital PR**_

What are the projections for ChatGPT usage beating Google?

- Pipe impression, click, and conversion data from classic SERPs, AIOs, and AI Mode back into one shared Looker dashboard.

What are the "Other" ranking signals in this pyramid? https://ziptie.dev/blog/seo-still-matters-for-ai-search-engines/

---

https://lp.botify.com/hubfs/White%20paper/2024/Botify%20x%20DemandSphere%20-%20AI%20Overviews%20Report.pdf
> Essentially, the closer your content satisfies the same consumer intent and need as the AI Overview's summary, the more likely it is to be cited and linked.
- 120,000 total SERPs
- "For this analysis, we used Google's embedding models to vectorize each AI Overview text and the page content found on each URL in the citation links for that AI Overview. Next, we calculated the distance between the overview text on Google's generated response and the body of text in each one of the top cited links using a standard cosine similarity function." (Sketched below.)
- September 2024
- **Because our study focused on keywords intended to surface AI Overviews, the observed appearance rate is higher than what you'll see observed across worldwide datasets.**
- A full 75% of AI Overview links came from position 12 or better in the traditional organic rankings. With a median organic rank of position 4 and an average organic rank of position 12.03, it's clear that the fundamentals are still vitally important.
- 75% OF CITED, LINKED WEBSITES IN AI OVERVIEWS CAME FROM THE TOP 12 ORGANIC RANKINGS

But how is that useful? The AIO might not have been generated yet. Also, how often are AIOs updated?
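The study's core measurement reduces to a few lines, assuming you already have vectors for the AIO text and each cited page's body from the same embedding model (e.g. `embed()` above):

```python
# Botify/DemandSphere-style measurement: how close is each cited page's
# body text to the AI Overview summary? Assumes all vectors came from
# the same embedding model.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def overview_similarity(aio_vec, cited: dict[str, list[float]]) -> list[tuple[float, str]]:
    """Rank cited URLs by similarity of their body text to the AIO text."""
    return sorted(((cosine(aio_vec, vec), url) for url, vec in cited.items()),
                  reverse=True)
```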
> As we mentioned earlier, AI Overviews are compiled from two sources: Google's LLM, known as Gemini, and Google's organic web index. Generating an AI Overview uses the following process, according to Google's patent:
> ➀ A consumer inputs a query into Google search
> ➁ Google uses a natural language response system to understand the context and meaning behind the query
> ➂ Google provides a generative summary using data from within their own LLM. This isn't based on specific context and links from the SERP, but rather what Google's AI model knows about a subject based on all the content it trained on in the past
> ➃ Google consults top-ranking websites in its search engine index to find content that best answers the query, linking to these websites as sources within the AIO

That sounds like [[Hypothetical Document Embeddings (HyDE)]].

AIO question: so is Gemini creating a hypothetical document embedding, then comparing [[Cosine Similarity]] to the ranked pages, not just for the original query, but also for several sub-queries ([[Query Expansion]])?

https://patents.google.com/patent/US11769017B1/en. Here's how it works:
1. Receive a query.
2. The LLM generates a hypothetical answer (potentially based on Google's knowledge sources).
3. The search engine retrieves ranked content sources for the query, sub-queries, and recent queries, OR simply answers the query if external data isn't needed.
4. The system selects a portion of the hypothetical summary (suggesting this is a long, detailed document and we want to render only what we can verify).
5. The system determines citations by comparing the portion of the hypothetical document with portions of retrieved documents; these "verify" the hypothetical document. It uses embeddings to match summary content with source material.
6. The system asks itself: does this citation verify the portion?
   - Yes: link it (you just got cited in an AI Overview).
   - No: do we need to retrieve additional documents?
     - Yes: go to step 5.
     - No: do we need additional portions?
       - Yes: go to step 4.
       - No: render (THE END).

(Toy sketch of this loop at the end of this note.)

They also show changes to this system that update the AIO depending on whether you interact with the rendered results/SERP. It updates the AIO to reflect that you are familiar with certain sources/content. The diagrams suggest the AIO will be updated to reflect NEW information.

---

https://sandboxseo.com/generative-engine-optimization-experiment/
The proposal for a "Matrixed Ranking Analysis Pipeline" for SEO in AI search environments. The author outlines a 5-step process:
1. Pull rankings for subqueries (using existing SEO tools)
2. Generate vector embeddings for queries and document passages (using Google's embedding API)
3. Find most relevant passages per document (using cosine similarity)
4. Compare passage embeddings to citation embeddings from AI Mode responses
5. Improve content based on relevance gaps

From an IR perspective, this reads like someone who understands the theoretical concepts but hasn't built production retrieval systems. The core insight (that content needs to be optimized for embedding-based retrieval rather than just keyword matching) is correct.

What would a solution look like? [[NewsHack Competitors & Differentiation]] [[What is AI Visibility Optimization?]]
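Here's the toy sketch of the patent's verify-and-cite loop promised above (steps 4-6). `retrieve` and `similarity` are hypothetical stand-ins for a search call and an embedding comparison; the 0.8 threshold and the retry count are invented.

```python
# Toy reading of US11769017B1's verify-and-cite loop: take a portion of
# the hypothetical summary, try to verify it against retrieved documents
# via embedding similarity, cite on success, and only render what
# survives verification. Names and thresholds are invented.
def build_overview(portions, retrieve, similarity,
                   threshold=0.8, max_retrievals=3):
    rendered = []
    for portion in portions:                      # step 4: select a portion
        docs = retrieve(portion)                  # step 5: candidate sources
        citation = None
        for _ in range(max_retrievals):
            best = max(docs, key=lambda d: similarity(portion, d["text"]),
                       default=None)
            if best and similarity(portion, best["text"]) >= threshold:
                citation = best["url"]            # step 6 "yes": verified -> cite
                break
            docs = docs + retrieve(portion)       # step 6 "no": retrieve more
        if citation:
            rendered.append((portion, citation))  # verified portions render with a link
        # unverified portions are dropped; move on to the next portion
    return rendered                               # render (THE END)
```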
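And toward answering "what would a solution look like": a first cut at the "relevance gap" tooling implied by steps 2-5 of that pipeline (and by the embeddings-explorer wishlist above). `embed_query` and `embed_doc` are assumed task-typed embedders returning numpy vectors, like the ones sketched earlier; everything else is plain numpy.

```python
# Relevance-gap sketch for the "Matrixed Ranking Analysis Pipeline":
# for each sub-query, find your best-matching passage, then check how
# close that passage sits to the texts AI Mode actually cited.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevance_gaps(subqueries, passages, citation_texts, embed_query, embed_doc):
    cit_vecs = [embed_doc(t) for t in citation_texts]  # assumes >= 1 citation
    pas_vecs = [embed_doc(p) for p in passages]
    report = []
    for q in subqueries:
        qv = embed_query(q)
        # step 3: most relevant of YOUR passages for this sub-query
        best = max(range(len(passages)), key=lambda i: cosine(qv, pas_vecs[i]))
        # step 4: how close is that passage to what was actually cited?
        citation_sim = max(cosine(pas_vecs[best], cv) for cv in cit_vecs)
        report.append({
            "subquery": q,
            "best_passage": passages[best][:80],
            "query_sim": cosine(qv, pas_vecs[best]),
            "citation_sim": citation_sim,  # low value = relevance gap (step 5)
        })
    return report
```

The interesting rows are the ones where `query_sim` is high but `citation_sim` is low: the passage is on-topic, but not shaped like what the engine chose to cite.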