# Notes on Optimizing for AI-powered Search

Ref. Matrixed Ranking Analysis Pipeline in [[AIO Deep Dive]]:

> An author outlines a 5-step process:
>
> 1. Pull rankings for subqueries (using existing SEO tools)
> 2. Generate vector embeddings for queries and document passages (using Google's embedding API)
> 3. Find the most relevant passages per document (using cosine similarity)
> 4. Compare passage embeddings to citation embeddings from AI Mode responses
> 5. Improve content based on relevance gaps
>
> From an IR perspective, this reads like someone who understands the theoretical concepts but hasn't built production retrieval systems.
>
> The core insight—that content needs to be optimized for embedding-based retrieval rather than just keyword matching—is correct.

What would a solution look like?

It looks like MarketBrew.ai is doing something similar to the proposed "Matrixed Ranking Analysis Pipeline": search all embeddings for a website, view embedding clusters, and search by embeddings to get a comprehensive understanding of how your site is viewed by Google's text classifiers. (ref. [[NewsHack Competitors & Differentiation]])

It looks like some of these companies are taking the AIO for a query, splitting it by sentence (or per citation), and calculating cosine similarity against your website chunks to predict ... to predict what? "Targeted semantic text insertions"? (As another company puts it, whatever that means.) A minimal sketch of this sentence-vs-chunk comparison appears at the end of this section.

This could just be a "new" way of finding content gaps via LSA (Latent Semantic Analysis), which is not new.

Is there really any value in this? Is there an easier way to find content gaps than trying to reconstruct Google's system, which is connected to proprietary knowledge sources and is probably far more complicated than we know?

Claude says:

> The patent referenced (US11769017B1) describes a **generate-then-verify** approach rather than traditional retrieve-then-generate RAG, with or without HyDE.
>
> The system isn't asking "what's most relevant?" but "what can verify this specific claim (among reputable sources, i.e. ranking sources)?" This is closer to fact-checking.
>
> The "targeted semantic text insertions" approach misses the point - you want to be authoritative enough that Google's hypothetical documents naturally align with your content, not game similarity metrics.

The verification could be tied to Google's knowledge sources. I don't want to say that's impossible to reverse engineer, but it's not easy, and it gets convoluted quickly.

Would it be better to start from a Gemini response, then run cosine similarity against the company website?

- Then you're dealing with prompting, and who knows how they do it.
- However, it could be used to identify where your content fails to support or verify commonly generated claims, or to surface preference patterns.

In the "matrix" solution proposed, I suppose one could benchmark semantic coverage against competitors, but wouldn't that just be copying rather than creating? What are you going to add that's unique? Additionally, could optimizing similarity benchmarks against competitors and existing citations lead to convergence on mediocrity?

Claude:

> **Instead of asking**: "How do I match successful citations?" **Ask**: "What verifiable claims am I uniquely positioned to make?"
>
> The goal isn't to be semantically similar to existing citations, but to become the **primary verification source** for claims in your domain. This requires original research, unique data, or novel synthesis - not optimization algorithms.
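Since the mechanics keep coming up, here is roughly what that sentence-vs-chunk comparison reduces to. This is a minimal sketch, not anyone's actual pipeline: it assumes an open-source sentence-transformers model as a stand-in for Google's embedding API, and the AIO sentences and website chunks are made-up placeholders you would collect with other tooling.

```python
# Sketch: flag AIO sentences that no existing site chunk covers well.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for Google's embedding API


def coverage_gaps(aio_sentences: list[str], site_chunks: list[str], threshold: float = 0.6):
    """Return AIO sentences whose best-matching site chunk scores below `threshold`."""
    # L2-normalized embeddings make the dot product equal cosine similarity.
    a = model.encode(aio_sentences, normalize_embeddings=True)
    b = model.encode(site_chunks, normalize_embeddings=True)
    sims = a @ b.T                  # (n_sentences, n_chunks) similarity matrix
    best = sims.max(axis=1)         # best available match per AIO sentence
    return [(s, float(v)) for s, v in zip(aio_sentences, best) if v < threshold]


# Placeholder inputs, for illustration only.
aio = [
    "Most tools track keyword rankings but not citation share in AI answers.",
    "A 2024 benchmark found churn dropped 12% after onboarding changes.",
]
chunks = [
    "We track how often each domain is cited in AI Overview responses.",
    "Our pricing page lists three plans and a free trial.",
]
for sentence, score in coverage_gaps(aio, chunks):
    print(f"possible gap ({score:.2f}): {sentence}")
```

Note what this does and doesn't tell you: a low score says your site doesn't paraphrase a claim, not that adding a paraphrase would earn the citation, which is exactly the objection raised in the rest of this note.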
The PR, marketing, and technical SEO worlds are inventing new frameworks and acronyms, not all without merit and with varying adoption among practitioners, but I think this is missing the point. You can't compete by being another website that repeats a study mentioned by a dozen other, more reputable websites.

---

Trying to reconcile this with my guide:

While my guide accurately identifies that AI systems prefer quality content, many practitioners are interpreting this as "optimize for semantic similarity to existing citations," which this note rightly points out could lead to homogenized, derivative content. The focus should be on original research, proprietary data, and novel synthesis.

The suggestion to use embeddings and cosine similarity to find "content gaps" misses the point. This is likely just repackaged LSA that doesn't address Google's actual verification process. "Superior" content is not semantically similar to existing citations, but authoritative enough to verify claims that Gemini wants to make.

Additionally, my guide doesn't fully grasp what this note identifies as Google's "generate-then-verify" approach. The system isn't just looking for relevance—it's looking for sources that can verify specific claims among already-ranking, reputable sources. Could this just be another way of looking at RAG, though, legitimately?

Build genuine expertise rather than optimizing similarity metrics.

---

## What is AI Visibility Optimization?

AI Overview and similar Generative Engines are proprietary, complex, and ever-changing. While we know these systems likely do more than rank + summarize, their exact processes remain unknown. Don't waste time over-simplifying these systems into predictable frameworks or attempting to reverse-engineer citation patterns. It distracts from **what actually drives AI Visibility: creating superior content that answers questions better than competitors.**

AI _needs_ to cite superior content because it can't find that perspective, unprecedented depth/accuracy, or exclusive supporting data anywhere else. It doesn't have to be 100% novel information—it could be an expert analysis that synthesizes scattered public data or one that confirms established ideas with proprietary data.

Consider tracking the following:

- **Claim verification rate** → What percentage of your claims are supported by data or sources?
- **Question coverage depth** → How thoroughly do you address the sub-questions within your main topics?
- **Unique information advantage** → What verifiable claims can you make that competitors cannot?

**Bonus:** Conduct content gap analysis by prompting an LLM to identify sub-queries your content should address, then fill those gaps yourself (see the sketch after this section).

---

**An LLM generating responses IS the statistical mean**, so using Gemini output as a target is just optimizing toward mediocrity with extra steps.

Ask "_What unique, verifiable information do we have that AI systems need for fact-checking?_"

Although patents don't necessarily reflect current implementation, [Google's Patents](https://patents.google.com/patent/US11769017B1/en) suggest that AI Overview isn't just pulling the highest-ranking pages for the original query + sub-queries and then generating a summary. It may be generating potential responses and then searching for sources that can credibly verify specific claims within those responses.
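The "Bonus" gap analysis above is easy to prototype. A hedged sketch, not a recommended stack: the `google-generativeai` client, the `gemini-1.5-flash` model name, the prompt wording, and the example topic are all assumptions; any capable LLM works, and the returned questions still need a human (or an embedding check like the earlier sketch) to decide which are genuinely uncovered.

```python
# Sketch: ask an LLM which sub-questions a topic should answer, as gap-analysis input.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # assumption: Gemini API access
llm = genai.GenerativeModel("gemini-1.5-flash")  # assumption: model choice is arbitrary


def subqueries_for(topic: str, n: int = 10) -> list[str]:
    """Return sub-questions a thorough page on `topic` would be expected to answer."""
    prompt = (
        f"List {n} specific sub-questions a reader researching '{topic}' would want "
        "answered. Return one question per line, with no numbering."
    )
    response = llm.generate_content(prompt)
    return [line.strip() for line in response.text.splitlines() if line.strip()]


# Each question becomes a row in a coverage check: which page answers it, and with
# what unique data or perspective that competitors can't copy?
for question in subqueries_for("AI Overview visibility for B2B SaaS"):
    print("-", question)
```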
The patent's generate-then-verify process is indeed a form of Retrieval Augmented Generation (RAG), but whether algorithmic fact-checking or semantic similarity to a Hypothetical Document Embedding (HyDE) is the key content-filtering mechanism, no one can know for sure. (A minimal HyDE sketch is at the end of this note for reference.)

Some misinterpret these dynamics by treating AI Overview optimization like a matching game—comparing chunked website content embeddings with embeddings of existing AIO citations to find semantic similarities. **This optimization mindset captures the correlation patterns but misses the underlying value mechanism.** You can't out-compete authoritative sources by creating another article that references the same study everyone else mentions and provides the same generic commentary. You need to add more value.

---

The connection between "AI generates then verifies" and "therefore you need unique information" isn't established. Why would a generate-then-verify system specifically require unique content more than a search-then-summarize system?

Earlier, the document shows AI systems favor quality, relevance, and recency over rankings. Now it claims you need completely unique information—but high-quality, relevant, recent content can succeed without being entirely unique. Rather than trying to reverse-engineer citation patterns, the point is to focus on adding value.

---

Here's what I'm trying to say: AI systems, mainly AIO, are proprietary, complex, and ever-changing. We can't over-simplify these systems, and we shouldn't try to reverse engineer citation patterns, because doing so distracts from what really matters: adding value.

Valuable content is content that AI _needs_ to cite because it can't find that information anywhere else. It doesn't have to be completely novel. It could be a comprehensive analysis that synthesizes scattered data, an expert perspective on widely-discussed topics, or new findings backed by proprietary research.

---

AI Visibility optimization is about answering as many relevant questions as possible, with a novel perspective, unprecedented depth/accuracy, or exclusive supporting data.
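For reference, since HyDE comes up a few times above: a minimal sketch of what HyDE-style retrieval does. It is structurally the same cosine-similarity ranking as the gap-check sketch earlier in this note; the only difference is that the "query" you embed is a model-generated hypothetical answer rather than the user's query or the AIO sentences. The embedding model, corpus, and hard-coded hypothetical answer are placeholders.

```python
# Sketch: HyDE-style retrieval, i.e. rank passages against a hypothetical answer.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model


def hyde_retrieve(hypothetical_answer: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the `k` passages most similar to the generated hypothetical answer."""
    q = model.encode([hypothetical_answer], normalize_embeddings=True)
    p = model.encode(passages, normalize_embeddings=True)
    scores = (q @ p.T)[0]  # cosine similarity of each passage to the hypothetical answer
    return [passages[i] for i in np.argsort(-scores)[:k]]


# In real HyDE the hypothetical answer comes from an LLM; here it is hard-coded.
hypothetical = "AI Overviews tend to cite pages that can verify specific factual claims."
corpus = [
    "Our study measured citation rates across 500 AI Overview responses.",
    "Keyword density no longer predicts inclusion in AI-generated answers.",
    "A press release repeating a third-party study with no new data.",
]
print(hyde_retrieve(hypothetical, corpus, k=2))
```

Whether Google filters candidate sources this way or through a harder verification step against its knowledge sources is exactly the unknowable part this note keeps returning to.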