Query Augmentation methodologies have been developed to improve retriever performance by transforming the user query before encoding. These approaches can be further classified into two categories:

- approaches that leverage a retrieval pass through the documents
- zero-shot approaches (without any example document)

Among the zero-shot approaches, [[Hypothetical Document Embeddings (HyDE)]] introduced a data augmentation methodology that consists of generating a hypothetical answer document to the user query using an LLM. The underlying idea is to bring the user query closer to the documents of interest in the embedding space, thereby improving retrieval performance. The HyDE experiments showed performance comparable to fine-tuned retrievers across various tasks. The generated document is, however, a naïve augmentation: it is produced without any knowledge of the underlying embedded data for the task at hand, so there is inevitably a gap between the generated content and the knowledge base, which can degrade performance in many situations.

Alternatively, methodologies have been proposed that first perform an initial pass through the embedding space of the documents and subsequently augment the initial query to perform a more informed search. These [[Pseudo Relevance Feedback (PRF)]] [8] and [[Generative Relevance Feedback (GRF)]] [9] approaches typically depend on the quality of the top-ranked documents on which the query augmentation is conditioned; they are therefore prone to significant performance variation across queries, and may even lose the essence of the original query. In both of the RAG pipeline enhancements cited above, the retrievers remain generally unaware of the distribution of the target document collection, even after an initial pass through the retrieval pipeline.

In a framework developed by Mombaerts et al., a set of dedicated metadata is created for each document prior to inference, and guided QA pairs spanning the documents are then generated using [[Chain of Thoughts (CoT)]] prompting. **The synthetic questions are then encoded**, and the metadata is used for filtering. A meta summary is also created, summarizing the key concepts available in the database for a given filter (i.e., a topic). At inference time, the user query is dynamically augmented using the meta summary. This gives the retriever the ability to reason across multiple documents, a task that would otherwise have required multiple retrieval and reasoning rounds.

Related: [[Query Transformation]]
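
A minimal Python sketch of the HyDE retrieval step described above. The `generate` and `embed` callables are placeholder assumptions standing in for an LLM call and an embedding model; the prompt and the cosine-similarity search are illustrative, not the paper's implementation:

```python
import numpy as np
from typing import Callable

def hyde_search(
    query: str,
    corpus_embeddings: np.ndarray,        # shape (n_docs, d), pre-computed
    generate: Callable[[str], str],       # placeholder: LLM call
    embed: Callable[[str], np.ndarray],   # placeholder: embedding model
    top_k: int = 5,
) -> list[int]:
    """Rank documents against a hypothetical answer document
    instead of against the raw query (the HyDE idea)."""
    # 1. Ask the LLM for a plausible answer document. Factual accuracy is
    #    secondary; what matters is landing near real answers in embedding space.
    hypothetical_doc = generate(
        f"Write a short passage that answers the question:\n{query}"
    )
    # 2. Encode the hypothetical document rather than the query itself.
    q = embed(hypothetical_doc)
    q = q / np.linalg.norm(q)
    # 3. Cosine similarity against the pre-computed corpus embeddings.
    docs = corpus_embeddings / np.linalg.norm(
        corpus_embeddings, axis=1, keepdims=True
    )
    return list(np.argsort(-(docs @ q))[:top_k])
```

Note how the gap problem shows up directly in step 1: the LLM writes its answer with no view of the corpus, so nothing anchors the hypothetical document to what the knowledge base actually contains.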
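
A sketch of the two-pass feedback idea behind PRF, under the same placeholder `embed` assumption. Classic PRF reweights query terms over a sparse index; this dense-retrieval variant simply concatenates the first-pass documents onto the query. Swapping that concatenation for an LLM-generated expansion conditioned on the feedback documents would give the GRF flavor:

```python
import numpy as np
from typing import Callable, Sequence

def prf_search(
    query: str,
    corpus_texts: Sequence[str],
    corpus_embeddings: np.ndarray,        # shape (n_docs, d), pre-computed
    embed: Callable[[str], np.ndarray],   # placeholder: embedding model
    top_k: int = 5,
    feedback_k: int = 3,
) -> list[int]:
    """Two-pass pseudo relevance feedback over a dense index."""
    docs = corpus_embeddings / np.linalg.norm(
        corpus_embeddings, axis=1, keepdims=True
    )
    # First pass: retrieve with the raw query.
    q = embed(query)
    q = q / np.linalg.norm(q)
    feedback_ids = np.argsort(-(docs @ q))[:feedback_k]

    # Second pass: condition the query on the top-ranked documents.
    # This is the fragility noted above: if the first-pass documents are
    # off-topic, the augmented query drifts from the original intent.
    augmented = query + "\n" + "\n".join(corpus_texts[i] for i in feedback_ids)
    q2 = embed(augmented)
    q2 = q2 / np.linalg.norm(q2)
    return list(np.argsort(-(docs @ q2))[:top_k])
```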
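
Finally, a sketch of the offline/online split in the Mombaerts et al. framework, with the same caveat: `tag`, `make_qa`, `summarize`, and `augment` are assumed LLM-backed helpers, and the filtering and search logic is a simplification of what the paper describes:

```python
import numpy as np
from typing import Callable, Sequence

def build_meta_knowledge(
    documents: Sequence[str],
    tag: Callable[[str], str],                        # assumed: doc -> topic label
    make_qa: Callable[[str], list[tuple[str, str]]],  # assumed: CoT-prompted QA generation
    summarize: Callable[[list[str]], str],            # assumed: topic docs -> meta summary
    embed: Callable[[str], np.ndarray],
):
    """Offline stage: per-document metadata, synthetic QA, and meta summaries."""
    records = []                          # (question embedding, topic, answer)
    by_topic: dict[str, list[str]] = {}
    for doc in documents:
        topic = tag(doc)                  # dedicated metadata for each document
        by_topic.setdefault(topic, []).append(doc)
        for question, answer in make_qa(doc):
            records.append((embed(question), topic, answer))
    # One meta summary per filter (topic): key concepts present in the database.
    meta_summaries = {t: summarize(ds) for t, ds in by_topic.items()}
    return records, meta_summaries

def retrieve(
    user_query: str,
    topic: str,
    records,
    meta_summaries,
    augment: Callable[[str, str], str],   # assumed: (query, meta summary) -> new query
    embed: Callable[[str], np.ndarray],
    top_k: int = 5,
):
    """Inference stage: augment the query with the meta summary, then search
    the synthetic-question embeddings restricted to the chosen filter."""
    query = augment(user_query, meta_summaries[topic])
    q = embed(query)
    q = q / np.linalg.norm(q)
    filtered = [(vec, ans) for vec, t, ans in records if t == topic]
    scores = [float(vec @ q) / float(np.linalg.norm(vec)) for vec, _ in filtered]
    order = np.argsort(scores)[::-1][:top_k]
    return [filtered[i][1] for i in order]
```

The design point this sketch tries to make concrete: because the meta summary is built from the corpus itself, the augmentation step is conditioned on the target collection's distribution, which is exactly what HyDE-style zero-shot augmentation lacks.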