Query2doc - Ethan Young

Query2doc is a [[Query Expansion]] technique intended to improve both sparse and dense retrieval, although the gains tend to diminish when the dense retriever is distilled from a strong cross-encoder-based re-ranker.

![[Pasted image 20240911185511.png]]

`query2doc` prompts an LLM with few-shot examples to generate a pseudo-document for each query, then integrates with existing sparse or dense retrievers by augmenting the query with that pseudo-document. The underlying motivation is to distill the knowledge of LLMs through prompting. Despite its simplicity, empirical evaluations show consistent improvements across various retrieval models and datasets. The method is simple to implement and requires no changes to training pipelines or model architectures, which makes it orthogonal to progress in both LLMs and information retrieval.

https://arxiv.org/pdf/2303.07678

> Experimental results demonstrate that query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets, such as MSMARCO and TREC DL, without any model fine-tuning.
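A minimal Python sketch of the query2doc pipeline (few-shot prompt, generate a pseudo-document, augment the query). Several parts are assumptions for illustration and do not come from this note: `generate` is a hypothetical stand-in for any LLM completion call, the prompt wording is illustrative, and the helper names are made up. The two expansion variants follow my reading of the paper and should be checked against it. For sparse retrieval, the short query is repeated `n` times (default `n = 5` assumed) so BM25 does not let the long generated text swamp the original terms. For dense retrieval, query and pseudo-document are joined with a `[SEP]` separator.

```python
from typing import Callable, List, Tuple

# Few-shot examples as (query, relevant passage) pairs, e.g. sampled from training data.
FewShot = List[Tuple[str, str]]


def build_prompt(query: str, examples: FewShot) -> str:
    """Assemble a few-shot prompt asking the LLM to write a passage for `query`.

    The instruction wording is an illustrative assumption.
    """
    parts = ["Write a passage that answers the given query:", ""]
    for q, d in examples:
        parts += [f"Query: {q}", f"Passage: {d}", ""]
    parts += [f"Query: {query}", "Passage:"]
    return "\n".join(parts)


def expand_sparse(query: str, pseudo_doc: str, n: int = 5) -> str:
    """Sparse (BM25) expansion: repeat the query n times, then append the pseudo-document.

    Repetition keeps the original query terms weighted against the longer
    generated text; n = 5 is an assumed default.
    """
    return " ".join([query] * n + [pseudo_doc])


def expand_dense(query: str, pseudo_doc: str, sep: str = "[SEP]") -> str:
    """Dense expansion: query and pseudo-document joined by a separator token."""
    return f"{query} {sep} {pseudo_doc}"


def query2doc(
    query: str,
    examples: FewShot,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    mode: str = "sparse",
) -> str:
    """Return the expanded query to hand to an existing, unchanged retriever."""
    pseudo_doc = generate(build_prompt(query, examples)).strip()
    if mode == "sparse":
        return expand_sparse(query, pseudo_doc)
    if mode == "dense":
        return expand_dense(query, pseudo_doc)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    examples = [("what is bm25", "BM25 is a bag-of-words ranking function used by search engines.")]

    # Fake LLM so the sketch runs offline; swap in a real completion call in practice.
    def fake_llm(prompt: str) -> str:
        return "  A generated passage that answers the query.  "

    print(query2doc("how does query expansion work", examples, fake_llm, mode="dense"))
    print(query2doc("how does query expansion work", examples, fake_llm, mode="sparse"))
```

The retriever itself is untouched: only the query string changes, which is why the method needs no retraining or architecture changes. The cost moves to inference time, since every query now requires an LLM call before retrieval.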