Okapi BM25 - Ethan Young

BM25 is a fundamental non-parametric (nothing is learned from training data) lexical method that calculates document relevance using term frequency (TF) and inverse document frequency (IDF) ([[TF-IDF]]). Okapi BM25, also known as "Best Match 25", is an exact-match algorithm that enables fast and efficient document retrieval. If you have a large corpus and the documents all come from the same domain, BM25 makes a strong baseline.

It advances beyond earlier probabilistic models by adding document length normalization and non-linear (saturating) term frequency scaling, which improves how well documents are matched to queries. BM25 is a [[Bag-of-words (BoW)]] retrieval function: it ranks a set of documents based on the query terms appearing in each document, regardless of where those terms occur or how close together they are. Strictly speaking it is a family of scoring functions with slightly different components and parameters.

BM25 is just one of a bazillion lexical similarity scores, aka [[Sparse Vectors ("Embeddings")]].

Variants/alternatives:
- [[BM25S]]
- [[Rank-BM25]]
- [[Elasticsearch]]
- [[Pyserini]]
- [[BM25-PT]]
- [[SPLADE]]

==BM25 with k1 = 1.2 and b = 0.75== works well in most cases.

[Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants](https://link.springer.com/chapter/10.1007/978-3-030-45442-5_4):
> In summary, this work describes a double reproducibility study—we methodologically validate the usefulness of databases for IR prototyping claimed by Mühleisen et al. and performed a large-scale study of BM25 to confirm the findings of Trotman et al. Returning to our original motivating question regarding the multitude of BM25 variants: “Does it matter?”, we conclude that the answer appears to be “no, it does not”.

[Improvements to BM25 and Language Models Examined](https://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf):
> It shows that stop words are ineffective, that stemming is effective, that [[relevance feedback]] is effective, and that the combination of not stopping, stemming, and feedback typically leads to improvements on a plain ranking function. However, there is no clear evidence that any one of the ranking functions is systematically better than the others. Comparison of large numbers of ranking functions is exploratory in nature due to the number of observed effects, but we found no one ranking function consistently outperforming the others.
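
To make the TF saturation and length normalization concrete, here is a minimal Python sketch of one common BM25 variant. It is an illustration under stated assumptions, not a reference implementation: the `BM25` class, the `tokenize` helper, and the whitespace tokenizer are hypothetical names chosen for this note, and the IDF uses the non-negative form `ln(1 + (N - n + 0.5) / (n + 0.5))`, which is only one of the variants the papers above compare. The defaults are the k1 = 1.2 and b = 0.75 mentioned earlier.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch;
    # real systems usually add stemming and other normalization).
    return text.lower().split()


class BM25:
    """Sketch of one BM25 variant (non-negative IDF, k1/b normalization)."""

    def __init__(self, corpus, k1=1.2, b=0.75):
        if not corpus:
            raise ValueError("corpus must contain at least one document")
        self.k1 = k1
        self.b = b
        # Bag-of-words view: per-document term counts, word order discarded.
        self.docs = [Counter(tokenize(doc)) for doc in corpus]
        self.doc_lens = [sum(tf.values()) for tf in self.docs]
        # Fall back to 1.0 so an all-empty corpus does not divide by zero.
        self.avgdl = (sum(self.doc_lens) / len(self.docs)) or 1.0

        # Document frequency: number of documents containing each term.
        df = Counter()
        for tf in self.docs:
            df.update(tf.keys())
        n_docs = len(self.docs)
        self.idf = {
            term: math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            for term, n in df.items()
        }

    def score(self, query, index):
        """BM25 score of document `index` for `query`."""
        tf = self.docs[index]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[index] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # exact matching only: absent terms contribute nothing
            # Saturating TF: the ratio approaches (k1 + 1) as f grows.
            total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query):
        """Document indices sorted from most to least relevant."""
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
```

Usage would look like `BM25(["the cat sat on the mat", "dogs chase cats"]).rank("cat")`. Note that "cats" does not match the query "cat" here, which is the exact-matching limitation in action and one reason stemming helps in the study quoted above.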