Two Tower Embeddings ("Learned Embeddings")

**TL;DR:** custom bi-encoder ≈ smarter vectors ⇒ simpler, faster hybrid stack; cross-encoder becomes optional polish rather than a crutch. # Two Tower Embeddings ("Learned Embeddings") "Learned Embeddings" is a technique by which interaction patterns never hand-encoded are captured (e.g., lexical relationships like synonyms, brand popularity, seasonal trends) in the actual [[Document Embeddings]] ([[Dense Vectors ("Embeddings")]]) to decrease search time by having a single query embedding already close to its best documents in vector space. That is done by training a custom [[Embedding Model]] ([[Bi-encoder]]) on labeled user click or judgement data (label query + document features pairs as match / no-match; "contrastive loss"). With the new embedding model: 1. Offline: recalculate all [[Document Embeddings]] 2. At runtime: calculate the [[Query Embedding]] In effect: fewer lexical sub-queries → smaller merge → faster, simpler stack. Quick context: enterprise [[Hybrid Search]] systems commonly run ANN calls per sliced query to improve recall, which can get slow. "Learned Embeddings" let us abstract away a few lexical filters (except hard business rules like “in stock”), speeding up search time by using one ANN search instead of N lexical "arms", and a [[Re-ranker]] ([[Cross-encoder]]) may become optional. --- [Doug Turnbull's Blog](https://softwaredoug.com/blog/2025/04/29/two-tower) RECAP: - **Filtered-vector is a necessary stepping stone**, because few production vector stores let you do a _single_ KNN + arbitrary boolean filter efficiently. Hence the “many-arm” pattern as a workaround. slice ➜ vector ➜ fuse ➜ (rerank) - **But the arms multiply** as you bolt on more business logic (brand exact match, compatible-parts, fresh-in-stock…). At some point the complexity and latency again blow up (“more arms, more problems”). - **Solution he advocates:** collapse more signals into the embedding itself (two-tower / bi-encoder trained on judgments). That way you can go back to _one_ fast ANN call **with a modest lexical filter** for safety (e.g., only show documents in the same catalog section).