A **vector** is a mathematical structure with a magnitude (size) and a direction. A **vector embedding** is a technique for representing virtually any data type as a vector (via an [[Embedding Model]]), allowing computers to capture the semantic meaning of that data.

**NEW!** [[Embedding Encryption]]

Embeddings exist in a continuous, high-dimensional space (ref. [[Embedding Projections]]).

There are two types of vector embeddings:

1. [[Dense Vectors ("Embeddings")]]
2. [[Sparse Vectors ("Embeddings")]]

In the context of an [[Information Retrieval (IR) System]], embeddings make it possible to represent both documents and queries as dense vectors in a high-dimensional space, enabling [[Semantic Search]]. The [distance](https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use) between two vectors measures their relatedness: small distances suggest high relatedness, and large distances suggest low relatedness.

See other [[Common Uses of Vector Embeddings]]. See also [[Composite Embedding]], [[Two Tower Embeddings ("Learned Embeddings")]].

The embedding model that generates sparse vectors is different from the embedding model that generates ordinary dense vectors; the main difference is the distribution of information within them (ref. [Sparse embedding or BM25?](https://medium.com/@infiniflowai/sparse-embedding-or-bm25-84c942b3eda7)). Searching both dense and sparse vectors can also be called "hybrid search," but this should not be confused with "hybrid search" meaning the combination of semantic and lexical search. https://docs.pinecone.io/docs/hybrid-search

Helpful links:

- https://www.pinecone.io/learn/vector-embeddings-for-developers/
- https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
- https://wfhbrian.com/what-are-embeddings/
- https://huggingface.co/blog/embedding-quantization
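The idea that distance measures relatedness can be sketched with cosine distance over toy dense vectors (the 4-dimensional vectors and their values below are made up for illustration; real embedding models produce hundreds to thousands of dimensions):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus cosine similarity; smaller means more related."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings" -- not output of a real model.
cat = np.array([0.9, 0.1, 0.0, 0.2])
kitten = np.array([0.85, 0.15, 0.05, 0.25])
car = np.array([0.1, 0.9, 0.8, 0.0])

print(cosine_distance(cat, kitten))  # small distance -> high relatedness
print(cosine_distance(cat, car))     # large distance -> low relatedness
```

In a semantic search setting, the same scoring would be applied between a query vector and each document vector, returning the documents with the smallest distances.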
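The difference in information distribution between sparse and dense vectors can be illustrated with a minimal sketch: a sparse vector stores weights only for the few vocabulary dimensions that matter, so it is naturally represented as a token-id-to-weight map, and scoring is a dot product over the shared token ids. The token ids and weights below are invented for illustration:

```python
# Hypothetical sparse "embeddings": most dimensions are zero, so only
# nonzero (token_id -> weight) entries are stored.
sparse_doc = {1012: 1.9, 4821: 0.7, 9930: 1.2}
sparse_query = {1012: 1.5, 7755: 0.4}

def sparse_dot(q: dict, d: dict) -> float:
    """Score = dot product over token ids present in both vectors."""
    return sum(weight * d[token] for token, weight in q.items() if token in d)

print(sparse_dot(sparse_query, sparse_doc))  # only token 1012 overlaps: 1.5 * 1.9
```

A dense vector of the same vocabulary size would instead spread small nonzero values across every dimension, which is why the two vector types call for different index structures and different embedding models.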