Embeddings & Vector Search
What Are Embeddings?
An embedding converts text into a list of numbers (a vector) that captures its meaning.
"I love cats" → [0.23, -0.45, 0.89, 0.12, ..., -0.34] (1536 numbers)
"I adore cats" → [0.24, -0.44, 0.88, 0.13, ..., -0.33] (very similar!)
"Buy stocks" → [0.91, 0.02, -0.56, 0.77, ..., 0.45] (very different)
Key insight: Similar meaning = similar vectors. Unrelated meaning = distant vectors.
Analogy: GPS coordinates for meaning. "Paris" and "Lyon" have close coordinates (both in France). "Paris" and "Tokyo" have distant coordinates. Embeddings do the same but for meaning in 1000+ dimensions.
The Classic Example
king - man + woman ≈ queen
Vector math:
[0.5, 0.8, 0.3] - [0.4, 0.1, 0.5] + [0.6, 0.2, 0.7] = [0.7, 0.9, 0.5]
Closest vector to [0.7, 0.9, 0.5] in vocabulary → "queen"
The model learned that the relationship between "king" and "man" is similar to the relationship between "queen" and "woman" — without anyone teaching it this explicitly.
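The toy arithmetic above can be checked directly. The 3-dimensional vectors here are stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```python
# Toy 3-D stand-ins for real embedding vectors (real ones have 1000+ dimensions)
king  = [0.5, 0.8, 0.3]
man   = [0.4, 0.1, 0.5]
woman = [0.6, 0.2, 0.7]

# king - man + woman, computed element by element
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)  # approximately [0.7, 0.9, 0.5]
```

In a real embedding space, the last step is a nearest-neighbor lookup: find the vocabulary vector closest to `result`, which lands on "queen".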
How Similarity Is Measured
Cosine Similarity
The most common metric for text. It measures the cosine of the angle between two vectors, ignoring their magnitudes.
similarity = (A · B) / (|A| × |B|)
Result range:
1.0 = identical meaning
0.0 = unrelated
-1.0 = opposite meaning
Example:
sim("I love cats", "I adore kittens") = 0.95 (very similar)
sim("I love cats", "The weather is nice") = 0.15 (unrelated)
sim("I love cats", "I hate cats") = 0.65 (related topic, different sentiment)
Other Distance Metrics
| Metric | How It Works | Used For |
|---|---|---|
| Cosine similarity | Angle between vectors | Most text similarity tasks |
| Euclidean distance | Straight-line distance | When magnitude matters |
| Dot product | Angle and magnitude combined; equals cosine similarity for normalized vectors | Optimized retrieval |
Embedding Models
| Model | Dimensions | Provider | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | OpenAI | Good quality, cheap |
| text-embedding-3-large | 3072 | OpenAI | Best quality from OpenAI |
| embed-v4 | 1024 | Cohere | Good multilingual |
| BGE-large-en | 1024 | Open Source (BAAI) | Best open-source |
| E5-mistral-7b | 4096 | Open Source | Instruction-following embeddings |
| Voyage-3 | 1024 | Voyage AI | Strong for code |
Pricing: Very cheap compared to LLMs. OpenAI text-embedding-3-small: $0.02 per 1M tokens.
Vector Databases
Regular databases search by exact match or range. Vector databases search by similarity.
Traditional DB:
SELECT * FROM products WHERE category = 'shoes' AND price < 100
Vector DB:
"Find me products similar to 'comfortable running shoes for rainy weather'"
→ Returns products with vectors closest to the query vector
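The core operation of a vector database can be sketched as a brute-force, in-memory toy. This is not how production engines work internally (they avoid the full scan with ANN indexes like HNSW and IVF), but it shows the add/query interface they all share. The item IDs and vectors are made up for illustration:

```python
import math

class ToyVectorStore:
    """Brute-force in-memory store: O(n) scan per query (real DBs use ANN indexes)."""

    def __init__(self):
        self.items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def query(self, vector, top_k=3):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        # Score every stored vector against the query, highest similarity first
        scored = sorted(((cos(vector, v), item_id) for item_id, v in self.items),
                        reverse=True)
        return [(item_id, score) for score, item_id in scored[:top_k]]

store = ToyVectorStore()
store.add("running-shoes", [0.9, 0.1, 0.2])
store.add("rain-boots",    [0.8, 0.3, 0.1])
store.add("laptop",        [0.1, 0.9, 0.8])
print(store.query([0.85, 0.2, 0.15], top_k=2))  # shoe-like items rank first
```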
Popular Vector Databases
| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production, zero-ops |
| Weaviate | Open source + cloud | Hybrid search (vector + keyword) |
| Qdrant | Open source + cloud | High performance, filtering |
| Chroma | Open source | Prototyping, simple local use |
| pgvector | PostgreSQL extension | Already using PostgreSQL |
| Milvus | Open source | Large-scale (billions of vectors) |
| FAISS | Library (Meta) | In-memory, fastest for research |
How Vector Search Works (ANN)
Searching ALL vectors for the closest match is O(n) — too slow for millions of vectors.
Approximate Nearest Neighbor (ANN) algorithms:
HNSW (Hierarchical Navigable Small World):
- Build a multi-layer graph:
  - Top layer: few nodes, long-range connections (navigate quickly)
  - Middle layers: more nodes, medium-range connections
  - Bottom layer: all nodes, short-range connections (precise search)
- Search: start at the top → hop through the graph → drill down layer by layer → find the nearest neighbor at the bottom
- ~95-99% recall (finds true nearest neighbor 95%+ of the time)
- Sub-millisecond search over millions of vectors
- Used by: Qdrant, Weaviate, pgvector
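The graph-navigation idea can be shown with a single-layer simplification: greedy search over a hand-built neighbor graph, without HNSW's hierarchy. Real implementations build the graph automatically, add layers, and track a beam of candidates instead of one node. The 2-D points here are made up for illustration:

```python
import math

# Toy 2-D points; a real index stores high-dimensional embeddings
points = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.5),
    "d": (3.0, 1.0), "e": (4.0, 1.0), "f": (2.0, 3.0),
}
# Hand-built neighbor graph (HNSW constructs this during insertion)
graph = {
    "a": ["b", "f"], "b": ["a", "c"], "c": ["b", "d", "f"],
    "d": ["c", "e"], "e": ["d"], "f": ["a", "c"],
}

def greedy_search(query, entry="a"):
    """Hop to whichever neighbor is closer to the query; stop at a local minimum."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current  # no neighbor improves: approximate nearest found
        current = best

print(greedy_search((3.9, 1.1)))  # hops through the graph and reaches "e"
```

Because the search is greedy, it can get stuck in a local minimum; the hierarchy and candidate beams in real HNSW are what push recall up to the 95-99% range.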
IVF (Inverted File Index):
- Cluster vectors into groups
- At query time, only search the closest clusters
- Faster but lower recall than HNSW
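A minimal sketch of the IVF idea, with cluster centroids picked by hand here; real indexes learn them with k-means over the stored vectors:

```python
import math

# Two hand-picked centroids partition the vectors into clusters
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = {0: [], 1: []}  # centroid index -> vectors assigned to it

for v in [(0.5, 0.2), (1.0, 0.8), (9.5, 9.9), (10.2, 10.5)]:
    nearest = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
    clusters[nearest].append(v)

def ivf_search(query, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [v for i in probe for v in clusters[i]]
    return min(candidates, key=lambda v: math.dist(query, v))

print(ivf_search((9.0, 9.0)))  # scans only the (10, 10) cluster
```

Recall drops when the true nearest neighbor sits in a cluster that was not probed; raising `nprobe` trades speed back for accuracy.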
The RAG Connection
Embeddings + vector search = the foundation of RAG (Retrieval-Augmented Generation).
Setup (one-time):
Your documents → Split into chunks → Embed each chunk → Store in vector DB
At query time:
User question → Embed question → Search vector DB → Get top 5 similar chunks
→ Send chunks + question to LLM → LLM answers using the chunks
User: "How do I configure the payment webhook?"
1. Embed question → [0.12, 0.89, ...]
2. Vector search → finds chunks from payment-docs.md and webhook-guide.md
3. Send to Claude: "Using this context: [chunks], answer: How do I configure..."
4. Claude answers with grounded, accurate information from YOUR docs
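The retrieval half of this pipeline can be sketched end to end. A toy keyword-count vector stands in for a real embedding model, the chunk texts are invented, and the final LLM call is left as an assembled prompt string:

```python
import math
import re

def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding API: counts a few hand-picked keywords."""
    keywords = ["payment", "webhook", "configure", "weather"]
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(k)) for k in keywords]

chunks = [
    "To configure the payment webhook, set the callback URL in the dashboard.",
    "Webhook retries happen three times with exponential backoff.",
    "The weather endpoint returns a forecast for the next seven days.",
]
chunk_vecs = [toy_embed(c) for c in chunks]  # setup step: embed and store

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Query step: embed the question, rank chunks, build the grounded prompt
question = "How do I configure the payment webhook?"
qv = toy_embed(question)
top = sorted(zip(chunks, chunk_vecs), key=lambda cv: cos(qv, cv[1]), reverse=True)[:2]
context = "\n".join(c for c, _ in top)
prompt = f"Using this context:\n{context}\n\nAnswer: {question}"
# `prompt` would now be sent to the LLM (step 3 above)
```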
See 17 - RAG (Retrieval-Augmented Generation) for the complete pipeline.
Use Cases Beyond RAG
| Use Case | How |
|---|---|
| Semantic search | Search by meaning, not just keywords |
| Recommendation | "Users who liked X" → find similar embeddings |
| Clustering | Group similar documents/tickets/feedback |
| Deduplication | Find near-duplicate content |
| Classification | Compare against labeled examples |
| Anomaly detection | Find vectors far from normal clusters |
Practical Example: Building a Simple Semantic Search
```python
from openai import OpenAI
import numpy as np

client = OpenAI()

# Your documents
docs = [
    "Python is a versatile programming language",
    "JavaScript runs in the browser",
    "Machine learning uses statistical models",
    "The weather in Paris is mild",
]

# Embed all documents
doc_embeddings = []
for doc in docs:
    response = client.embeddings.create(
        input=doc,
        model="text-embedding-3-small",
    )
    doc_embeddings.append(response.data[0].embedding)

# User query
query = "What language is good for web development?"
query_resp = client.embeddings.create(
    input=query,
    model="text-embedding-3-small",
)
query_embedding = query_resp.data[0].embedding

# Find most similar (cosine similarity)
similarities = [
    np.dot(query_embedding, doc_emb)
    / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
    for doc_emb in doc_embeddings
]

best = docs[int(np.argmax(similarities))]
# Result: "JavaScript runs in the browser" scores highest ✅
```
Resources
- 🔗 OpenAI — Embeddings Guide
- 🎥 3Blue1Brown — Word Embeddings
- 🔗 Pinecone — What Are Embeddings
- 🔗 MTEB Leaderboard — Compare Embedding Models
- 🔗 pgvector — PostgreSQL Extension
Previous: 04 - Temperature, Top-P & Sampling | Next: 06 - Fine-Tuning vs Prompting vs RAG