Embeddings & Vector Search

What Are Embeddings?

An embedding converts text into a list of numbers (a vector) that captures its meaning.

"I love cats"  → [0.23, -0.45, 0.89, 0.12, ..., -0.34]  (1536 numbers)
"I adore cats" → [0.24, -0.44, 0.88, 0.13, ..., -0.33]  (very similar!)
"Buy stocks"   → [0.91, 0.02, -0.56, 0.77, ..., 0.45]   (very different)

Key insight: Similar meaning = similar vectors. Unrelated meaning = distant vectors.

Analogy: GPS coordinates for meaning. "Paris" and "Lyon" have close coordinates (both in France). "Paris" and "Tokyo" have distant coordinates. Embeddings do the same but for meaning in 1000+ dimensions.


The Classic Example

king - man + woman ≈ queen

Vector math:
  [0.5, 0.8, 0.3] - [0.4, 0.1, 0.5] + [0.6, 0.2, 0.7] = [0.7, 0.9, 0.5]
  Closest vector to [0.7, 0.9, 0.5] in vocabulary → "queen"

The model learned that the relationship between "king" and "man" is similar to the relationship between "queen" and "woman" — without anyone teaching it this explicitly.
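The toy arithmetic above can be checked directly. A minimal sketch using the made-up 3-dimensional vectors from the example (real word vectors have hundreds of dimensions):

```python
# Toy 3-dimensional "word vectors" from the example above (made up for illustration)
king  = [0.5, 0.8, 0.3]
man   = [0.4, 0.1, 0.5]
woman = [0.6, 0.2, 0.7]

# king - man + woman, computed element by element
result = [k - m + w for k, m, w in zip(king, man, woman)]

print([round(x, 2) for x in result])  # → [0.7, 0.9, 0.5]
```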


How Similarity Is Measured

Cosine Similarity

Most common metric. Measures the angle between two vectors (ignores magnitude).

similarity = (A · B) / (|A| × |B|)

Result range:
  1.0  = identical meaning
  0.0  = unrelated
  -1.0 = opposite direction (rare in practice with real text embeddings)

Example:

sim("I love cats", "I adore kittens")  = 0.95  (very similar)
sim("I love cats", "The weather is nice") = 0.15  (unrelated)
sim("I love cats", "I hate cats")      = 0.65  (related topic, different sentiment)
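The formula above is only a few lines of code. A minimal pure-Python version (production systems would use numpy or the vector database's built-in metric):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # → 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # → 0.0 (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # → -1.0 (opposite direction)
```

Note that when vectors are normalized to unit length (as many embedding APIs return them), the denominator is 1 and cosine similarity reduces to a plain dot product, which is why some systems use dot product for speed.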

Other Distance Metrics

| Metric | How It Works | Used For |
|---|---|---|
| Cosine similarity | Angle between vectors | Most text similarity tasks |
| Euclidean distance | Straight-line distance | When magnitude matters |
| Dot product | Raw dot product | Optimized retrieval |

Embedding Models

| Model | Dimensions | Provider | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | OpenAI | Good quality, cheap |
| text-embedding-3-large | 3072 | OpenAI | Best quality from OpenAI |
| embed-v4 | 1024 | Cohere | Good multilingual |
| BGE-large-en | 1024 | Open Source (BAAI) | Best open-source |
| E5-mistral-7b | 4096 | Open Source | Instruction-following embeddings |
| Voyage-3 | 1024 | Voyage AI | Strong for code |

Pricing: Very cheap compared to LLMs. OpenAI text-embedding-3-small: $0.02 per 1M tokens.
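At that rate, corpus-embedding costs are easy to estimate. A quick sketch, assuming roughly 500 tokens per chunk (an illustrative figure, not from the source):

```python
price_per_million_tokens = 0.02  # USD, text-embedding-3-small
chunks = 10_000
tokens_per_chunk = 500           # assumed average chunk size

total_tokens = chunks * tokens_per_chunk               # 5,000,000 tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"${cost:.2f}")  # → $0.10 to embed the whole corpus
```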


Vector Databases

Regular databases search by exact match or range. Vector databases search by similarity.

Traditional DB:
  SELECT * FROM products WHERE category = 'shoes' AND price < 100

Vector DB:
  "Find me products similar to 'comfortable running shoes for rainy weather'"
  → Returns products with vectors closest to the query vector
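The contrast above can be sketched with a toy in-memory "vector DB" (a hypothetical minimal class, not any real product's API) that stores vectors and answers queries by cosine similarity:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class ToyVectorDB:
    """Minimal in-memory vector store: brute-force cosine search."""
    def __init__(self):
        self.items = []  # (id, vector) pairs

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def query(self, vector, top_k=1):
        # Score every stored vector against the query, highest similarity first
        scored = [(cosine(vector, v), item_id) for item_id, v in self.items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:top_k]]

# Pretend these 2-d vectors came from an embedding model
db = ToyVectorDB()
db.add("running shoes", [0.9, 0.1])
db.add("rain boots",    [0.8, 0.3])
db.add("laptop",        [0.1, 0.9])

print(db.query([0.85, 0.2], top_k=2))  # → ['running shoes', 'rain boots']
```

Real vector databases do the same thing conceptually, but replace the brute-force scan with ANN indexes (see below) so it scales to millions of vectors.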

Popular Vector Databases

| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production, zero-ops |
| Weaviate | Open source + cloud | Hybrid search (vector + keyword) |
| Qdrant | Open source + cloud | High performance, filtering |
| Chroma | Open source | Prototyping, simple local use |
| pgvector | PostgreSQL extension | Already using PostgreSQL |
| Milvus | Open source | Large-scale (billions of vectors) |
| FAISS | Library (Meta) | In-memory, fastest for research |

How Vector Search Works (ANN)

Comparing the query against ALL stored vectors (exact search) is O(n) per query, which is too slow for millions of vectors.

Approximate Nearest Neighbor (ANN) algorithms:

HNSW (Hierarchical Navigable Small World):

Build a multi-layer graph:
  Top layer:    few nodes, long-range connections (navigate quickly)
  Middle layers: more nodes, medium connections
  Bottom layer:  all nodes, short-range connections (precise search)

Search: Start at top → hop through graph → drill down → find nearest
  • ~95-99% recall (finds true nearest neighbor 95%+ of the time)
  • Sub-millisecond search over millions of vectors
  • Used by: Qdrant, Weaviate, pgvector

IVF (Inverted File Index):

  • Cluster vectors into groups
  • At query time, only search the closest clusters
  • Simpler and faster to build, but typically lower recall than HNSW

The RAG Connection

Embeddings + vector search = the foundation of RAG (Retrieval-Augmented Generation).

Setup (one-time):
  Your documents → Split into chunks → Embed each chunk → Store in vector DB

At query time:
  User question → Embed question → Search vector DB → Get top 5 similar chunks
  → Send chunks + question to LLM → LLM answers using the chunks

Example:

User: "How do I configure the payment webhook?"

1. Embed question → [0.12, 0.89, ...]
2. Vector search → finds chunks from payment-docs.md and webhook-guide.md
3. Send to Claude: "Using this context: [chunks], answer: How do I configure..."
4. Claude answers with grounded, accurate information from YOUR docs
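The query-time flow above can be sketched end to end. Here the embedder and the LLM are stub functions passed in as parameters (toy 2-d vectors and names invented for illustration; in practice both would be API calls):

```python
def answer_with_rag(question, chunks, embed, generate, top_k=2):
    """Query-time RAG flow: embed question -> rank chunks -> prompt the LLM."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    context = "\n".join(ranked[:top_k])
    prompt = f"Using this context:\n{context}\n\nAnswer: {question}"
    return generate(prompt)

# Stubs so the sketch runs without any API: fake embeddings, echo "LLM"
fake_vectors = {
    "How do I configure the payment webhook?": [1.0, 0.0],
    "Webhooks are configured in the dashboard.": [0.9, 0.1],
    "Our office is closed on Sundays.": [0.0, 1.0],
}
embed = fake_vectors.get
generate = lambda prompt: prompt  # a real system would call an LLM here

result = answer_with_rag(
    "How do I configure the payment webhook?",
    ["Our office is closed on Sundays.", "Webhooks are configured in the dashboard."],
    embed, generate, top_k=1,
)
print(result)  # the prompt contains only the webhook chunk, not the office one
```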

See 17 - RAG (Retrieval-Augmented Generation) for the complete pipeline.


Use Cases Beyond RAG

| Use Case | How |
|---|---|
| Semantic search | Search by meaning, not just keywords |
| Recommendation | "Users who liked X" → find similar embeddings |
| Clustering | Group similar documents/tickets/feedback |
| Deduplication | Find near-duplicate content |
| Classification | Compare against labeled examples |
| Anomaly detection | Find vectors far from normal clusters |
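Deduplication, for instance, is just a similarity threshold. A toy sketch with pre-computed 2-d vectors (the vectors and the threshold are assumptions you would tune per embedding model):

```python
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

# Pretend these came from an embedding model
items = [
    ("ticket-1", [0.90, 0.10]),
    ("ticket-2", [0.91, 0.11]),  # near-duplicate of ticket-1
    ("ticket-3", [0.10, 0.95]),
]

THRESHOLD = 0.999  # assumed; near-duplicates score very close to 1.0

# Compare every pair once; flag pairs above the threshold
duplicates = [
    (id_a, id_b)
    for i, (id_a, va) in enumerate(items)
    for id_b, vb in items[i + 1:]
    if cosine(va, vb) > THRESHOLD
]
print(duplicates)  # → [('ticket-1', 'ticket-2')]
```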

Practical Example: Building a Simple Semantic Search

```python
from openai import OpenAI
import numpy as np

client = OpenAI()

# Your documents
docs = [
    "Python is a versatile programming language",
    "JavaScript runs in the browser",
    "Machine learning uses statistical models",
    "The weather in Paris is mild",
]

# Embed all documents
doc_embeddings = []
for doc in docs:
    response = client.embeddings.create(
        input=doc,
        model="text-embedding-3-small",
    )
    doc_embeddings.append(response.data[0].embedding)

# User query
query = "What language is good for web development?"
query_resp = client.embeddings.create(
    input=query,
    model="text-embedding-3-small",
)
query_embedding = query_resp.data[0].embedding

# Find most similar (cosine similarity)
similarities = [
    np.dot(query_embedding, doc_emb)
    / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
    for doc_emb in doc_embeddings
]
best = docs[int(np.argmax(similarities))]

# Result: "JavaScript runs in the browser" scores highest ✅
```

Previous: 04 - Temperature, Top-P & Sampling | Next: 06 - Fine-Tuning vs Prompting vs RAG