Fine-Tuning vs Prompting vs RAG
Three Ways to Customize AI
You have a base model (Claude, GPT-4). It's general-purpose. You need it to work for YOUR specific use case. Three approaches:
Effort & Cost
─────────────→
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Prompting  │     │     RAG     │     │ Fine-Tuning │
│  (easiest)  │     │  (medium)   │     │  (hardest)  │
└─────────────┘     └─────────────┘     └─────────────┘
Change what you     Give it your        Retrain the
SAY to the model    documents at        model itself
                    query time
Rule: Always start with prompting. Add RAG if needed. Fine-tune only as a last resort.
1. Prompting
What: Change the instructions you send to the model.
System Prompt:
"You are a senior Python developer at Acme Corp.
Follow PEP 8. Use type hints. Prefer functional patterns.
Our codebase uses FastAPI + SQLAlchemy + PostgreSQL.
Always include error handling and logging."
Techniques
| Technique | Example | Best For |
|---|---|---|
| Zero-shot | "Translate this to French" | Simple tasks |
| Few-shot | "Here are 3 examples... now do the 4th" | Formatting, style matching |
| Chain-of-thought | "Think step by step" | Reasoning, math |
| System prompt | Persistent instructions | Consistent behavior |
| Persona | "You are a security expert" | Domain-specific responses |
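The few-shot and system-prompt techniques above come down to how you assemble the message list you send to the model. A minimal sketch (the tag format and example texts are made up for illustration; the system/user/assistant structure is the standard chat-completion shape):

```python
# Sketch: assembling a few-shot prompt as a chat messages list.
# System prompt first, then (input, output) example pairs, then
# the real query the model should answer in the same style.

def build_few_shot_messages(system, examples, query):
    """Build a chat payload: system prompt, few-shot pairs, real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("Summarize: The meeting ran long.", "MEETING: overran schedule"),
    ("Summarize: The deploy failed twice.", "DEPLOY: failed x2"),
]
msgs = build_few_shot_messages(
    "You reply in the exact UPPERCASE-tag format shown.",
    examples,
    "Summarize: The demo went well.",
)
print(len(msgs))  # 1 system + 2*2 example turns + 1 query = 6
```

The example pairs teach the output pattern implicitly; the model tends to continue the pattern for the final query without any explicit format rules.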
Pros
- Free (no training cost)
- Instant (no training time)
- Flexible (change prompt anytime)
- Works with any model
- No data preparation needed
Cons
- Limited by context window
- Can't teach truly new knowledge
- Prompt can get long and expensive
- Behavior may drift across conversations
- Can't change the model's fundamental capabilities
When to Use
- Always start here. Most tasks are solvable with good prompting.
- Style/format/tone control
- Task instructions
- Few-shot examples for specific output patterns
2. RAG (Retrieval-Augmented Generation)
What: Before answering, retrieve relevant documents and include them in the prompt.
User: "What's our refund policy for enterprise customers?"
1. Search your knowledge base → finds refund-policy.md
2. Send to model: "Context: [refund-policy.md content]. Question: What's our..."
3. Model answers based on YOUR documents, not its training data
Architecture
┌─────────────────────── Setup (one-time) ────────────────────────┐
│ Your docs → Chunk (split into pieces) → Embed → Vector DB       │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────── At query time ───────────────────────────┐
│ Question → Embed → Search vector DB → Top K chunks              │
│ → [System prompt + chunks + question] → LLM → Answer            │
└─────────────────────────────────────────────────────────────────┘
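The query-time flow above can be sketched end to end in a few lines. A real system would use a learned embedding model and a vector database; here a bag-of-words vector and brute-force cosine similarity stand in for both, and the documents are made up:

```python
# Toy RAG retrieval: embed docs, embed the question, rank by cosine
# similarity, and paste the top chunk into the prompt as context.
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "refund-policy.md": "enterprise customers may request a refund within 30 days",
    "onboarding.md": "new employees complete onboarding in their first week",
}
index = {name: embed(text) for name, text in docs.items()}  # the "vector DB"

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]

top = retrieve("What is the refund policy for enterprise customers?")
prompt = f"Context: {docs[top[0]]}\n\nQuestion: What is the refund policy?"
print(top)  # ['refund-policy.md']
```

Swapping `embed` for a real embedding model and `index` for a vector DB gives the production architecture; the control flow stays the same.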
Pros
- Model answers from YOUR data (reduces hallucination)
- Data can be updated without retraining
- Works with any model (no training needed)
- Scales to large knowledge bases
- Cheaper than fine-tuning
- Audit trail (you know WHICH documents informed the answer)
Cons
- Retrieval quality matters a lot (garbage in = garbage out)
- Chunking strategy is tricky
- Adds latency (search + LLM call)
- More infrastructure to maintain (vector DB, embedding pipeline)
- Can't change the model's style or behavior
When to Use
- Customer support bots (answer from help docs)
- Internal knowledge search (company wiki, Confluence)
- Code documentation Q&A
- Legal/compliance document search
- Any time the model needs your specific data to answer correctly
See 17 - RAG (Retrieval-Augmented Generation) for full pipeline details.
3. Fine-Tuning
What: Further train the model on your own dataset to change its behavior permanently.
Training data (JSONL):
{"messages": [
{"role": "system", "content": "You are a customer service agent for Acme."},
{"role": "user", "content": "I want to cancel my subscription"},
{"role": "assistant", "content": "I'd be happy to help you with cancellation. Can I have your account email?"}
]}
... (hundreds to thousands of examples)
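Since training data quality dominates fine-tuning results, it is worth validating the JSONL before submitting it. A minimal sketch using the chat format shown above (the assertion rules are common-sense checks, not any provider's official validator):

```python
# Sketch: write chat-format fine-tuning examples as JSONL
# (one JSON object per line) with basic sanity checks.
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(example):
    msgs = example.get("messages", [])
    assert msgs, "each example needs a non-empty messages list"
    for m in msgs:
        assert m.get("role") in VALID_ROLES, f"bad role: {m.get('role')}"
        assert isinstance(m.get("content"), str) and m["content"].strip()
    # The final assistant message is what the model learns to produce.
    assert msgs[-1]["role"] == "assistant", "last message must be the target reply"

examples = [
    {"messages": [
        {"role": "system", "content": "You are a customer service agent for Acme."},
        {"role": "user", "content": "I want to cancel my subscription"},
        {"role": "assistant", "content": "Happy to help. What's your account email?"},
    ]}
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        validate_example(ex)
        f.write(json.dumps(ex) + "\n")  # JSONL: exactly one example per line
```

Note that JSONL requires each example on a single line; the pretty-printed layout above is for readability only.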
What Fine-Tuning Changes
- Style and tone — match your brand voice consistently
- Format — always output in specific JSON structure
- Domain jargon — understand your industry terminology
- Behavior patterns — consistent decision-making process
- Efficiency — shorter prompts needed (behavior is "baked in")
What Fine-Tuning Does NOT Do Well
- Adding new factual knowledge — use RAG instead
- Occasional tasks — prompting is cheaper and faster
- Rapidly changing information — retraining is slow
- General capability improvement — fine-tuning can't make a small model as capable as a larger one
Pros
- Consistent behavior without long prompts
- Can capture subtle patterns from examples
- Lower inference cost (shorter prompts needed)
- Better at specific formats/styles
Cons
- Requires curated training data (hundreds-thousands of examples)
- Costs money to train (providers charge per training token; see current pricing)
- Takes time (hours to days)
- Risk of catastrophic forgetting (model gets worse at tasks outside the training distribution)
- Must retrain when base model updates
- Harder to debug and iterate
When to Use
- Need very consistent style/format across millions of calls
- Prompting alone can't achieve desired behavior
- Have high-quality training data
- High volume (fine-tuned smaller model can replace expensive larger model)
Decision Framework
Start here:
│
├─ Can prompting solve it?
│ YES → Use prompting. Done. ✅
│ NO ↓
│
├─ Does the model need access to your specific data/documents?
│ YES → Add RAG. ✅
│ NO ↓
│
├─ Does the model need to consistently behave in a specific way
│ that prompting can't achieve?
│ YES → Fine-tune. ✅
│ NO → Revisit your prompting strategy.
Comparison Table
| Aspect | Prompting | RAG | Fine-Tuning |
|---|---|---|---|
| Setup time | Minutes | Hours-Days | Days-Weeks |
| Training data needed | None | Documents (any format) | Curated examples (100s-1000s) |
| Cost to set up | $0 | $$ (vector DB, embeddings) | $$$ (GPU training) |
| Cost per query | Higher (long prompts) | Medium (search + LLM) | Lower (shorter prompts) |
| Update data | Edit prompt instantly | Re-embed new documents | Retrain model |
| Best for | Instructions, format, style | Knowledge, facts, docs | Behavior, tone, format |
| Hallucination risk | Higher | Lower (grounded in docs) | Medium |
| Flexibility | Very flexible | Flexible | Rigid (baked in) |
Real-World Examples
| Scenario | Best Approach | Why |
|---|---|---|
| Chatbot for your SaaS help docs | RAG | Needs YOUR specific documentation |
| Code review tool | Prompting + few-shot | Instructions + examples sufficient |
| Medical report summarizer | Fine-tune + RAG | Specific format + medical knowledge base |
| Customer email responder | Fine-tune | Need consistent brand voice at high volume |
| SQL query generator for YOUR schema | RAG (schema as context) | Schema is the knowledge, prompting handles the task |
| Content moderation | Fine-tune | Need consistent classifications at scale |
| Internal company search | RAG | Company data changes frequently |
The Hybrid Approach (Best Practice)
Most production systems combine all three:
System Prompt (prompting):
"You are a helpful assistant for Acme Corp. Be professional and concise."
RAG Pipeline:
Retrieve relevant documents from knowledge base
Fine-Tuned Model:
Trained on Acme's preferred response style and format
Combined: Fine-tuned model + RAG context + system prompt = best results
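The hybrid pattern amounts to one request that layers all three: a system prompt, retrieved context, and a fine-tuned model ID. A sketch (the model name, retriever output, and policy text are placeholders):

```python
# Sketch of the hybrid request: system prompt (prompting) + retrieved
# chunks (RAG) sent to a fine-tuned model. "ft:acme-support-v1" is a
# hypothetical fine-tuned model identifier.

def build_hybrid_request(question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)
    return {
        "model": "ft:acme-support-v1",  # fine-tuned model (placeholder ID)
        "messages": [
            {"role": "system",  # prompting layer
             "content": "You are a helpful assistant for Acme Corp. "
                        "Be professional and concise. "
                        "Answer only from the provided context."},
            {"role": "user",  # RAG layer: context pasted ahead of the question
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }

req = build_hybrid_request(
    "What's our refund policy?",
    ["Enterprise customers may request a refund within 30 days."],
)
```

Each layer does the job it is best at: the fine-tune carries tone and format, RAG carries facts, and the system prompt carries task instructions.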
Resources
- 🔗 Anthropic — Prompt Engineering Guide
- 🔗 OpenAI — Fine-Tuning Guide
- 📖 "Building LLM-Powered Applications" by Valentina Alto
- 🎥 Andrej Karpathy — State of GPT (training pipeline)
- 🔗 LangChain — RAG Tutorial
Previous: 05 - Embeddings & Vector Search | Next: 07 - Prompt Engineering Fundamentals