Fine-Tuning vs Prompting vs RAG
Three Ways to Customize AI
You have a base model (Claude, GPT-4). It's general-purpose. You need it to work for YOUR specific use case. Three approaches:
Effort & Cost
─────────────→
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Prompting  │     │     RAG     │     │ Fine-Tuning │
│  (easiest)  │     │  (medium)   │     │  (hardest)  │
└─────────────┘     └─────────────┘     └─────────────┘
Change what you     Give it your        Retrain the
SAY to the model    documents at        model itself
                    query time
Rule: Always start with prompting. Add RAG if needed. Fine-tune only as a last resort.
1. Prompting
What: Change the instructions you send to the model.
System Prompt:
"You are a senior Python developer at Acme Corp.
Follow PEP 8. Use type hints. Prefer functional patterns.
Our codebase uses FastAPI + SQLAlchemy + PostgreSQL.
Always include error handling and logging."
Techniques
| Technique | Example | Best For |
|---|---|---|
| Zero-shot | "Translate this to French" | Simple tasks |
| Few-shot | "Here are 3 examples... now do the 4th" | Formatting, style matching |
| Chain-of-thought | "Think step by step" | Reasoning, math |
| System prompt | Persistent instructions | Consistent behavior |
| Persona | "You are a security expert" | Domain-specific responses |
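The few-shot and system-prompt techniques above come down to how you assemble the message list you send to the model. A minimal sketch (the tag format and example texts are made up for illustration; the system/user/assistant structure is the standard chat-completion shape):

```python
# Sketch: assembling a few-shot prompt as a chat messages list.
# System prompt first, then (input, output) example pairs, then
# the real query the model should answer in the same style.

def build_few_shot_messages(system, examples, query):
    """Build a chat payload: system prompt, few-shot pairs, real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("Summarize: The meeting ran long.", "MEETING: overran schedule"),
    ("Summarize: The deploy failed twice.", "DEPLOY: failed x2"),
]
msgs = build_few_shot_messages(
    "You reply in the exact UPPERCASE-tag format shown.",
    examples,
    "Summarize: The demo went well.",
)
print(len(msgs))  # 1 system + 2*2 example turns + 1 query = 6
```

The example pairs teach the output pattern implicitly; the model tends to continue the pattern for the final query without any explicit format rules.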
Pros
- Free (no training cost)
- Instant (no training time)
- Flexible (change prompt anytime)
- Works with any model
- No data preparation needed
Cons
- Limited by context window
- Can't teach truly new knowledge
- Prompt can get long and expensive
- Behavior may drift across conversations
- Can't change the model's fundamental capabilities
When to Use
- Always start here. Most tasks are solvable with good prompting.
- Style/format/tone control
- Task instructions
- Few-shot examples for specific output patterns
2. RAG (Retrieval-Augmented Generation)
What: Before answering, retrieve relevant documents and include them in the prompt.
User: "What's our refund policy for enterprise customers?"
1. Search your knowledge base → finds refund-policy.md
2. Send to model: "Context: [refund-policy.md content]. Question: What's our..."
3. Model answers based on YOUR documents, not its training data
Architecture
┌─────────────────────── Setup (one-time) ────────────────────────┐
│ Your docs → Chunk (split into pieces) → Embed → Vector DB       │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────── At query time ───────────────────────────┐
│ Question → Embed → Search vector DB → Top K chunks              │
│ → [System prompt + chunks + question] → LLM → Answer            │
└─────────────────────────────────────────────────────────────────┘
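The query-time flow above can be sketched end to end in a few lines. A real system would use a learned embedding model and a vector database; here a bag-of-words vector and brute-force cosine similarity stand in for both, and the documents are made up:

```python
# Toy RAG retrieval: embed docs, embed the question, rank by cosine
# similarity, and paste the top chunk into the prompt as context.
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "refund-policy.md": "enterprise customers may request a refund within 30 days",
    "onboarding.md": "new employees complete onboarding in their first week",
}
index = {name: embed(text) for name, text in docs.items()}  # the "vector DB"

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]

top = retrieve("What is the refund policy for enterprise customers?")
prompt = f"Context: {docs[top[0]]}\n\nQuestion: What is the refund policy?"
print(top)  # ['refund-policy.md']
```

Swapping `embed` for a real embedding model and `index` for a vector DB gives the production architecture; the control flow stays the same.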
Pros
- Model answers from YOUR data (reduces hallucination)
- Data can be updated without retraining
- Works with any model (no training needed)
- Scales to large knowledge bases
- Cheaper than fine-tuning
- Audit trail (you know WHICH documents informed the answer)
Cons
- Retrieval quality matters a lot (garbage in = garbage out)
- Chunking strategy is tricky
- Adds latency (search + LLM call)
- More infrastructure to maintain (vector DB, embedding pipeline)
- Can't change the model's style or behavior
When to Use
- Customer support bots (answer from help docs)
- Internal knowledge search (company wiki, Confluence)
- Code documentation Q&A
- Legal/compliance document search
- Any time the model needs your specific data to answer correctly
See 17 - RAG (Retrieval-Augmented Generation) for full pipeline details.
3. Fine-Tuning
What: Further train the model on your own dataset to change its behavior permanently.
Training data (JSONL):
{"messages": [
{"role": "system", "content": "You are a customer service agent for Acme."},
{"role": "user", "content": "I want to cancel my subscription"},
{"role": "assistant", "content": "I'd be happy to help you with cancellation. Can I have your account email?"}
]}
... (hundreds to thousands of examples)
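Since training data quality dominates fine-tuning results, it is worth validating the JSONL before submitting it. A minimal sketch using the chat format shown above (the assertion rules are common-sense checks, not any provider's official validator):

```python
# Sketch: write chat-format fine-tuning examples as JSONL
# (one JSON object per line) with basic sanity checks.
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(example):
    msgs = example.get("messages", [])
    assert msgs, "each example needs a non-empty messages list"
    for m in msgs:
        assert m.get("role") in VALID_ROLES, f"bad role: {m.get('role')}"
        assert isinstance(m.get("content"), str) and m["content"].strip()
    # The final assistant message is what the model learns to produce.
    assert msgs[-1]["role"] == "assistant", "last message must be the target reply"

examples = [
    {"messages": [
        {"role": "system", "content": "You are a customer service agent for Acme."},
        {"role": "user", "content": "I want to cancel my subscription"},
        {"role": "assistant", "content": "Happy to help. What's your account email?"},
    ]}
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        validate_example(ex)
        f.write(json.dumps(ex) + "\n")  # JSONL: exactly one example per line
```

Note that JSONL requires each example on a single line; the pretty-printed layout above is for readability only.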
What Fine-Tuning Changes
- Style and tone — match your brand voice consistently
- Format — always output in specific JSON structure
- Domain jargon — understand your industry terminology
- Behavior patterns — consistent decision-making process
- Efficiency — shorter prompts needed (behavior is "baked in")
What Fine-Tuning Does NOT Do Well
- Adding new factual knowledge — use RAG instead
- Occasional tasks — prompting is cheaper and faster
- Rapidly changing information — retraining is slow
- General capability improvement — fine-tuning can't make a small model as capable as a larger one
Pros
- Consistent behavior without long prompts
- Can capture subtle patterns from examples
- Lower inference cost (shorter prompts needed)
- Better at specific formats/styles
Cons
- Requires curated training data (hundreds-thousands of examples)
- Costs money to train (providers charge per training token; see current pricing)
- Takes time (hours to days)
- Risk of catastrophic forgetting (model gets worse at tasks outside the training distribution)
- Must retrain when base model updates
- Harder to debug and iterate
When to Use
- Need very consistent style/format across millions of calls
- Prompting alone can't achieve desired behavior
- Have high-quality training data
- High volume (fine-tuned smaller model can replace expensive larger model)
Decision Framework
Start here:
│
├─ Can prompting solve it?
│ YES → Use prompting. Done. ✅
│ NO ↓
│
├─ Does the model need access to your specific data/documents?
│ YES → Add RAG. ✅
│ NO ↓
│
├─ Does the model need to consistently behave in a specific way
│ that prompting can't achieve?
│ YES → Fine-tune. ✅
│ NO → Revisit your prompting strategy.
Comparison Table
| Aspect | Prompting | RAG | Fine-Tuning |
|---|---|---|---|
| Setup time | Minutes | Hours-Days | Days-Weeks |
| Training data needed | None | Documents (any format) | Curated examples (100s-1000s) |
| Cost to set up | $0 | $$ (vector DB, embeddings) | $$$ (GPU training) |
| Cost per query | Higher (long prompts) | Medium (search + LLM) | Lower (shorter prompts) |
| Update data | Edit prompt instantly | Re-embed new documents | Retrain model |
| Best for | Instructions, format, style | Knowledge, facts, docs | Behavior, tone, format |
| Hallucination risk | Higher | Lower (grounded in docs) | Medium |
| Flexibility | Very flexible | Flexible | Rigid (baked in) |
Real-World Examples
| Scenario | Best Approach | Why |
|---|---|---|
| Chatbot for your SaaS help docs | RAG | Needs YOUR specific documentation |
| Code review tool | Prompting + few-shot | Instructions + examples sufficient |
| Medical report summarizer | Fine-tune + RAG | Specific format + medical knowledge base |
| Customer email responder | Fine-tune | Need consistent brand voice at high volume |
| SQL query generator for YOUR schema | RAG (schema as context) | Schema is the knowledge, prompting handles the task |
| Content moderation | Fine-tune | Need consistent classifications at scale |
| Internal company search | RAG | Company data changes frequently |
The Hybrid Approach (Best Practice)
Most production systems combine all three:
System Prompt (prompting):
"You are a helpful assistant for Acme Corp. Be professional and concise."
RAG Pipeline:
Retrieve relevant documents from knowledge base
Fine-Tuned Model:
Trained on Acme's preferred response style and format
Combined: Fine-tuned model + RAG context + system prompt = best results
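The hybrid pattern amounts to one request that layers all three: a system prompt, retrieved context, and a fine-tuned model ID. A sketch (the model name, retriever output, and policy text are placeholders):

```python
# Sketch of the hybrid request: system prompt (prompting) + retrieved
# chunks (RAG) sent to a fine-tuned model. "ft:acme-support-v1" is a
# hypothetical fine-tuned model identifier.

def build_hybrid_request(question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)
    return {
        "model": "ft:acme-support-v1",  # fine-tuned model (placeholder ID)
        "messages": [
            {"role": "system",  # prompting layer
             "content": "You are a helpful assistant for Acme Corp. "
                        "Be professional and concise. "
                        "Answer only from the provided context."},
            {"role": "user",  # RAG layer: context pasted ahead of the question
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }

req = build_hybrid_request(
    "What's our refund policy?",
    ["Enterprise customers may request a refund within 30 days."],
)
```

Each layer does the job it is best at: the fine-tune carries tone and format, RAG carries facts, and the system prompt carries task instructions.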
Resources
- 🔗 Anthropic — Prompt Engineering Guide
- 🔗 OpenAI — Fine-Tuning Guide
- 📖 "Building LLM-Powered Applications" by Valentina Alto
- 🎥 Andrej Karpathy — State of GPT (training pipeline)
- 🔗 LangChain — RAG Tutorial
Previous: 05 - Embeddings & Vector Search | Next: 07 - Prompt Engineering Fundamentals