Fine-Tuning vs Prompting vs RAG

Three Ways to Customize AI

You have a base model (Claude, GPT-4). It's general-purpose. You need it to work for YOUR specific use case. Three approaches:

                    Effort & Cost
                    ─────────────→
  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
  │  Prompting  │  │     RAG     │  │ Fine-Tuning │
  │  (easiest)  │  │  (medium)   │  │  (hardest)  │
  └─────────────┘  └─────────────┘  └─────────────┘
  Change what you    Give it your     Retrain the
  SAY to the model   documents at     model itself
                     query time

Rule: Always start with prompting. Add RAG if needed. Fine-tune only as a last resort.


1. Prompting

What: Change the instructions you send to the model.

System Prompt:
  "You are a senior Python developer at Acme Corp.
   Follow PEP 8. Use type hints. Prefer functional patterns.
   Our codebase uses FastAPI + SQLAlchemy + PostgreSQL.
   Always include error handling and logging."

Techniques

| Technique        | Example                                 | Best For                   |
|------------------|-----------------------------------------|----------------------------|
| Zero-shot        | "Translate this to French"              | Simple tasks               |
| Few-shot         | "Here are 3 examples... now do the 4th" | Formatting, style matching |
| Chain-of-thought | "Think step by step"                    | Reasoning, math            |
| System prompt    | Persistent instructions                 | Consistent behavior        |
| Persona          | "You are a security expert"             | Domain-specific responses  |
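The few-shot and system-prompt techniques above amount to assembling a message list before calling the model. A minimal sketch, using the common chat-API role/content convention; the system text and examples are made up for illustration:

```python
# Sketch: building a few-shot prompt as a chat message list.
# The role/content dict format follows the common chat-API convention;
# the system text and example pairs below are invented for illustration.

SYSTEM = (
    "You are a senior Python developer at Acme Corp. "
    "Follow PEP 8 and use type hints."
)

# Few-shot pairs: show the model the output pattern you want it to continue.
EXAMPLES = [
    ("Summarize: The deploy failed at 2am.", "DEPLOY-FAIL | 02:00 | needs triage"),
    ("Summarize: Login latency doubled.", "LATENCY | login | needs triage"),
]

def build_messages(user_input: str) -> list[dict]:
    """Assemble system prompt + few-shot examples + the real question."""
    messages = [{"role": "system", "content": SYSTEM}]
    for question, answer in EXAMPLES:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages

msgs = build_messages("Summarize: Payments API returning 500s.")
# One system message, two example pairs, then the new question:
assert len(msgs) == 6
assert msgs[0]["role"] == "system"
```

Note the ordering: examples come as alternating user/assistant turns so the model treats them as prior conversation, and the real question arrives last.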

Pros

  • Free (no training cost)
  • Instant (no training time)
  • Flexible (change prompt anytime)
  • Works with any model
  • No data preparation needed

Cons

  • Limited by context window
  • Can't teach truly new knowledge
  • Prompt can get long and expensive
  • Behavior may drift across conversations
  • Can't change the model's fundamental capabilities

When to Use

  • Always start here. Most tasks are solvable with good prompting.
  • Style/format/tone control
  • Task instructions
  • Few-shot examples for specific output patterns

2. RAG (Retrieval-Augmented Generation)

What: Before answering, retrieve relevant documents and include them in the prompt.

User: "What's our refund policy for enterprise customers?"

1. Search your knowledge base → finds refund-policy.md
2. Send to model: "Context: [refund-policy.md content]. Question: What's our..."
3. Model answers based on YOUR documents, not its training data

Architecture

┌──────────────────────── Setup (one-time) ──────────────────────┐
│ Your docs → Chunk (split into pieces) → Embed → Vector DB      │
└────────────────────────────────────────────────────────────────┘

┌──────────────────────── At query time ──────────────────────────┐
│ Question → Embed → Search vector DB → Top K chunks              │
│ → [System prompt + chunks + question] → LLM → Answer            │
└─────────────────────────────────────────────────────────────────┘
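The query-time path above can be sketched end to end. This toy version uses word counts as a stand-in "embedding" and a plain list as the vector DB; real systems use a learned embedding model and an approximate-nearest-neighbor index, but the shape of the pipeline is the same:

```python
# Toy RAG query path: embed the question, rank stored chunks by cosine
# similarity, and prepend the best chunk to the prompt. Word-count vectors
# stand in for real embeddings; a list stands in for the vector DB.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Setup" step: chunk + embed documents into the (toy) vector store.
chunks = [
    "Enterprise customers may request a full refund within 60 days.",
    "Standard plans renew monthly and can be cancelled anytime.",
]
store = [(c, embed(c)) for c in chunks]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the top-K most similar chunks for a question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

question = "What is the refund policy for enterprise customers?"
context = retrieve(question)[0]
prompt = f"Context: {context}\n\nQuestion: {question}"
assert "refund" in context
```

The final `prompt` string is what actually reaches the LLM, which is why retrieval quality dominates answer quality: the model only sees what the search step surfaces.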

Pros

  • Model answers from YOUR data (reduces hallucination)
  • Data can be updated without retraining
  • Works with any model (no training needed)
  • Scales to large knowledge bases
  • Cheaper than fine-tuning
  • Audit trail (you know WHICH documents informed the answer)

Cons

  • Retrieval quality matters a lot (garbage in = garbage out)
  • Chunking strategy is tricky
  • Adds latency (search + LLM call)
  • More infrastructure to maintain (vector DB, embedding pipeline)
  • Can't change the model's style or behavior

When to Use

  • Customer support bots (answer from help docs)
  • Internal knowledge search (company wiki, Confluence)
  • Code documentation Q&A
  • Legal/compliance document search
  • Any time the model needs your specific data to answer correctly

See 17 - RAG (Retrieval-Augmented Generation) for full pipeline details.


3. Fine-Tuning

What: Further train the model on your own dataset to change its behavior permanently.

Training data (JSONL):
{"messages": [
  {"role": "system", "content": "You are a customer service agent for Acme."},
  {"role": "user", "content": "I want to cancel my subscription"},
  {"role": "assistant", "content": "I'd be happy to help you with cancellation. Can I have your account email?"}
]}
... (hundreds to thousands of examples)
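Before submitting a training file it is worth sanity-checking each example programmatically. A minimal sketch of writing and validating examples in the JSONL chat format shown above; the checks here (role order, non-empty content) are basic assumptions, not any provider's official validation rules:

```python
# Sketch: serializing and sanity-checking fine-tuning examples in the
# chat JSONL format. Validation rules below are illustrative assumptions,
# not an official spec.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a customer service agent for Acme."},
        {"role": "user", "content": "I want to cancel my subscription"},
        {"role": "assistant", "content": "I'd be happy to help with cancellation. "
                                         "Can I have your account email?"},
    ]},
]

def validate(example: dict) -> None:
    msgs = example["messages"]
    assert msgs[-1]["role"] == "assistant", "last message must be the target output"
    for m in msgs:
        assert m["role"] in {"system", "user", "assistant"}
        assert m["content"].strip(), "empty content teaches the model nothing"

for ex in examples:
    validate(ex)

# One JSON object per line is what makes the file JSONL.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
assert json.loads(jsonl.splitlines()[0])["messages"][0]["role"] == "system"
```

The key invariant: the final assistant message in each example is the behavior being trained; everything before it is the context the model learns to respond to.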

What Fine-Tuning Changes

  • Style and tone — match your brand voice consistently
  • Format — always output in specific JSON structure
  • Domain jargon — understand your industry terminology
  • Behavior patterns — consistent decision-making process
  • Efficiency — shorter prompts needed (behavior is "baked in")

What Fine-Tuning Does NOT Do Well

  • Adding new factual knowledge — use RAG instead
  • Occasional tasks — prompting is cheaper and faster
  • Rapidly changing information — retraining is slow
  • General capability improvement — fine-tuning a small model won't make it as capable as a larger one

Pros

  • Consistent behavior without long prompts
  • Can capture subtle patterns from examples
  • Lower inference cost (shorter prompts needed)
  • Better at specific formats/styles

Cons

  • Requires curated training data (hundreds-thousands of examples)
  • Costs money to train (hosted APIs charge per training token; rates vary by model and provider)
  • Takes time (hours to days)
  • Risk of catastrophic forgetting (model gets worse at things outside training data)
  • Must retrain when base model updates
  • Harder to debug and iterate

When to Use

  • Need very consistent style/format across millions of calls
  • Prompting alone can't achieve desired behavior
  • Have high-quality training data
  • High volume (fine-tuned smaller model can replace expensive larger model)

Decision Framework

Start here:
│
├─ Can prompting solve it?
│  YES → Use prompting. Done. ✅
│  NO ↓
│
├─ Does the model need access to your specific data/documents?
│  YES → Add RAG. ✅
│  NO ↓
│
├─ Does the model need to consistently behave in a specific way
│  that prompting can't achieve?
│  YES → Fine-tune. ✅
│  NO → Revisit your prompting strategy.

Comparison Table

| Aspect               | Prompting              | RAG                         | Fine-Tuning                   |
|----------------------|------------------------|-----------------------------|-------------------------------|
| Setup time           | Minutes                | Hours-Days                  | Days-Weeks                    |
| Training data needed | None                   | Documents (any format)      | Curated examples (100s-1000s) |
| Cost to set up       | $0                     | $$ (vector DB, embeddings)  | $$$ (GPU training)            |
| Cost per query       | Higher (long prompts)  | Medium (search + LLM)       | Lower (shorter prompts)       |
| Update data          | Edit prompt instantly  | Re-embed new documents      | Retrain model                 |
| Best for             | Instructions, format, style | Knowledge, facts, docs | Behavior, tone, format        |
| Hallucination risk   | Higher                 | Lower (grounded in docs)    | Medium                        |
| Flexibility          | Very flexible          | Flexible                    | Rigid (baked in)              |

Real-World Examples

| Scenario                             | Best Approach            | Why                                         |
|--------------------------------------|--------------------------|---------------------------------------------|
| Chatbot for your SaaS help docs      | RAG                      | Needs YOUR specific documentation           |
| Code review tool                     | Prompting + few-shot     | Instructions + examples sufficient          |
| Medical report summarizer            | Fine-tune + RAG          | Specific format + medical knowledge base    |
| Customer email responder             | Fine-tune                | Need consistent brand voice at high volume  |
| SQL query generator for YOUR schema  | RAG (schema as context)  | Schema is the knowledge; prompting handles the task |
| Content moderation                   | Fine-tune                | Need consistent classifications at scale    |
| Internal company search              | RAG                      | Company data changes frequently             |

The Hybrid Approach (Best Practice)

Most production systems combine all three:

System Prompt (prompting):
  "You are a helpful assistant for Acme Corp. Be professional and concise."

RAG Pipeline:
  Retrieve relevant documents from knowledge base

Fine-Tuned Model:
  Trained on Acme's preferred response style and format

Combined: Fine-tuned model + RAG context + system prompt = best results
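The combined request can be sketched as a single assembly step. A minimal sketch, assuming a hypothetical fine-tuned model name (`acme-support-ft`) and a placeholder `retrieve` function standing in for the RAG pipeline:

```python
# Sketch of the hybrid setup: short system prompt (the fine-tuned model
# already has the style "baked in"), retrieved context, and the question,
# assembled into one request. Model name and retrieve() are hypothetical.

SYSTEM = "You are a helpful assistant for Acme Corp. Be professional and concise."

def retrieve(question: str) -> list[str]:
    # Placeholder for the RAG pipeline (embed -> search -> top-K chunks).
    return ["Enterprise refunds are honored within 60 days of purchase."]

def build_request(question: str) -> dict:
    context = "\n".join(retrieve(question))
    return {
        "model": "acme-support-ft",  # hypothetical fine-tuned model
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }

req = build_request("What's the enterprise refund window?")
assert req["messages"][0]["role"] == "system"
assert "Context:" in req["messages"][1]["content"]
```

Because the fine-tuned model handles tone and format, the system prompt can stay short, which lowers per-query cost at high volume.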

Previous: 05 - Embeddings & Vector Search | Next: 07 - Prompt Engineering Fundamentals