Advanced Prompting Techniques

Beyond the Basics

You know zero-shot, few-shot, and chain-of-thought from 07 - Prompt Engineering Fundamentals. These advanced techniques handle harder problems: multi-step reasoning, reliability at scale, and structured automation.


1. ReAct (Reason + Act)

What: The model alternates between reasoning about the problem and taking actions (tool calls, searches). It forms a loop: Think → Act → Observe → Think → Act → ...

Question: "What's the population of the capital of the country that
           won the 2022 FIFA World Cup?"

Thought 1: I need to find who won the 2022 World Cup.
Action 1:  search("2022 FIFA World Cup winner")
Observation 1: Argentina won the 2022 FIFA World Cup.

Thought 2: The capital of Argentina is Buenos Aires. I need its population.
Action 2:  search("Buenos Aires population 2024")
Observation 2: Buenos Aires metro area population is ~15.5 million.

Thought 3: I have the answer.
Answer: Approximately 15.5 million people.

Why it matters: This is how AI agents work. Claude Code, ChatGPT with tools, and most agentic systems use ReAct-style loops internally.

When to use: Multi-step tasks requiring external data, tool use, or iterative problem-solving.
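The loop above can be sketched in a few lines of Python. This is a toy version: the search tool is a canned lookup table and the model's turns are scripted, standing in for real LLM and search-API calls (`run_react`, `SCRIPT`, and `search` are illustrative names, not part of any SDK):

```python
# Toy tool: a canned lookup table standing in for a real web search.
def search(query: str) -> str:
    facts = {
        "2022 FIFA World Cup winner": "Argentina won the 2022 FIFA World Cup.",
        "Buenos Aires population 2024": "Buenos Aires metro area population is ~15.5 million.",
    }
    return facts.get(query, "No results found.")

# Scripted "model" turns: (thought, action_or_None, final_answer_or_None).
# A real agent would get each turn from an LLM call instead.
SCRIPT = [
    ("I need to find who won the 2022 World Cup.", "2022 FIFA World Cup winner", None),
    ("The capital of Argentina is Buenos Aires. I need its population.", "Buenos Aires population 2024", None),
    ("I have the answer.", None, "Approximately 15.5 million people."),
]

def run_react(script, tool):
    """Drive the Think -> Act -> Observe loop until a final answer appears."""
    transcript = []
    for i, (thought, action, answer) in enumerate(script, start=1):
        transcript.append(f"Thought {i}: {thought}")
        if answer is not None:
            transcript.append(f"Answer: {answer}")
            return answer, transcript
        observation = tool(action)
        transcript.append(f"Action {i}: search({action!r})")
        transcript.append(f"Observation {i}: {observation}")
    return None, transcript

answer, transcript = run_react(SCRIPT, search)
```

Swapping the scripted turns for a model call and the lookup table for real tools turns this skeleton into a working agent loop.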


2. Tree-of-Thought (ToT)

What: Instead of committing to one reasoning path, explore several candidate paths, evaluate each, and pick the best one. Like a chess player considering several moves ahead.

Problem: "How should we migrate our monolith to microservices?"

Path A: Strangler Fig Pattern
  → Low risk, gradual
  → Takes 12-18 months
  → Score: 8/10

Path B: Big Bang Rewrite
  → High risk, clean slate
  → Takes 6-9 months
  → Score: 4/10

Path C: Domain-Driven Decomposition
  → Medium risk, strategic
  → Takes 9-12 months
  → Score: 7/10

Best path: A (Strangler Fig) — lowest risk, proven approach.

How to trigger it in a prompt:

"Consider 3 different approaches to solve this problem.
 For each approach:
 1. Describe the strategy
 2. List pros and cons
 3. Rate feasibility (1-10)
 Then pick the best approach and explain why."

When to use: Complex decisions with trade-offs, architecture choices, debugging when the root cause is unclear.
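The evaluate-and-select step is simple to express in code. In a real ToT setup each candidate path would be generated and scored by separate model calls; here the three paths from the example above are hard-coded just to show the selection logic:

```python
# Candidate paths with the illustrative scores from the example above.
paths = [
    {"name": "Strangler Fig Pattern", "risk": "low", "score": 8},
    {"name": "Big Bang Rewrite", "risk": "high", "score": 4},
    {"name": "Domain-Driven Decomposition", "risk": "medium", "score": 7},
]

# Pick the highest-scoring path; a fuller implementation would also prune
# low-scoring branches before expanding them further.
best = max(paths, key=lambda p: p["score"])
```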


3. Self-Consistency (Majority Voting)

What: Ask the same question multiple times (with temperature > 0), then pick the most common answer. Random errors tend to cancel out across runs.

Question: "Is this code thread-safe?"

Run 1: "No — the shared counter has no lock"     → NO
Run 2: "No — race condition on line 12"           → NO
Run 3: "Yes — the GIL protects it"                → YES
Run 4: "No — multiple threads can increment"      → NO
Run 5: "No — needs a mutex or atomic operation"   → NO

Majority vote: NO (4/5) → Final answer: Not thread-safe.

Implementation pattern (API):

python
import collections

# Assumes `client` is an initialized Anthropic client, `prompt` is your
# question, and extract_answer() pulls the final answer out of the response.
answers = []
for _ in range(5):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,          # required by the Anthropic Messages API
        temperature=0.7,
        messages=[{"role": "user", "content": prompt}],
    )
    answers.append(extract_answer(response))

final = collections.Counter(answers).most_common(1)[0][0]

When to use: High-stakes classification, math problems, any task where you need confidence in the answer. Trade-off: costs N times more tokens.


4. Structured Output

What: Force the model to respond in a specific machine-readable format (JSON, XML, YAML).

In the prompt:

"Analyze this error log and respond in this exact JSON format:
{
  "error_type": "string",
  "root_cause": "string",
  "severity": "low | medium | high | critical",
  "suggested_fix": "string",
  "affected_files": ["string"]
}"

Via API (JSON mode):

python
# OpenAI (JSON mode)
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...],
)

# Anthropic (tool use for structured output)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "analyze_error",
        "input_schema": {
            "type": "object",
            "properties": {
                "error_type": {"type": "string"},
                "severity": {"enum": ["low", "medium", "high", "critical"]},
            },
            "required": ["error_type", "severity"],
        },
    }],
    tool_choice={"type": "tool", "name": "analyze_error"},
    messages=[...],
)

When to use: Any time your code needs to parse the model's output. Always prefer structured output over regex-parsing free text.


5. Constitutional AI (Self-Critique)

What: The model generates an answer, then critiques it against a set of principles, then revises.

Step 1 — Generate:
  "Here's a function to hash passwords: md5(password)"

Step 2 — Critique:
  "This uses MD5 which is cryptographically broken.
   It doesn't use salt. It's vulnerable to rainbow tables."

Step 3 — Revise:
  "Use bcrypt with a work factor of 12:
   bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"

How to trigger it:

"Answer the question below. Then critique your own answer for:
 - Factual accuracy
 - Security implications
 - Missing edge cases
 Finally, provide a revised answer addressing the critique."

When to use: Security-sensitive code, medical/legal content, any domain where errors are costly.
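The generate → critique → revise loop is easy to wire up as three chained calls. A minimal sketch, assuming a `model(prompt) -> str` callable that wraps your LLM client; a stub reproducing the password-hashing example is used here so the code runs without an API key:

```python
def self_critique(question, model):
    """Generate -> critique -> revise, using `model(prompt) -> str` per step."""
    draft = model(f"Answer the question:\n{question}")
    critique = model(
        "Critique this answer for factual accuracy, security implications, "
        f"and missing edge cases:\n{draft}"
    )
    revised = model(
        f"Question: {question}\nDraft answer: {draft}\nCritique: {critique}\n"
        "Provide a revised answer addressing the critique."
    )
    return revised

# Stub standing in for real model calls, echoing the example above.
def stub_model(prompt):
    if prompt.startswith("Answer the question"):
        return "md5(password)"
    if prompt.startswith("Critique"):
        return "MD5 is cryptographically broken and unsalted."
    return "Use bcrypt: bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"

result = self_critique("How do I hash passwords?", stub_model)
```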


6. Meta-Prompting

What: Use AI to write or improve your prompts.

Prompt to the model:
  "I need a prompt that will make an LLM consistently classify
   customer support emails into these categories: billing, technical,
   account, feedback, spam.

   Write me an optimized prompt with:
   - A clear role
   - 5 few-shot examples
   - Explicit output format
   - Edge case handling"

Output: (a well-structured prompt you can use directly)

Practical workflow:

  1. Write your first attempt at a prompt
  2. Ask the model: "How can this prompt be improved? What ambiguities exist?"
  3. Apply the suggestions
  4. Test on edge cases
  5. Iterate

When to use: When building production prompts, when your prompt gives inconsistent results, when you're stuck on phrasing.


7. Prompt Chaining

What: Break a complex task into sequential steps. The output of one prompt feeds into the next.

Task: "Generate a technical blog post about WebSockets"

Chain:
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Step 1: Outline  │ ──→ │ Step 2: Draft    │ ──→ │ Step 3: Review   │
│ Generate 5-point │     │ Write each       │     │ Check accuracy,  │
│ outline          │     │ section from     │     │ improve examples │
│                  │     │ the outline      │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘

Real engineering example:

Step 1: "Analyze this error log and identify the root cause."
         → Output: "Database connection pool exhausted due to leaked connections"

Step 2: "Given this root cause: [Step 1 output]. Find the code responsible.
          Here's the codebase: [relevant files]"
         → Output: "db.py line 45: connection acquired but never released in error path"

Step 3: "Fix this bug: [Step 2 output]. Write the corrected code with
          proper connection cleanup using context managers."
         → Output: (fixed code)

Why chaining beats one mega-prompt:

  • Each step is simpler and more reliable
  • You can inspect and verify intermediate results
  • You can retry a single step without re-running everything
  • Overall quality is higher than with one long, complex prompt
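The debugging chain above can be driven by a small loop that feeds each step's output into the next prompt. A sketch, assuming a `model(prompt) -> str` callable wrapping your LLM client (stubbed here with the example's canned outputs so the flow is visible):

```python
def run_chain(steps, model):
    """Run prompt templates in order; `{prev}` is filled with the prior output."""
    output = ""
    intermediates = []
    for template in steps:
        output = model(template.format(prev=output))
        intermediates.append(output)  # inspect, verify, or retry any single step
    return output, intermediates

steps = [
    "Analyze this error log and identify the root cause: [log]",
    "Given this root cause: {prev}. Find the code responsible.",
    "Fix this bug: {prev}. Write the corrected code with proper cleanup.",
]

# Stub standing in for real LLM calls, returning the example's outputs.
def stub_model(prompt):
    if prompt.startswith("Analyze"):
        return "Database connection pool exhausted due to leaked connections"
    if prompt.startswith("Given"):
        return "db.py line 45: connection acquired but never released"
    return "fixed code using a context manager"

final, intermediates = run_chain(steps, stub_model)
```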

8. Multi-Turn Refinement

What: Iteratively improve output through conversation turns.

Turn 1: "Write a Python function to parse CSV files."
  → (basic function)

Turn 2: "Good, but add error handling for malformed rows
          and support for custom delimiters."
  → (improved function)

Turn 3: "Now add type hints, a docstring, and make it
          a generator for memory efficiency."
  → (production-ready function)

Strategy tips:

  • Start broad, then narrow down
  • Each turn should address 1-2 specific improvements
  • Reference what was good: "Keep the error handling, but also..."
  • If the model goes off track, give it the correct version and say "continue from here"
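Multi-turn refinement maps directly onto the messages list used by chat APIs: each round appends the assistant's reply and your next piece of feedback. A sketch with a hypothetical `model(messages) -> str` callable (stubbed below) in place of a real API call:

```python
def refine(messages, feedback, model):
    """Append the model's reply to the history, then the next feedback turn."""
    reply = model(messages)
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": feedback})
    return messages

messages = [{"role": "user", "content": "Write a Python function to parse CSV files."}]
refine(messages, "Good, but add error handling for malformed rows.",
       model=lambda msgs: "def parse_csv(path): ...")  # stubbed model reply
```

Because the full history is resent each turn, earlier context (the parts you said were good) stays in view while each new turn narrows the request.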

Decision Table: When to Use Which Technique

Technique         | Best For                           | Complexity | Token Cost
------------------|------------------------------------|------------|---------------
Zero-shot         | Simple tasks                       | Low        | Low
Few-shot          | Custom formats, classification     | Low        | Medium
Chain-of-thought  | Math, logic, reasoning             | Low        | Medium
ReAct             | Multi-step tasks needing tools     | High       | High
Tree-of-thought   | Complex decisions with trade-offs  | Medium     | High
Self-consistency  | High-stakes, need confidence       | Low        | Very High (Nx)
Structured output | Machine-readable responses         | Low        | Low
Self-critique     | Safety-critical, accuracy-critical | Medium     | Medium
Meta-prompting    | Building production prompts        | Low        | Medium
Prompt chaining   | Complex multi-step workflows       | Medium     | Medium-High
Multi-turn        | Iterative refinement               | Low        | Medium

Combining Techniques

Real-world prompts often combine several techniques:

Role prompting + Few-shot + Chain-of-thought + Structured output:

"You are a senior security engineer. (ROLE)

Analyze code for vulnerabilities. For each finding,
think through the attack vector step by step. (CoT)

Example:                                        (FEW-SHOT)
Code: query = f"SELECT * FROM users WHERE id = {input}"
Analysis: Step 1: User controls 'input' variable.
          Step 2: Input is interpolated directly into SQL.
          Step 3: Attacker can inject: 1 OR 1=1
Finding: {"vuln": "SQL Injection", "severity": "critical",
          "line": 1, "fix": "Use parameterized query"}

Now analyze:                                    (STRUCTURED)
[your code here]

Respond as a JSON array of findings."

Practical Tips

  1. Start simple. Zero-shot first. Add complexity only when needed.
  2. Test with adversarial inputs. Not just happy paths.
  3. Version control your prompts. They're code. Treat them as such.
  4. Measure results. Run the same prompt 10 times. How consistent is it?
  5. Budget your tokens. Chain-of-thought and self-consistency cost more. Worth it for hard tasks, wasteful for easy ones.
  6. Document what works. Build a prompt library for your team.


Previous: 07 - Prompt Engineering Fundamentals | Next: 09 - System Prompts & Instructions