Advanced Prompting Techniques
Beyond the Basics
You know zero-shot, few-shot, and chain-of-thought from 07 - Prompt Engineering Fundamentals. These advanced techniques handle harder problems: multi-step reasoning, reliability at scale, and structured automation.
1. ReAct (Reason + Act)
What: The model alternates between reasoning about the problem and taking actions (tool calls, searches). It forms a loop: Think → Act → Observe → Think → Act → ...
Question: "What's the population of the capital of the country that
won the 2022 FIFA World Cup?"
Thought 1: I need to find who won the 2022 World Cup.
Action 1: search("2022 FIFA World Cup winner")
Observation 1: Argentina won the 2022 FIFA World Cup.
Thought 2: The capital of Argentina is Buenos Aires. I need its population.
Action 2: search("Buenos Aires population 2024")
Observation 2: Buenos Aires metro area population is ~15.5 million.
Thought 3: I have the answer.
Answer: Approximately 15.5 million people.
Why it matters: This is how AI agents work. Claude Code, ChatGPT with tools, and most agentic systems use ReAct-style loops internally.
When to use: Multi-step tasks requiring external data, tool use, or iterative problem-solving.
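The Think → Act → Observe loop above can be sketched as a plain Python loop. This is an illustrative sketch, not a production agent: `call_model` is stubbed with scripted turns (a real agent would call an LLM API each iteration), and `search` returns canned results instead of hitting a search engine.

```python
import re

def search(query):
    # Stubbed tool: a real agent would call a search API here.
    canned = {
        "2022 FIFA World Cup winner": "Argentina won the 2022 FIFA World Cup.",
        "Buenos Aires population 2024": "Buenos Aires metro area population is ~15.5 million.",
    }
    return canned.get(query, "No results.")

TOOLS = {"search": search}

# Scripted model turns stand in for real LLM calls so the loop runs offline.
SCRIPTED_TURNS = iter([
    'Thought: I need the 2022 World Cup winner.\nAction: search("2022 FIFA World Cup winner")',
    'Thought: The capital of Argentina is Buenos Aires; I need its population.\nAction: search("Buenos Aires population 2024")',
    "Thought: I have the answer.\nAnswer: Approximately 15.5 million people.",
])

def call_model(transcript):
    return next(SCRIPTED_TURNS)

def react_loop(question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        turn = call_model(transcript)
        transcript += turn + "\n"
        done = re.search(r"Answer:\s*(.+)", turn)
        if done:  # the model decided it has enough information
            return done.group(1)
        action = re.search(r'Action:\s*(\w+)\("(.+)"\)', turn)
        if action:  # execute the tool, feed the observation back into the transcript
            tool, arg = action.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None

answer = react_loop("What's the population of the capital of the 2022 World Cup winner?")
```

The key design point: the transcript accumulates every thought, action, and observation, so each model call sees the full history of the loop so far.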
2. Tree-of-Thought (ToT)
What: Instead of one reasoning path, explore multiple paths simultaneously and pick the best one. Like a chess player considering several moves ahead.
Problem: "How should we migrate our monolith to microservices?"
Path A: Strangler Fig Pattern
→ Low risk, gradual
→ Takes 12-18 months
→ Score: 8/10
Path B: Big Bang Rewrite
→ High risk, clean slate
→ Takes 6-9 months
→ Score: 4/10
Path C: Domain-Driven Decomposition
→ Medium risk, strategic
→ Takes 9-12 months
→ Score: 7/10
Best path: A (Strangler Fig) — lowest risk, proven approach.
How to trigger it in a prompt:
"Consider 3 different approaches to solve this problem.
For each approach:
1. Describe the strategy
2. List pros and cons
3. Rate feasibility (1-10)
Then pick the best approach and explain why."
When to use: Complex decisions with trade-offs, architecture choices, debugging when the root cause is unclear.
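The explore-and-score pattern behind the prompt above can be sketched as: propose several candidate paths, score each one, keep the best. Both `propose_paths` and `score_path` are stubbed here with the migration example's values; a real system would make one model call per branch plus a judging call.

```python
def propose_paths(problem):
    # Stubbed: a real system would ask the model for N distinct strategies.
    return ["Strangler Fig Pattern", "Big Bang Rewrite", "Domain-Driven Decomposition"]

def score_path(problem, path):
    # Stubbed judge: a real system would ask the model to rate feasibility 1-10.
    scores = {"Strangler Fig Pattern": 8, "Big Bang Rewrite": 4, "Domain-Driven Decomposition": 7}
    return scores[path]

def tree_of_thought(problem):
    # Score every branch before committing, instead of following one chain of thought.
    paths = propose_paths(problem)
    best = max(paths, key=lambda p: score_path(problem, p))
    return best, score_path(problem, best)

best_path, best_score = tree_of_thought("How should we migrate our monolith to microservices?")
```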
3. Self-Consistency (Majority Voting)
What: Ask the same question multiple times (with temperature > 0), then pick the most common answer. Reduces random errors.
Question: "Is this code thread-safe?"
Run 1: "No — the shared counter has no lock" → NO
Run 2: "No — race condition on line 12" → NO
Run 3: "Yes — the GIL protects it" → YES
Run 4: "No — multiple threads can increment" → NO
Run 5: "No — needs a mutex or atomic operation" → NO
Majority vote: NO (4/5) → Final answer: Not thread-safe.
Implementation pattern (API):
```python
import collections

answers = []
for _ in range(5):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        temperature=0.7,
        messages=[{"role": "user", "content": prompt}],
    )
    answers.append(extract_answer(response))

final = collections.Counter(answers).most_common(1)[0][0]
```
When to use: High-stakes classification, math problems, any task where you need confidence in the answer. Trade-off: costs N times more tokens.
4. Structured Output
What: Force the model to respond in a specific machine-readable format (JSON, XML, YAML).
In the prompt:
"Analyze this error log and respond in this exact JSON format:
{
"error_type": "string",
"root_cause": "string",
"severity": "low | medium | high | critical",
"suggested_fix": "string",
"affected_files": ["string"]
}"
Via API (JSON mode):
```python
# OpenAI (JSON mode)
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...],
)

# Anthropic (tool use for structured output)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    tools=[{
        "name": "analyze_error",
        "input_schema": {
            "type": "object",
            "properties": {
                "error_type": {"type": "string"},
                "severity": {"enum": ["low", "medium", "high", "critical"]},
            },
            "required": ["error_type", "severity"],
        },
    }],
    tool_choice={"type": "tool", "name": "analyze_error"},
    messages=[...],
)
```
When to use: Any time your code needs to parse the model's output. Always prefer structured output over regex-parsing free text.
5. Constitutional AI (Self-Critique)
What: The model generates an answer, then critiques it against a set of principles, then revises.
Step 1 — Generate:
"Here's a function to hash passwords: md5(password)"
Step 2 — Critique:
"This uses MD5 which is cryptographically broken.
It doesn't use salt. It's vulnerable to rainbow tables."
Step 3 — Revise:
"Use bcrypt with a work factor of 12:
bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"
How to trigger it:
"Answer the question below. Then critique your own answer for:
- Factual accuracy
- Security implications
- Missing edge cases
Finally, provide a revised answer addressing the critique."
When to use: Security-sensitive code, medical/legal content, any domain where errors are costly.
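The generate → critique → revise pipeline can be sketched as three chained model calls. `call_model` is stubbed with canned replies mirroring the password-hashing example above so the sketch runs offline; a real version would call an LLM API.

```python
PRINCIPLES = ["Factual accuracy", "Security implications", "Missing edge cases"]

def call_model(prompt):
    # Stubbed with canned replies so the sketch runs offline; swap in a real API call.
    if "revise" in prompt.lower():
        return "Use bcrypt: bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"
    if "critique" in prompt.lower():
        return "MD5 is cryptographically broken, unsalted, and vulnerable to rainbow tables."
    return "Here's a function to hash passwords: md5(password)"

def self_critique(question):
    draft = call_model(question)
    critique = call_model(
        f"Critique this answer against these principles: {', '.join(PRINCIPLES)}\n\n"
        f"Answer: {draft}"
    )
    revised = call_model(
        f"Revise the answer to address the critique.\n\nAnswer: {draft}\nCritique: {critique}"
    )
    return revised

final = self_critique("How should I hash passwords?")
```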
6. Meta-Prompting
What: Use AI to write or improve your prompts.
Prompt to the model:
"I need a prompt that will make an LLM consistently classify
customer support emails into these categories: billing, technical,
account, feedback, spam.
Write me an optimized prompt with:
- A clear role
- 5 few-shot examples
- Explicit output format
- Edge case handling"
Output: (a well-structured prompt you can use directly)
Practical workflow:
- Write your first attempt at a prompt
- Ask the model: "How can this prompt be improved? What ambiguities exist?"
- Apply the suggestions
- Test on edge cases
- Iterate
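The workflow above can be automated as a short loop: feed the current draft back to the model and ask for a critique-plus-rewrite each round. The lambda below is a stand-in model that just tags each revision; a real version would call an LLM API.

```python
def improve_prompt(draft, call_model, rounds=2):
    """Iteratively ask the model to rewrite a prompt, addressing ambiguities."""
    prompt = draft
    for _ in range(rounds):
        prompt = call_model(
            "How can this prompt be improved? What ambiguities exist? "
            f"Output only the rewritten prompt:\n\n{prompt}"
        )
    return prompt

# Stubbed model: appends a marker to the latest draft; swap in a real API call.
improved = improve_prompt(
    "Classify customer support emails.",
    lambda p: p.rsplit("\n\n", 1)[-1] + " [revised]",
)
```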
When to use: When building production prompts, when your prompt gives inconsistent results, when you're stuck on phrasing.
7. Prompt Chaining
What: Break a complex task into sequential steps. The output of one prompt feeds into the next.
Task: "Generate a technical blog post about WebSockets"
Chain:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Step 1: Outline │ ──→ │ Step 2: Draft │ ──→ │ Step 3: Review │
│ Generate 5-point │ │ Write each │ │ Check accuracy, │
│ outline │ │ section from │ │ improve examples │
│ │ │ the outline │ │ │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Real engineering example:
Step 1: "Analyze this error log and identify the root cause."
→ Output: "Database connection pool exhausted due to leaked connections"
Step 2: "Given this root cause: [Step 1 output]. Find the code responsible.
Here's the codebase: [relevant files]"
→ Output: "db.py line 45: connection acquired but never released in error path"
Step 3: "Fix this bug: [Step 2 output]. Write the corrected code with
proper connection cleanup using context managers."
→ Output: (fixed code)
Why chaining beats one mega-prompt:
- Each step is simpler and more reliable
- You can inspect and verify intermediate results
- You can retry a single step without re-running everything
- Overall quality is higher than with one complex mega-prompt
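A chain is just a loop that threads each output into the next prompt template. This sketch keeps every intermediate result so individual steps can be inspected or retried; the model is stubbed with canned replies mirroring the debugging example above.

```python
def run_chain(steps, initial_input, call_model):
    """Run prompt templates in sequence; each receives the previous output as {prev}."""
    outputs = []
    prev = initial_input
    for template in steps:
        prev = call_model(template.format(prev=prev))
        outputs.append(prev)  # keep intermediates so any step can be inspected or retried
    return outputs

STEPS = [
    "Analyze this error log and identify the root cause:\n{prev}",
    "Given this root cause: {prev}. Find the code responsible.",
    "Fix this bug: {prev}. Write the corrected code with proper connection cleanup.",
]

# Canned replies stand in for real model calls so the sketch runs offline.
CANNED = iter([
    "Database connection pool exhausted due to leaked connections",
    "db.py line 45: connection acquired but never released in error path",
    "Wrap acquisition in a context manager so the connection is always released.",
])
results = run_chain(STEPS, "ERROR: TimeoutError acquiring connection", lambda prompt: next(CANNED))
```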
8. Multi-Turn Refinement
What: Iteratively improve output through conversation turns.
Turn 1: "Write a Python function to parse CSV files."
→ (basic function)
Turn 2: "Good, but add error handling for malformed rows
and support for custom delimiters."
→ (improved function)
Turn 3: "Now add type hints, a docstring, and make it
a generator for memory efficiency."
→ (production-ready function)
Strategy tips:
- Start broad, then narrow down
- Each turn should address 1-2 specific improvements
- Reference what was good: "Keep the error handling, but also..."
- If the model goes off track, give it the correct version and say "continue from here"
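Programmatically, multi-turn refinement just means resending the growing messages list each turn. `call_model` is stubbed here (it echoes the latest request); a real version would pass `messages` to a chat API.

```python
def call_model(messages):
    # Stubbed: a real implementation would send the full `messages` list to a chat API.
    return f"(model reply to: {messages[-1]['content']})"

def refine(initial_request, refinements):
    messages = [{"role": "user", "content": initial_request}]
    messages.append({"role": "assistant", "content": call_model(messages)})
    for followup in refinements:  # each turn targets 1-2 specific improvements
        messages.append({"role": "user", "content": followup})
        messages.append({"role": "assistant", "content": call_model(messages)})
    return messages[-1]["content"], messages

final, history = refine(
    "Write a Python function to parse CSV files.",
    [
        "Good, but add error handling for malformed rows and custom delimiters.",
        "Now add type hints, a docstring, and make it a generator.",
    ],
)
```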
Decision Table: When to Use Which Technique
| Technique | Best For | Complexity | Token Cost |
|---|---|---|---|
| Zero-shot | Simple tasks | Low | Low |
| Few-shot | Custom formats, classification | Low | Medium |
| Chain-of-thought | Math, logic, reasoning | Low | Medium |
| ReAct | Multi-step tasks needing tools | High | High |
| Tree-of-thought | Complex decisions with trade-offs | Medium | High |
| Self-consistency | High-stakes, need confidence | Low | Very High (Nx) |
| Structured output | Machine-readable responses | Low | Low |
| Self-critique | Safety-critical, accuracy-critical | Medium | Medium |
| Meta-prompting | Building production prompts | Low | Medium |
| Prompt chaining | Complex multi-step workflows | Medium | Medium-High |
| Multi-turn | Iterative refinement | Low | Medium |
Combining Techniques
Real-world prompts often combine several techniques:
Role prompting + Few-shot + Chain-of-thought + Structured output:
"You are a senior security engineer. (ROLE)
Analyze code for vulnerabilities. For each finding,
think through the attack vector step by step. (CoT)
Example: (FEW-SHOT)
Code: query = f"SELECT * FROM users WHERE id = {input}"
Analysis: Step 1: User controls 'input' variable.
Step 2: Input is interpolated directly into SQL.
Step 3: Attacker can inject: 1 OR 1=1
Finding: {"vuln": "SQL Injection", "severity": "critical",
"line": 1, "fix": "Use parameterized query"}
Now analyze: (STRUCTURED)
[your code here]
Respond as a JSON array of findings."
Practical Tips
- Start simple. Zero-shot first. Add complexity only when needed.
- Test with adversarial inputs. Not just happy paths.
- Version control your prompts. They're code. Treat them as such.
- Measure results. Run the same prompt 10 times. How consistent is it?
- Budget your tokens. Chain-of-thought and self-consistency cost more. Worth it for hard tasks, wasteful for easy ones.
- Document what works. Build a prompt library for your team.
Resources
- 📄 ReAct: Synergizing Reasoning and Acting (Yao et al., 2022)
- 📄 Tree of Thoughts (Yao et al., 2023)
- 📄 Self-Consistency (Wang et al., 2022)
- 📄 Constitutional AI (Bai et al., 2022)
- 🔗 Anthropic — Tool Use
- 🔗 OpenAI — Structured Outputs
Previous: 07 - Prompt Engineering Fundamentals | Next: 09 - System Prompts & Instructions